FLOWFINDER: Watershed Delineation Research & Benchmark Framework

Richard Donohue

A research project exploring watershed delineation accuracy and developing systematic comparison methods for hydrological analysis tools. This work addresses key challenges in watershed delineation: reliability validation, systematic benchmarking, and geographic specialization for complex terrain.

🎯 Research Questions

What problems are we trying to solve?
Reliability Gap: How can we systematically validate watershed delineation tools across diverse terrain types?
Benchmarking Gap: Why is there no standardized framework for comparing watershed delineation tools?
Geographic Bias: How do existing tools perform in Mountain West terrain compared to other regions?
Reproducibility Crisis: How can we ensure watershed delineation results are reproducible and comparable?
Our approach: Develop FLOWFINDER as both a research tool and benchmark framework to systematically investigate these questions.

🔬 Research Context

Current State of Watershed Delineation

Tool proliferation: Multiple tools (TauDEM, GRASS GIS, WhiteboxTools) implement different flow-direction and flow-accumulation algorithms
Validation challenges: Limited systematic comparison of accuracy and performance
Geographic bias: Most studies focus on eastern US or international basins
Reproducibility issues: Ad-hoc validation methods make results hard to compare

Research Gaps We're Addressing

Systematic benchmarking: No standardized framework for multi-tool comparison
Mountain West terrain: Limited research on complex terrain performance
Reliability metrics: Need for consistent validation across tools
Open methodology: Reproducible research practices for watershed analysis

📋 Prerequisites

Python 3.8+
FLOWFINDER CLI tool installed and accessible
Access to USGS NHDPlus High Resolution (NHD+ HR) data and 3DEP 10 m DEM data
8 GB+ RAM recommended for processing large datasets
Docker (for TauDEM integration)
GRASS GIS (for r.watershed comparison)
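
A quick environment check can save a failed batch run. The helper below is a hypothetical sketch, not part of the repository: it checks the Python version and looks for Docker and GRASS GIS on PATH. The executable names `docker` and `grass` are assumptions about a typical install, and RAM and data access still need to be confirmed by hand.

```python
import shutil
import sys
from typing import Dict, Tuple

# Executable names are assumptions about a typical install; adjust as needed.
REQUIRED_EXECUTABLES = {
    "docker": "Docker (TauDEM integration)",
    "grass": "GRASS GIS (r.watershed comparison)",
}


def check_prerequisites(min_python: Tuple[int, int] = (3, 8)) -> Dict[str, bool]:
    """Map each prerequisite to whether it appears to be satisfied."""
    results = {
        f"Python {min_python[0]}.{min_python[1]}+": sys.version_info >= min_python,
    }
    for exe, label in REQUIRED_EXECUTABLES.items():
        # shutil.which returns None when the executable is not on PATH
        results[label] = shutil.which(exe) is not None
    return results


if __name__ == "__main__":
    for label, ok in check_prerequisites().items():
        print(f"{'OK' if ok else 'MISSING':<8}{label}")
```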

🚀 Quick Start

1. Installation

# Clone the repository
git clone <repository-url>
cd flowfinder

# Install dependencies
pip install -e ".[dev]"   # quotes stop shells such as zsh from globbing the brackets

# Install FLOWFINDER
pip install flowfinder

# Copy environment template (if it exists)
cp .env.example .env || echo "Create .env file with your data paths"
# Edit .env with your data paths and configuration
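
The README does not list which variables `.env` should contain. As a hedged illustration, if the file uses simple `KEY=VALUE` lines, a dependency-free loader could look like this; the key names `DATA_DIR` and `DEM_PATH` are hypothetical. The python-dotenv package's `load_dotenv()` is a common alternative.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching quotes around the value, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


example = """
# Hypothetical keys, for illustration only
DATA_DIR=data/
DEM_PATH="data/dem_10m.tif"
"""
print(parse_env(example))  # {'DATA_DIR': 'data/', 'DEM_PATH': 'data/dem_10m.tif'}
```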

2. Configuration Setup

Configuration is split into environment-specific and tool-specific YAML files:
# Configuration structure is already set up:

# Environment-specific configurations
config/environments/development.yaml # Local dev (10 basins)
config/environments/testing.yaml # CI/testing (50 basins)
config/environments/production.yaml # Full-scale (500+ basins)

# Tool-specific configurations
config/tools/flowfinder.yaml # FLOWFINDER settings
config/tools/taudem.yaml # TauDEM MPI settings
config/tools/grass.yaml # GRASS r.watershed settings
config/tools/whitebox.yaml # WhiteboxTools settings
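
These files slot into the Base → Environment → Tool → Local Overrides order described under Configuration Architecture. A hypothetical helper that lists, from lowest to highest precedence, the files a given run would read might look like this; the `config/base.yaml` location comes from the project tree, and local override files are left out because their naming is not documented.

```python
from pathlib import Path
from typing import List

ENVIRONMENTS = ("development", "testing", "production")
TOOLS = ("flowfinder", "taudem", "grass", "whitebox")


def config_chain(environment: str, tool: str, root: str = "config") -> List[Path]:
    """Return config files from lowest to highest precedence: base -> environment -> tool."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    base = Path(root)
    return [
        base / "base.yaml",
        base / "environments" / f"{environment}.yaml",
        base / "tools" / f"{tool}.yaml",
    ]


for path in config_chain("development", "taudem"):
    print(path.as_posix())
```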

3. Data Preparation

Place your input datasets in the data/ directory:
data/
├── huc12_mountain_west.shp # HUC12 boundaries for Mountain West
├── nhd_hr_catchments.shp # NHD+ HR catchment polygons
├── nhd_flowlines.shp # NHD+ HR flowlines
└── dem_10m.tif # 10m DEM mosaic or tiles
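
A shapefile is really a set of files (`.shp`, `.shx` and `.dbf` are mandatory), so a missing sidecar is an easy way to break a run. A small hypothetical pre-flight check using the file names above could be:

```python
from pathlib import Path
from typing import List

# File names taken from the data/ layout above.
REQUIRED_INPUTS = (
    "huc12_mountain_west.shp",
    "nhd_hr_catchments.shp",
    "nhd_flowlines.shp",
    "dem_10m.tif",
)
# Mandatory shapefile components that must sit next to each .shp
SHAPEFILE_SIDECARS = (".shx", ".dbf")


def missing_inputs(data_dir: str = "data") -> List[str]:
    """Return the names of expected input files that do not exist."""
    root = Path(data_dir)
    missing: List[str] = []
    for name in REQUIRED_INPUTS:
        path = root / name
        expected = [path]
        if path.suffix == ".shp":
            expected += [path.with_suffix(ext) for ext in SHAPEFILE_SIDECARS]
        missing += [p.name for p in expected if not p.exists()]
    return missing


print(missing_inputs("data"))
```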

4. Run Single-Tool Benchmark

# Step 1: Generate stratified basin sample
python scripts/basin_sampler.py --config config/basin_sampler_config.yaml

# Step 2: Extract truth polygons
python scripts/truth_extractor.py --config config/truth_extractor_config.yaml

# Step 3: Run FLOWFINDER benchmark
python scripts/benchmark_runner.py \
--sample basin_sample.csv \
--truth truth_polygons.gpkg \
--config config/benchmark_config.yaml \
--outdir results/
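
Step 1 draws a stratified sample so that every basin class is represented. The README does not say which attributes define the strata, so the sketch below assumes each candidate basin carries a hypothetical `stratum` label and draws up to `n_per_stratum` basins per stratum (the parameter name appears in the configuration example) with a fixed seed for reproducibility. It illustrates the technique only; it is not the logic of `basin_sampler.py`.

```python
import random
from collections import defaultdict
from typing import Dict, List


def stratified_sample(basins: List[Dict], n_per_stratum: int, seed: int = 42) -> List[Dict]:
    """Draw up to n_per_stratum basins from each stratum, reproducibly."""
    by_stratum: Dict[str, List[Dict]] = defaultdict(list)
    for basin in basins:
        by_stratum[basin["stratum"]].append(basin)

    rng = random.Random(seed)  # local RNG: same seed -> same sample
    sample: List[Dict] = []
    for stratum in sorted(by_stratum):  # sorted keys keep the draw order stable
        members = by_stratum[stratum]
        k = min(n_per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical basins and strata labels, for illustration only
basins = [
    {"huc12": f"h{i}", "stratum": s}
    for i, s in enumerate(["small_steep", "small_flat", "large_steep"] * 3)
]
print(len(stratified_sample(basins, n_per_stratum=2)))  # 6
```

Using a dedicated `random.Random(seed)` instead of the module-level generator keeps the sample reproducible even if other code also draws random numbers.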

5. Run Multi-Tool Comparison (Experimental)

# Using the watershed experiment runner
python scripts/watershed_experiment_runner.py \
--single --lat 40.0 --lon -105.5 --name "test_run" \
--outdir results/multi_tool/
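
The runner above is shown for a single pour point. To try several points, one option is to build the same command from Python. This sketch reuses only the flags shown above (`--single`, `--lat`, `--lon`, `--name`, `--outdir`); the batching helper itself is hypothetical, and by default it only prints the commands.

```python
import subprocess
import sys
from typing import List, Tuple

RUNNER = "scripts/watershed_experiment_runner.py"


def build_command(lat: float, lon: float, name: str, outdir: str) -> List[str]:
    """Assemble the runner invocation for one pour point (flags as documented above)."""
    return [
        sys.executable, RUNNER,
        "--single", "--lat", str(lat), "--lon", str(lon),
        "--name", name, "--outdir", outdir,
    ]


def run_batch(points: List[Tuple[float, float, str]], outdir: str, dry_run: bool = True) -> None:
    for lat, lon, name in points:
        cmd = build_command(lat, lon, name, outdir)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raise if the runner exits non-zero


run_batch([(40.0, -105.5, "test_run")], "results/multi_tool/")
```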

๐Ÿ“ Project Structure

├── README.md # Project overview + setup
├── requirements.txt # Python dependencies
├── pyproject.toml # Modern Python project config
├── .env.example # Environment template
├── .gitignore # Standard Python gitignore
│
├── config/ # Hierarchical configuration system
│   ├── base.yaml # Foundation configurations
│   ├── configuration_manager.py # Configuration inheritance system
│   ├── schema.json # JSON Schema validation
│   ├── environments/ # Environment-specific settings
│   │   ├── development.yaml # Local development (10 basins)
│   │   ├── testing.yaml # CI/testing (50 basins)
│   │   └── production.yaml # Full-scale (500+ basins)
│   └── tools/ # Tool-specific configurations
│       ├── flowfinder.yaml # FLOWFINDER settings
│       ├── taudem.yaml # TauDEM MPI settings
│       ├── grass.yaml # GRASS r.watershed settings
│       └── whitebox.yaml # WhiteboxTools settings
│
├── scripts/ # Core benchmark scripts
│   ├── basin_sampler.py # Stratified basin sampling
│   ├── truth_extractor.py # Truth polygon extraction
│   ├── benchmark_runner.py # FLOWFINDER accuracy testing
│   ├── watershed_experiment_runner.py # Multi-tool comparison
│   └── validation_tools.py # Validation utilities
│
├── data/ # Input datasets (gitignored)
├── results/ # Output directory (gitignored)
├── tests/ # Unit tests
├── docs/ # Research and technical documentation
│   ├── strategic_analysis_implementation_roadmap_v2.md # Research roadmap
│   ├── multi_tool_integration_strategy.md # Integration approach
│   ├── strategic_analysis_assessment.md # Research evaluation
│   ├── immediate_next_steps.md # Implementation priorities
│   ├── configuration_architecture.md # Configuration system design
│   ├── multi_tool_benchmark_architecture.md # Framework design
│   └── test_coverage/ # Test coverage documentation
│
└── notebooks/ # Jupyter exploration
    └── benchmark_analysis.ipynb

🔧 Configuration Architecture

The system uses a hierarchical configuration architecture to manage complexity across different tools and environments:

Configuration Hierarchy

Base Configurations → Environment → Tool → Local Overrides
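
In a layered scheme like this, each later layer typically overrides only the keys it sets, while nested sections are merged rather than replaced. The exact rules live in `configuration_manager.py` and are not documented here, so the following is a generic recursive-merge sketch under that assumption; the base timeout of 120 and the `metrics` key are illustrative values.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: override's keys win, nested dicts are merged recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def compose(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Apply layers in order: base -> environment -> tool -> local overrides."""
    result: Dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


base = {"benchmark": {"timeout_seconds": 120, "metrics": ["iou"]}}
development = {"basin_sampling": {"n_per_stratum": 1}}
local = {"benchmark": {"timeout_seconds": 30}}
print(compose(base, development, local))
# {'benchmark': {'timeout_seconds': 30, 'metrics': ['iou']}, 'basin_sampling': {'n_per_stratum': 1}}
```

Deep-copying on every merge keeps the input layers untouched, so one base file can safely feed several composed configurations.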

Example Configuration Composition

# Development FLOWFINDER experiment
inherits:
  - "base/regions.yaml#mountain_west_minimal"
  - "base/quality_standards.yaml#development_grade"
  - "environments/development.yaml"
  - "tools/flowfinder/base.yaml"
  - "experiments/accuracy_comparison.yaml"

overrides:
  basin_sampling:
    n_per_stratum: 1 # Minimal for dev
  benchmark:
    timeout_seconds: 30 # Quick timeout
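
The `inherits` entries pair a file with an optional `#section` suffix (for example `base/regions.yaml#mountain_west_minimal`). Assuming the suffix selects a top-level key from that file (an interpretation, not documented behavior), resolving one entry might look like this. `load` stands in for a YAML loader such as PyYAML's `yaml.safe_load` applied to the file; an in-memory dict with illustrative contents is used so the sketch runs without dependencies.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def parse_ref(ref: str) -> Tuple[str, Optional[str]]:
    """Split 'path.yaml#section' into ('path.yaml', 'section'); section may be None."""
    path, sep, section = ref.partition("#")
    return path, (section if sep else None)


def resolve_ref(ref: str, load: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Load the referenced file and, if a section is named, return only that section."""
    path, section = parse_ref(ref)
    document = load(path)
    if section is None:
        return document
    if section not in document:
        raise KeyError(f"{section!r} not found in {path}")
    return document[section]


# In-memory stand-in for YAML files on disk (contents are illustrative).
fake_files = {
    "base/regions.yaml": {"mountain_west_minimal": {"label": "minimal"}, "mountain_west_full": {}},
    "environments/development.yaml": {"environment": "development"},
}
print(resolve_ref("base/regions.yaml#mountain_west_minimal", fake_files.__getitem__))  # {'label': 'minimal'}
print(resolve_ref("environments/development.yaml", fake_files.__getitem__))  # {'environment': 'development'}
```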

Tool Adapter Interface

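from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple

from shapely.geometry import Point, Polygon  # assumes Shapely geometry types


class ToolAdapter(ABC):
    """Common interface each watershed delineation tool wrapper implements."""

    @abstractmethod
    def delineate_watershed(self, pour_point: Point, dem_path: str) -> Tuple[Polygon, Dict[str, Any]]:
        """Delineate watershed and return polygon + performance metrics"""

    @abstractmethod
    def is_available(self) -> bool:
        """Check if tool is available on system"""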
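
One way a benchmark loop could use this interface is to skip adapters whose tool is not installed and time the rest. The sketch below is hypothetical: to stay runnable without Shapely or any GIS tool, it re-declares a trimmed copy of the interface with plain coordinate tuples standing in for `Point` and `Polygon`, and `EchoAdapter` and `MissingToolAdapter` are toy adapters, not real integrations.

```python
import time
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple

Coord = Tuple[float, float]  # stand-in for a shapely Point


class ToolAdapter(ABC):
    @abstractmethod
    def delineate_watershed(self, pour_point: Coord, dem_path: str) -> Tuple[List[Coord], Dict[str, Any]]:
        """Return (polygon ring, tool-reported metrics)."""

    @abstractmethod
    def is_available(self) -> bool:
        """Whether the underlying tool can run on this system."""


class EchoAdapter(ToolAdapter):
    """Toy adapter: returns a unit square anchored at the pour point."""

    def delineate_watershed(self, pour_point, dem_path):
        x, y = pour_point
        ring = [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]
        return ring, {"algorithm": "toy"}

    def is_available(self):
        return True


class MissingToolAdapter(EchoAdapter):
    """Toy adapter whose tool is never installed."""

    def is_available(self):
        return False


def run_all(adapters: Dict[str, ToolAdapter], pour_point: Coord, dem_path: str) -> Dict[str, Dict[str, Any]]:
    """Run every available adapter, recording status, wall-clock runtime and metrics."""
    results: Dict[str, Dict[str, Any]] = {}
    for name, adapter in adapters.items():
        if not adapter.is_available():
            results[name] = {"status": "skipped"}
            continue
        start = time.perf_counter()
        polygon, metrics = adapter.delineate_watershed(pour_point, dem_path)
        elapsed = time.perf_counter() - start
        results[name] = {"status": "ok", "runtime_s": elapsed, "n_vertices": len(polygon), **metrics}
    return results


print(run_all({"echo": EchoAdapter(), "missing": MissingToolAdapter()}, (0.0, 0.0), "data/dem_10m.tif"))
```

Checking `is_available()` first lets a multi-tool run degrade gracefully on machines where, for example, Docker or GRASS GIS is absent.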

📊 Research Outputs

Single-Tool Benchmark

benchmark_results.json: Detailed per-basin metrics
accuracy_summary.csv: Tabular results for analysis
benchmark_summary.txt: Performance analysis and key findings
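
The column layout of `accuracy_summary.csv` is not specified here. Assuming per-basin records with an identifier, IoU and runtime (hypothetical field names `basin_id`, `iou`, `runtime_s`), writing such a table with the standard library could look like this:

```python
import csv
import io
from typing import Dict, Iterable, TextIO

FIELDS = ["basin_id", "iou", "runtime_s"]  # hypothetical column names


def write_summary(records: Iterable[Dict], out: TextIO) -> int:
    """Write one row per basin (extra keys are ignored); return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for record in records:
        # Round IoU to 4 decimals for a readable table
        writer.writerow({**record, "iou": f"{record['iou']:.4f}"})
        count += 1
    return count


buffer = io.StringIO()
write_summary([{"basin_id": "b1", "iou": 0.93417, "runtime_s": 12.5}], buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""`, as the `csv` module documentation recommends.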

Multi-Tool Comparison (Experimental)

multi_tool_results.json: Comparative analysis across tools
performance_comparison.csv: Runtime and memory comparisons
statistical_analysis.csv: ANOVA, Tukey HSD, Kruskal-Wallis results
publication_figures/: Research-ready charts and graphs
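
Kruskal-Wallis is the rank-based counterpart to one-way ANOVA, useful when IoU distributions are far from normal (for example, bunched near 1). In practice `scipy.stats.kruskal` computes it along with a p-value; the dependency-free sketch below only shows how the H statistic is formed from pooled ranks, including the standard tie correction, on made-up numbers.

```python
from collections import Counter
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_h(*groups: Sequence[float]) -> float:
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # ranks of this group's values
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over tied runs of size t
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


print(kruskal_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # approximately 7.2
```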

🎯 Research Metrics

Technical Validation

| Metric | Current Target | Status |
|---|---|---|
| FLOWFINDER IOU (mean) | ≥ 0.90 | 🔄 In Progress |
| FLOWFINDER IOU (90th percentile) | ≥ 0.95 | 🔄 In Progress |
| Runtime (mean) | ≤ 30 s | 🔄 In Progress |
| Configuration redundancy | 90% reduction | ✅ Achieved |
| Tool integration success | 4 major tools integrated | 🔄 In Progress |
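
IoU (intersection over union) is the area shared by the delineated and reference watersheds divided by the area of their union, so 1.0 is a perfect match. The benchmark compares polygons (for example via Shapely's `intersection` and `union` areas); to stay dependency-free, this sketch assumes both watersheds have been rasterized onto a common grid and are given as sets of (row, col) cells, then checks a batch of scores against the two IoU targets in the table. The scores are made up.

```python
import statistics
from typing import Any, Dict, Iterable, Set, Tuple

Cell = Tuple[int, int]


def iou(predicted: Set[Cell], truth: Set[Cell]) -> float:
    """Intersection over union of two cell masks on the same grid."""
    union = predicted | truth
    if not union:
        return 1.0  # both empty: treat as identical
    return len(predicted & truth) / len(union)


def check_targets(scores: Iterable[float], mean_target: float = 0.90, p90_target: float = 0.95) -> Dict[str, Any]:
    """Compare mean and 90th-percentile IoU with the table's targets (needs >= 2 scores)."""
    values = list(scores)
    mean = statistics.mean(values)
    # quantiles(n=10) returns 9 cut points; index 8 is the 90th percentile
    p90 = statistics.quantiles(values, n=10, method="inclusive")[8]
    return {"mean": mean, "p90": p90, "mean_ok": mean >= mean_target, "p90_ok": p90 >= p90_target}


a = {(0, 0), (0, 1), (1, 0), (1, 1)}
b = {(0, 1), (1, 1), (1, 2)}
print(iou(a, b))  # 0.4
print(check_targets([0.95, 0.97, 0.99]))
```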

Research Impact Goals

| Metric | Target | Status |
|---|---|---|
| Peer-reviewed publications | 2+ papers submitted | 🔄 In Progress |
| Conference presentations | 5+ presentations | 🔄 In Progress |
| Citations (2 years) | 100+ citations | 🔄 In Progress |
| Framework adoption | 3+ external research groups | 🔄 In Progress |

Community Engagement Goals

| Metric | Target | Status |
|---|---|---|
| GitHub stars | 500+ stars | 🔄 In Progress |
| FLOWFINDER downloads | 1000+ downloads | 🔄 In Progress |
| External contributors | 10+ contributors | 🔄 In Progress |
| Institutional adoptions | 5+ adoptions | 🔄 In Progress |

🧪 Testing

# Run unit tests
python -m pytest tests/

# Test configuration system
python test_configuration_system.py

# Test multi-tool integration
python test_integration.py

# Run with coverage
python -m pytest tests/ --cov=scripts --cov-report=html

📈 Analysis

Use the Jupyter notebook for detailed analysis:
# Start Jupyter
jupyter lab notebooks/

# Open benchmark_analysis.ipynb for interactive exploration
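
Inside the notebook, a common first step is to group per-basin scores by stratum. pandas (`df.groupby("stratum")["iou"].mean()`) is the usual tool; the standard-library sketch below does the same on an in-memory CSV. The column names `basin_id`, `stratum` and `iou` and the values are hypothetical, since the real `accuracy_summary.csv` schema is not documented here.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, TextIO


def mean_iou_by_stratum(csv_file: TextIO) -> Dict[str, float]:
    """Average the iou column within each stratum, keyed in sorted order."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["stratum"]].append(float(row["iou"]))
    return {stratum: mean(scores) for stratum, scores in sorted(groups.items())}


sample = io.StringIO(
    "basin_id,stratum,iou\n"
    "b1,steep,0.80\n"
    "b2,steep,0.90\n"
    "b3,flat,0.95\n"
)
print(mean_iou_by_stratum(sample))  # approximately {'flat': 0.95, 'steep': 0.85}
```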

🎯 Research Roadmap

Phase 1: Foundation - IN PROGRESS

✅ Configuration Architecture: Hierarchical system implemented
✅ FLOWFINDER Development: Core tool with validation framework
🔄 Benchmark Framework MVP: Multi-tool comparison development
🔄 Literature Review: Research gap analysis and methodology development

Phase 2: Tool Integration - PLANNED

🔄 WhiteboxTools Integration: Rust-based performance comparison
🔄 TauDEM Integration: Academic gold standard validation
🔄 GRASS GIS Integration: Comprehensive hydrological suite
🔄 SAGA GIS Integration: European academic adoption

📚 Documentation

Research Documents

Research Roadmap: Implementation plan with research milestones
Multi-Tool Integration Strategy: Research-based tool integration approach
Research Assessment: Comprehensive research evaluation
Next Steps: Implementation priorities

Technical Documents

Configuration Architecture: Hierarchical configuration system design
Multi-Tool Benchmark Architecture: Framework design and implementation
Test Coverage: Comprehensive testing documentation

๐Ÿค Contributing

We welcome contributions from the research community:
Fork the repository
Create a feature branch (git checkout -b feature/research-improvement)
Commit your changes (git commit -m 'Add research improvement')
Push to the branch (git push origin feature/research-improvement)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

USGS for NHD+ HR and 3DEP data
FLOWFINDER development team
Open source geospatial community
Academic research community for feedback and validation

📞 Support

For research questions and technical issues:
Check the documentation
Review the Research Roadmap
Open an issue on GitHub
"Research is formalized curiosity. It is poking and prying with a purpose."
FLOWFINDER: Exploring watershed delineation accuracy and developing systematic comparison methods for hydrological research.
Posted Jul 21, 2025

Developed FLOWFINDER for watershed delineation research and benchmarking.