FLOWFINDER: Watershed Delineation Research & Benchmark Framework by Richard DonohueFLOWFINDER: Watershed Delineation Research & Benchmark Framework by Richard Donohue

FLOWFINDER: Watershed Delineation Research & Benchmark Framework

Richard Donohue

Completed work

Researcher

Docker

Marketing

FLOWFINDER: Watershed Delineation Research & Benchmark Framework

A research project exploring watershed delineation accuracy and developing systematic comparison methods for hydrological analysis tools. This work addresses key challenges in watershed delineation: reliability validation, systematic benchmarking, and geographic specialization for complex terrain.

🎯 Research Questions

What problems are we trying to solve?

Reliability Gap: How can we systematically validate watershed delineation tools across diverse terrain types?

Benchmarking Gap: Why is there no standardized framework for comparing watershed delineation tools?

Geographic Bias: How do existing tools perform in Mountain West terrain compared to other regions?

Reproducibility Crisis: How can we ensure watershed delineation results are reproducible and comparable?

Our approach: Develop FLOWFINDER as both a research tool and benchmark framework to systematically investigate these questions.

🔬 Research Context

Current State of Watershed Delineation

Tool proliferation: Multiple tools (TauDEM, GRASS, WhiteboxTools) with different algorithms

Validation challenges: Limited systematic comparison of accuracy and performance

Geographic bias: Most studies focus on eastern US or international basins

Reproducibility issues: Ad-hoc validation methods make results hard to compare

Research Gaps We're Addressing

Systematic benchmarking: No standardized framework for multi-tool comparison

Mountain West terrain: Limited research on complex terrain performance

Reliability metrics: Need for consistent validation across tools

Open methodology: Reproducible research practices for watershed analysis

📋 Prerequisites

Python 3.8+

FLOWFINDER CLI tool installed and accessible

Access to USGS NHD+ HR data and 3DEP 10m DEM data

8GB+ RAM recommended for processing large datasets

Docker (for TauDEM integration)

GRASS GIS (for r.watershed comparison)

🚀 Quick Start

1. Installation

# Clone the repository
git clone <repository-url>
cd flowfinder

# Install dependencies
pip install -e .[dev]

# Install FLOWFINDER
pip install flowfinder

# Copy environment template (if it exists)
cp .env.example .env || echo "Create .env file with your data paths"
# Edit .env with your data paths and configuration

2. Configuration Setup

The system uses a hierarchical configuration architecture to manage complexity:

# Configuration structure is already set up:

# Environment-specific configurations
config/environments/development.yaml    # Local dev (10 basins)
config/environments/testing.yaml        # CI/testing (50 basins)
config/environments/production.yaml     # Full-scale (500+ basins)

# Tool-specific configurations
config/tools/flowfinder.yaml            # FLOWFINDER settings
config/tools/taudem.yaml               # TauDEM MPI settings
config/tools/grass.yaml                # GRASS r.watershed settings
config/tools/whitebox.yaml             # WhiteboxTools settings

3. Data Preparation

Place your input datasets in the data/ directory:

data/
├── huc12_mountain_west.shp    # HUC12 boundaries for Mountain West
├── nhd_hr_catchments.shp      # NHD+ HR catchment polygons
├── nhd_flowlines.shp          # NHD+ HR flowlines
└── dem_10m.tif               # 10m DEM mosaic or tiles

4. Run Single-Tool Benchmark

# Step 1: Generate stratified basin sample
python scripts/basin_sampler.py --config config/basin_sampler_config.yaml

# Step 2: Extract truth polygons
python scripts/truth_extractor.py --config config/truth_extractor_config.yaml

# Step 3: Run FLOWFINDER benchmark
python scripts/benchmark_runner.py \
    --sample basin_sample.csv \
    --truth truth_polygons.gpkg \
    --config config/benchmark_config.yaml \
    --outdir results/

5. Run Multi-Tool Comparison (Experimental)

# Using the watershed experiment runner
python scripts/watershed_experiment_runner.py \
    --single --lat 40.0 --lon -105.5 --name "test_run" \
    --outdir results/multi_tool/

📁 Project Structure

├── README.md                    # Project overview + setup
├── requirements.txt             # Python dependencies
├── pyproject.toml              # Modern Python project config
├── .env.example                # Environment template
├── .gitignore                  # Standard Python gitignore
│
├── config/                     # Hierarchical configuration system
│   ├── base.yaml              # Foundation configurations
│   ├── configuration_manager.py # Configuration inheritance system
│   ├── schema.json            # JSON Schema validation
│   ├── environments/           # Environment-specific settings
│   │   ├── development.yaml   # Local development (10 basins)
│   │   ├── testing.yaml       # CI/testing (50 basins)
│   │   └── production.yaml    # Full-scale (500+ basins)
│   └── tools/                  # Tool-specific configurations
│       ├── flowfinder.yaml    # FLOWFINDER settings
│       ├── taudem.yaml        # TauDEM MPI settings
│       ├── grass.yaml         # GRASS r.watershed settings
│       └── whitebox.yaml      # WhiteboxTools settings
│
├── scripts/                    # Core benchmark scripts
│   ├── basin_sampler.py       # Stratified basin sampling
│   ├── truth_extractor.py     # Truth polygon extraction
│   ├── benchmark_runner.py    # FLOWFINDER accuracy testing
│   ├── watershed_experiment_runner.py # Multi-tool comparison
│   └── validation_tools.py    # Validation utilities
│
├── data/                       # Input datasets (gitignored)
├── results/                    # Output directory (gitignored)
├── tests/                      # Unit tests
├── docs/                       # Research and technical documentation
│   ├── strategic_analysis_implementation_roadmap_v2.md # Research roadmap
│   ├── multi_tool_integration_strategy.md # Integration approach
│   ├── strategic_analysis_assessment.md # Research evaluation
│   ├── immediate_next_steps.md # Implementation priorities
│   ├── configuration_architecture.md # Configuration system design
│   ├── multi_tool_benchmark_architecture.md # Framework design
│   └── test_coverage/          # Test coverage documentation
│
└── notebooks/                  # Jupyter exploration
    └── benchmark_analysis.ipynb

🔧 Configuration Architecture

The system uses a hierarchical configuration architecture to manage complexity across different tools and environments:

Configuration Hierarchy

Base Configurations → Environment → Tool → Local Overrides

Example Configuration Composition

# Development FLOWFINDER experiment
inherits:
  - "base/regions.yaml#mountain_west_minimal"
  - "base/quality_standards.yaml#development_grade"
  - "environments/development.yaml"
  - "tools/flowfinder/base.yaml"
  - "experiments/accuracy_comparison.yaml"

overrides:
  basin_sampling:
    n_per_stratum: 1  # Minimal for dev
  benchmark:
    timeout_seconds: 30  # Quick timeout

Tool Adapter Interface

class ToolAdapter(ABC):
    @abstractmethod
    def delineate_watershed(self, pour_point: Point, dem_path: str) -> Tuple[Polygon, Dict]:
        """Delineate watershed and return polygon + performance metrics"""
        pass

    @abstractmethod
    def is_available(self) -> bool:
        """Check if tool is available on system"""
        pass

📊 Research Outputs

Single-Tool Benchmark

benchmark_results.json: Detailed per-basin metrics

accuracy_summary.csv: Tabular results for analysis

benchmark_summary.txt: Performance analysis and key findings

Multi-Tool Comparison (Experimental)

multi_tool_results.json: Comparative analysis across tools

performance_comparison.csv: Runtime and memory comparisons

statistical_analysis.csv: ANOVA, Tukey HSD, Kruskal-Wallis results

publication_figures/: Research-ready charts and graphs

🎯 Research Metrics

Technical Validation

Metric Current Target Status FLOWFINDER IOU (mean) ≥ 0.90 🔄 In Progress FLOWFINDER IOU (90th percentile) ≥ 0.95 🔄 In Progress Runtime (mean) ≤ 30 s 🔄 In Progress Configuration redundancy 90% reduction ✅ Achieved Tool integration success 4 major tools integrated 🔄 In Progress

Research Impact Goals

Metric Target Status Peer-reviewed publications 2+ papers submitted 🔄 In Progress Conference presentations 5+ presentations 🔄 In Progress Citations (2 years) 100+ citations 🔄 In Progress Framework adoption 3+ external research groups 🔄 In Progress

Community Engagement Goals

Metric Target Status GitHub stars 500+ stars 🔄 In Progress FLOWFINDER downloads 1000+ downloads 🔄 In Progress External contributors 10+ contributors 🔄 In Progress Institutional adoptions 5+ adoptions 🔄 In Progress

🧪 Testing

# Run unit tests
python -m pytest tests/

# Test configuration system
python test_configuration_system.py

# Test multi-tool integration
python test_integration.py

# Run with coverage
python -m pytest tests/ --cov=scripts --cov-report=html

📈 Analysis

Use the Jupyter notebook for detailed analysis:

# Start Jupyter
jupyter lab notebooks/

# Open benchmark_analysis.ipynb for interactive exploration

🎯 Research Roadmap

Phase 1: Foundation - IN PROGRESS

✅ Configuration Architecture: Hierarchical system implemented

✅ FLOWFINDER Development: Core tool with validation framework

🔄 Benchmark Framework MVP: Multi-tool comparison development

🔄 Literature Review: Research gap analysis and methodology development

Phase 2: Tool Integration - PLANNED

🔄 WhiteboxTools Integration: Rust-based performance comparison

🔄 TauDEM Integration: Academic gold standard validation

🔄 GRASS GIS Integration: Comprehensive hydrological suite

🔄 SAGA GIS Integration: European academic adoption

📚 Documentation

Research Documents

Research Roadmap: Implementation plan with research milestones

Multi-Tool Integration Strategy: Research-based tool integration approach

Research Assessment: Comprehensive research evaluation

Next Steps: Implementation priorities

Technical Documents

Configuration Architecture: Hierarchical configuration system design

Multi-Tool Benchmark Architecture: Framework design and implementation

Test Coverage: Comprehensive testing documentation

🤝 Contributing

We welcome contributions from the research community:

Fork the repository

Create a feature branch (git checkout -b feature/research-improvement)

Commit your changes (git commit -m 'Add research improvement')

Push to the branch (git push origin feature/research-improvement)

Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

USGS for NHD+ HR and 3DEP data

FLOWFINDER development team

Open source geospatial community

Academic research community for feedback and validation

📞 Support

For research questions and technical issues:

Check the documentation

Review the Research Roadmap

See the Multi-Tool Integration Strategy

Open an issue on GitHub

"Research is formalized curiosity. It is poking and prying with a purpose."

FLOWFINDER: Exploring watershed delineation accuracy and developing systematic comparison methods for hydrological research.

Like this project

Completed work

Posted Jul 21, 2025

Developed FLOWFINDER for watershed delineation research and benchmarking.

Likes

Views