Config-driven pipeline that harvests business records from any JSON-based directory API — no code changes needed to point at a new source. Dot-path JSON navigation, geographic bounding-box filtering, two-pass deduplication. Best-documented codebase in the toolkit. 79 tests.
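As a rough illustration of how dot-path navigation and bounding-box filtering can be driven purely by config, here is a minimal Python sketch. The function names (`get_path`, `in_bbox`, `harvest`), the config keys, and the sample coordinates are all hypothetical and are not taken from the project itself.

```python
from typing import Any

def get_path(record: Any, path: str, default: Any = None) -> Any:
    """Follow a dot-separated path (e.g. 'location.coords.lat') through nested dicts and lists."""
    current = record
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]
        else:
            return default
    return current

def in_bbox(lat: float, lon: float, bbox: tuple) -> bool:
    """bbox is (south, west, north, east). Boxes crossing the antimeridian are not handled."""
    south, west, north, east = bbox
    return south <= lat <= north and west <= lon <= east

# Hypothetical per-source config: only these paths change when pointing at a new API.
CONFIG = {
    "name_path": "profile.name",
    "lat_path": "location.coords.lat",
    "lon_path": "location.coords.lon",
    "bbox": (51.28, -0.51, 51.69, 0.33),  # illustrative box around London
}

def harvest(records: list, config: dict) -> list:
    """Keep only the records whose coordinates fall inside the configured box."""
    kept = []
    for rec in records:
        lat = get_path(rec, config["lat_path"])
        lon = get_path(rec, config["lon_path"])
        if lat is None or lon is None:
            continue  # skip records with no usable coordinates
        if in_bbox(lat, lon, config["bbox"]):
            kept.append({"name": get_path(rec, config["name_path"]), "lat": lat, "lon": lon})
    return kept
```

Under these assumptions, switching to a different directory API would mean editing only the path strings and the box in `CONFIG`.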
Scrapes Trustpilot for business listings in any category — name, contact details, website, rating, and review count. Pre-filters for established companies only. Anti-bot evasion built in. 121 tests, CI on Ubuntu and Windows.
Searches Bing, DuckDuckGo, Mojeek, and Yahoo simultaneously and scores every result as HOT/WARM/COLD/NOISE based on configurable keyword matching. Abstract base class architecture, domain deduplication, two-pass enrichment. 72 tests.
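To make the HOT/WARM/COLD/NOISE scoring concrete, the sketch below shows one way configurable keyword matching could map a search result to a tier. The keyword weights, thresholds, and function names are assumptions made for illustration; the project's actual scoring rules are not described here.

```python
import re

# Hypothetical keyword weights; negative weights push a result toward NOISE.
KEYWORDS = {"wholesale": 3, "supplier": 2, "distributor": 2, "blog": -2}

# Hypothetical tier thresholds, checked from highest to lowest.
THRESHOLDS = [(5, "HOT"), (3, "WARM"), (1, "COLD")]

def score_result(title: str, snippet: str, keywords: dict = KEYWORDS) -> int:
    """Sum the weights of every configured keyword that appears as a whole word."""
    text = f"{title} {snippet}".lower()
    tokens = set(re.findall(r"[a-z0-9]+", text))
    return sum(weight for word, weight in keywords.items() if word in tokens)

def classify(score: int, thresholds: list = THRESHOLDS) -> str:
    """Return the first tier whose minimum score is met, else NOISE."""
    for minimum, label in thresholds:
        if score >= minimum:
            return label
    return "NOISE"
```

For example, `classify(score_result("Wholesale coffee supplier", "UK distributor"))` matches three keywords for a score of 7 and lands in the top tier, while a result mentioning only "blog" scores below every threshold.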
Takes any spreadsheet of company websites and appends verified emails and phone numbers to every row. Two-pass architecture: fast HTTP first, Playwright fallback for JS-rendered pages. E.164 phone normalisation, Cloudflare bypass. 78 tests across Ubuntu, Windows, and macOS.
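E.164 normalisation means reducing a phone number to a leading `+` followed by at most 15 digits, starting with the country code. The sketch below is a deliberately simplified, standard-library-only illustration. The function name `to_e164`, the default country code, the trunk-prefix handling, and the minimum-length check are assumptions; a real tool would more likely rely on a dedicated phone-number library with per-country rules.

```python
import re
from typing import Optional

def to_e164(raw: str, default_country_code: str = "44") -> Optional[str]:
    """Best-effort E.164 conversion for numbers scraped from web pages.

    Assumes a single leading 0 is a national trunk prefix, which holds for
    the UK and many other countries but not for all of them.
    """
    raw = raw.strip().replace("(0)", "")  # drop the optional trunk digit in "+44 (0)20 ..."
    digits = re.sub(r"\D", "", raw)
    if raw.startswith("+"):
        candidate = digits                         # already international
    elif digits.startswith("00"):
        candidate = digits[2:]                     # 00 international dialling prefix
    elif digits.startswith("0"):
        candidate = default_country_code + digits[1:]
    else:
        candidate = default_country_code + digits
    # E.164 allows at most 15 digits and no leading zero; the lower bound of 8
    # is a heuristic used here to reject obvious fragments.
    if not candidate or candidate[0] == "0" or not (8 <= len(candidate) <= 15):
        return None
    return "+" + candidate
```

With these rules, `"020 7946 0958"`, `"+44 (0)20 7946 0958"`, and `"0044 20 7946 0958"` all normalise to the same string, which makes duplicates easy to spot across rows.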
Playwright-driven scraper that extracts business listings from Google Maps with concurrent email enrichment — crawls each company's website for verified contact details. Cloudflare bypass, atomic checkpoint/resume, 122 tests. Outputs CRM-ready 3-sheet Excel.
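Atomic checkpoint/resume is commonly done by writing progress to a temporary file and then renaming it over the real checkpoint, so that a crash mid-write never leaves a half-written file behind. The following is a minimal sketch of that general pattern, not the project's actual implementation; the function names and the `{"done": [...]}` state layout are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path

def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file in the same directory, then rename it into place."""
    target = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), prefix=target.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())      # make sure the bytes reach disk before the rename
        os.replace(tmp_name, target)    # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)         # clean up the partial temp file
        raise

def load_checkpoint(path: str) -> dict:
    """Return the saved state, or an empty starting state if no checkpoint exists yet."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {"done": []}
```

On restart, a scraper built this way would call `load_checkpoint`, skip every listing already recorded under `done`, and call `save_checkpoint` after each completed batch.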