Scott Baker's Work | ContraWork by Scott Baker

Scott Baker

Data Engineer DuckDB SQL

New to Contra

Scott is ready for their next project!

Phoenix, USA

Work Posts Products Services About

Phoenix, USA

Cover image for Full client engagement pipeline demonstrated

Full client engagement pipeline demonstrated on fictional Series A SaaS data: 90,950 CRM rows, 1,025,447 Stripe transactions, 115,500 QuickBooks invoices, 200 enterprise contracts — the exact profile of a typical new engagement. Five stages on one machine. No cluster. No cloud database fees. Stage 1: Ingest dirty data from CRM, Stripe, and QuickBooks exports. Stage 2: Audit — quality scores, grime map, null analysis. Stage 3: Clean — DuckDB SQL transforms, Parquet out. Stage 4: Analyse — 5 business queries, anomaly detection. Stage 5: RAG — embed, index, query across documents. Every stage runs inside DuckDB on a single workstation. This is the pipeline your data goes through on every engagement.

Cover image for Interactive Streamlit dashboard backed by

Interactive Streamlit dashboard backed by DuckDB — 167,858,646 NYC Yellow Cab records across 4 years (2022–2025), 48 Parquet files, queried live on a single workstation. Features: KPI cards (total trips, fare revenue, YoY change), monthly trip volume line chart by year, payment type shift analysis 2022→2025, interactive year/month filters. All queries run in DuckDB — no database server, no cluster, no Spark. This is the same data stack delivered to clients. Raw Parquet files in, live dashboard out — pipeline built once, runs automatically. Built with: DuckDB 1.4 · Python · Streamlit · Plotly · Pandas

GitHub - NixOSDude/hazynet_scala_spark: Spark development with …

Cover image for Migrate off Databricks in 4

Migrate off Databricks in 4 stages. 172M rows/sec. 80% lower cost. I replace overpriced Databricks and Snowflake stacks with DuckDB — faster, cheaper, and completely under your control. Audit → Migrate → Automate → Secure. Fixed-fee engagements. Results in days.