Scott Baker's Work | ContraWork by Scott Baker
Scott Baker

Scott Baker

Data Engineer DuckDB SQL

New to Contra

Scott is ready for their next project!

Cover image for  Full client engagement pipeline demonstrated
 Full client engagement pipeline demonstrated on fictional Series A SaaS data: 90,950 CRM rows, 1,025,447 Stripe  transactions, 115,500 QuickBooks invoices, 200 enterprise contracts — the exact profile of a typical new            engagement.                                                                                                                                                                      Five stages on one machine. No cluster. No cloud database fees.                                                         Stage 1: Ingest dirty data from CRM, Stripe, and QuickBooks exports.                                                Stage 2: Audit — quality scores, grime map, null analysis.  Stage 3: Clean — DuckDB SQL transforms, Parquet out.                                                                Stage 4: Analyse — 5 business queries, anomaly detection.                                                           Stage 5: RAG — embed, index, query across documents.                                                                                                                                                                                    Every stage runs inside DuckDB on a single workstation. This is the pipeline your data goes through on every        engagement.        
0
23
Cover image for Interactive Streamlit dashboard backed by
Interactive Streamlit dashboard backed by DuckDB — 167,858,646 NYC Yellow Cab records across 4 years (2022–2025),   48 Parquet files, queried live on a single workstation.                                                                                                                         Features: KPI cards (total trips, fare revenue, YoY change), monthly trip volume line chart by year, payment type   shift analysis 2022→2025, interactive year/month filters. All queries run in DuckDB — no database server, no  cluster, no Spark.                                                                                                                                                               This is the same data stack delivered to clients. Raw Parquet files in, live dashboard out — pipeline built once,   runs automatically.                                                                                                                      Built with: DuckDB 1.4 · Python · Streamlit · Plotly · Pandas    
0
21
Cover image for GitHub - NixOSDude/hazynet_scala_spark: Spark development with …
GitHub - NixOSDude/hazynet_scala_spark: Spark development with …
0
0
Cover image for Migrate off Databricks in 4
Migrate off Databricks in 4 stages. 172M rows/sec. 80% lower cost. I replace overpriced Databricks and Snowflake stacks with DuckDB — faster, cheaper, and completely under your control. Audit → Migrate → Automate → Secure. Fixed-fee engagements. Results in days.
0
43