Automated Data Cleanup and Migration Platform by Abhishek Jha

Automated Data Cleanup and Migration Platform

Abhishek Jha

The Problem

A healthcare organization was running on 3 legacy database systems, some over 15 years old. Patient records, billing data, and clinical notes were scattered across systems that couldn't talk to each other. Data quality was terrible: duplicate records, inconsistent formats, missing fields, and orphaned references everywhere. They needed to consolidate everything into a modern platform without losing a single record.

What I Built

I built an automated data cleanup and migration platform that handled the entire process: profiling source data quality, building cleanup rules, deduplicating records, standardizing formats, and migrating everything into Snowflake.
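To give a sense of what the standardization and deduplication steps involve, here is a minimal Python sketch. It is not the production code: the field names (name, dob), the list of date formats, and the keep-first matching rule are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Hypothetical date formats that might appear across legacy systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")


def standardize_date(value):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def standardize_name(value):
    """Collapse whitespace and lowercase so equivalent names compare equal."""
    return re.sub(r"\s+", " ", value).strip().lower()


def deduplicate(records):
    """Keep the first record seen for each (name, date of birth) key.

    Assumption: name plus date of birth is enough to identify a duplicate.
    Records whose date cannot be parsed share a None date in their key, so
    a real pipeline would route those to manual review instead.
    """
    seen = {}
    for rec in records:
        key = (standardize_name(rec["name"]), standardize_date(rec["dob"]))
        seen.setdefault(key, rec)
    return list(seen.values())
```

Standardizing before matching is the important design choice here: "Jane  Doe, 1980-03-05" and "jane doe, 03/05/1980" only collapse into one record once both fields are in a common format.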
The system used Python scripts for data profiling and rule generation, dbt for transformation and validation, and Airflow for orchestrating the multi-stage migration. Every record was tracked through the pipeline with full audit trails.
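As a rough illustration of the profiling step, the sketch below computes a missing-value rate and a distinct-value count per column. The function name, the row-as-dictionary layout, and the choice of metrics are assumptions for this example rather than a description of the actual scripts.

```python
def profile(rows, columns):
    """Return the missing rate and distinct-value count for each column.

    A value counts as missing when it is None or blank after stripping.
    """
    total = len(rows)
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None and str(v).strip() != ""]
        report[col] = {
            "missing_rate": (total - len(present)) / total if total else 0.0,
            "distinct": len(set(present)),
        }
    return report
```

Numbers like these are what cleanup rules can be derived from: a column with a high missing rate needs a default or a backfill rule, and a key column whose distinct count is lower than the row count points to duplicates.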
I built automated validation checks that compared source and target record counts, checksums, and business rule compliance at every stage.
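A count-and-checksum comparison of this kind can be sketched in a few lines of Python. This is a simplified illustration, not the project's implementation: the function names are hypothetical, rows are assumed to be dictionaries already loaded in memory, and SHA-256 is an assumed choice of hash.

```python
import hashlib


def row_checksum(row, columns):
    """Hash the selected columns of one row in a fixed column order.

    Note: None and the empty string hash identically in this sketch.
    """
    payload = "\x1f".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def table_fingerprint(rows, columns):
    """Return (row count, checksum) where the checksum ignores row order."""
    digests = sorted(row_checksum(r, columns) for r in rows)
    combined = hashlib.sha256("".join(digests).encode("ascii")).hexdigest()
    return len(digests), combined


def validate_stage(source_rows, target_rows, columns):
    """Compare source and target by row count and by content checksum."""
    src_count, src_sum = table_fingerprint(source_rows, columns)
    tgt_count, tgt_sum = table_fingerprint(target_rows, columns)
    return {
        "count_match": src_count == tgt_count,
        "checksum_match": src_sum == tgt_sum,
    }
```

Sorting the per-row digests before combining them makes the check independent of row order, which matters because source and target systems rarely return rows in the same sequence. For tables with millions of rows, the same comparison would normally be pushed down into SQL rather than run in memory.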

Key Results

Migrated 4.2 million records across 3 legacy systems with zero data loss
Duplicate records reduced by 89%
Data quality score improved from 62% to 99.1%
Migration completed in 3 weeks instead of the estimated 3 months

Tools Used

Snowflake, dbt, Apache Airflow, Python, SQL

My Role

Lead data engineer. Owned the migration strategy, built all cleanup automation, and managed the validation process. Delivered in 3 weeks.

Posted Apr 13, 2026

Automated the cleanup and migration of millions of records across legacy systems, turning messy data into clean, analytics-ready datasets.