End-to-End Data Pipeline for Genetic Research

Harshit Maheshwari

Data Scientist
Product Manager
Data Analyst
Confluence
GitHub
I led the development of an end-to-end MLOps data pipeline handling 5 TB of genetic data as part of a research initiative focused on machine learning and predictive modeling. The project aimed to optimize data processing and enable faster, more accurate insights in genetic research.
Using AWS, I designed and implemented a pipeline that processed data in real time and automated key tasks, significantly reducing manual workload. I also worked with a team of researchers, supporting them in building multiple predictive machine learning and deep learning models trained across 700+ parameters. These models played a critical role in identifying key genetic markers, improving the efficiency and accuracy of research outcomes.
The project streamlined the research process and helped researchers generate insights from large datasets more quickly. It showcases my expertise in MLOps, data engineering, and machine learning applied in scientific research settings.