A robust and performant data ingestion and processing system was developed as a solo project. The workflow aggregated data on the order of billions of data points from multiple sources, then cleaned and standardized it and extracted useful metrics for eventual use in a web app. The system was deployed on AWS EC2 and ECS, with Apache Airflow and, later, Dagster providing scheduling and observability. Databricks was used for distributed analysis with PySpark.
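As a minimal sketch of what the cleaning and standardization step could look like, the PySpark job below drops incomplete records, normalizes types, de-duplicates on a natural key, and rolls the result up into a daily metric. The S3 paths, column names (device_id, recorded_at, value), and the daily-average metric are hypothetical stand-ins, since the original project's schema is not specified:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ingest-clean").getOrCreate()

# Hypothetical raw landing zone; the real sources and formats may differ.
raw = spark.read.json("s3://example-bucket/raw/events/")

cleaned = (
    raw
    # Drop records missing any field the downstream metrics depend on.
    .dropna(subset=["device_id", "recorded_at", "value"])
    # Standardize types: parse timestamps, cast measurements to double.
    .withColumn("recorded_at", F.to_timestamp("recorded_at"))
    .withColumn("value", F.col("value").cast("double"))
    # De-duplicate on the natural key (assumed here to be id + timestamp).
    .dropDuplicates(["device_id", "recorded_at"])
)

# Aggregate one illustrative per-device daily metric for the web app.
daily = (
    cleaned
    .groupBy("device_id", F.to_date("recorded_at").alias("day"))
    .agg(F.avg("value").alias("avg_value"), F.count("*").alias("n_points"))
)

# Partitioning by day keeps reads cheap for a date-scoped web app query.
daily.write.mode("overwrite").partitionBy("day").parquet(
    "s3://example-bucket/metrics/daily/"
)
```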
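On the orchestration side, a Dagster setup along these lines would give the scheduling and observability mentioned above: a single asset wrapping the Spark job, wired to a daily cron schedule. The asset name, job name, and 2 a.m. cron expression are assumptions for illustration, not details from the original project:

```python
from dagster import Definitions, ScheduleDefinition, asset, define_asset_job


@asset
def cleaned_metrics() -> None:
    """Run the cleaning/aggregation job, e.g. by submitting it to Databricks.

    Body elided; in practice this would trigger the PySpark job sketched above.
    """
    ...


# Materialize the asset on a schedule; Dagster records each run for observability.
daily_job = define_asset_job("daily_ingest", selection=[cleaned_metrics])

defs = Definitions(
    assets=[cleaned_metrics],
    schedules=[ScheduleDefinition(job=daily_job, cron_schedule="0 2 * * *")],
)
```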