This nationwide study investigated how sociodemographic and environmental factors impacted COVID-19 infections and fatalities across large U.S. counties during the pre-vaccination phase (Jan–Dec 2020).
It revealed that the effects of risk factors such as education, pollution, and poverty changed over time and were often significant only in high-infection zones, highlighting the importance of dynamic, region-based health strategies.
My Technical Contributions
Time-Series Forecasting with ARIMA
Built ARIMA models to predict pollution levels (PM2.5, NO₂, SO₂, O₃) over time
Generated 95% confidence intervals to visualize uncertainty across different regions
Created zone-specific forecasts aligned with the COVID-19 wave timeline (Phases 1 & 2)
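A minimal sketch of the forecasting step, kept dependency-free so it runs anywhere. It assumes an ARIMA(1,1,0) model fitted by conditional least squares; the model orders and the library actually used in the study are not stated here, and in practice a package such as statsmodels would normally do this work. The function names and the PM2.5 values are hypothetical and only illustrate the mechanics. The 95% band reflects forecast-error variance only and ignores uncertainty in the estimated parameters.

```python
import math


def fit_arima_110(series):
    """Fit d_t = c + phi * d_{t-1} + e_t on first differences by least squares."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    residuals = [yi - (c + phi * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    return c, phi, sigma2, diffs[-1]


def forecast_with_interval(series, steps, z=1.96):
    """Return [(mean, lower, upper), ...] for each horizon; z=1.96 gives ~95%."""
    c, phi, sigma2, diff = fit_arima_110(series)
    level = series[-1]
    psi_cum, psi_pow, var = 0.0, 1.0, 0.0
    out = []
    for _ in range(steps):
        diff = c + phi * diff          # forecast the next difference
        level += diff                  # integrate back to the original scale
        psi_cum += psi_pow             # psi_j = 1 + phi + ... + phi^j
        psi_pow *= phi
        var += sigma2 * psi_cum ** 2   # forecast-error variance accumulates
        half = z * math.sqrt(var)
        out.append((level, level - half, level + half))
    return out


# Made-up PM2.5 readings (ug/m3) for one hypothetical zone.
pm25 = [12.1, 11.8, 12.6, 13.0, 12.4, 11.9, 11.2, 10.8,
        11.5, 12.2, 12.9, 13.4, 12.7, 12.0, 11.6, 11.9]

for mean, lower, upper in forecast_with_interval(pm25, steps=4):
    print(f"{mean:.2f} [{lower:.2f}, {upper:.2f}]")
```

The band widens with each step ahead because the differenced model accumulates error, which is the behavior a confidence band on a forecast plot is meant to show.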
Data Cleaning & Structuring
Processed and merged large-scale public datasets (EPA air quality, population, COVID-19 rates)
Structured time-series data by pollutant and county; ensured temporal alignment
Resolved missing or inconsistent values across thousands of time points
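The cleaning steps above can be sketched as follows. This is an illustration under assumptions, not the project's actual pipeline: it uses only the standard library (a pandas-based workflow would be more typical at scale), treats negative concentrations as invalid sentinel values, and fills interior gaps by linear interpolation. The county key, helper names, and numbers are all hypothetical.

```python
from datetime import date, timedelta


def daily_range(start, end):
    """Yield every calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def fill_gaps(readings, start, end):
    """Put {date: value} on a full daily grid; interpolate interior gaps linearly."""
    days = list(daily_range(start, end))
    values = [readings.get(d) for d in days]
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            values[i] = values[left] + step * (i - left)
    return dict(zip(days, values))  # leading/trailing gaps stay None


def merge_on_county_date(pollution, cases):
    """Inner-join two {(county, date): value} mappings on their shared keys."""
    shared = sorted(pollution.keys() & cases.keys())
    return {key: (pollution[key], cases[key]) for key in shared}


# Made-up raw readings with two missing days and one invalid sentinel value.
raw_pm25 = {
    date(2020, 3, 1): 10.0,
    date(2020, 3, 4): 16.0,
    date(2020, 3, 5): -999.0,
    date(2020, 3, 6): 14.0,
}
valid = {d: v for d, v in raw_pm25.items() if v >= 0}  # drop invalid values
filled = fill_gaps(valid, date(2020, 3, 1), date(2020, 3, 6))

pollution = {("county_A", d): v for d, v in filled.items()}
cases = {("county_A", date(2020, 3, 2)): 3, ("county_A", date(2020, 3, 5)): 9}
merged = merge_on_county_date(pollution, cases)
print(merged)
```

Joining on a (county, date) key after both series sit on the same daily grid is what keeps the pollutant and case data temporally aligned.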
Scientific Visualization
Created time-series plots with confidence bands for inclusion in published figures
Visualized regional trends across infection zones for insight into environmental shifts
Techniques Used
Key Insights from the Study
Phase 1: Population density, poverty, and education were stronger predictors
Phase 2: Those same variables lost significance, while age and O₃ exposure became more important
NO₂ correlated with fatality only in high-infection zones
Forecasting helped visualize pollution recovery and spikes in high-activity areas
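A zone-specific finding like the NO₂ one comes from analyzing each infection zone separately rather than pooling all counties. The sketch below shows that idea with a plain Pearson correlation per zone. It is only an illustration: the statistical model behind the study's findings is not described here, the helper names are hypothetical, and the numbers are made up to make the contrast visible. They are not results from the study, and no significance test is included.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def correlation_by_zone(rows):
    """rows: iterable of (zone, exposure, outcome). Returns {zone: r}."""
    groups = defaultdict(lambda: ([], []))
    for zone, exposure, outcome in rows:
        groups[zone][0].append(exposure)
        groups[zone][1].append(outcome)
    return {zone: pearson(xs, ys) for zone, (xs, ys) in groups.items()}


# Made-up (zone, NO2 level, fatality rate) rows, one per hypothetical county.
rows = [
    ("high", 10, 1.0), ("high", 12, 1.4), ("high", 14, 1.9), ("high", 16, 2.3),
    ("low", 10, 1.2), ("low", 12, 0.9), ("low", 14, 1.3), ("low", 16, 1.0),
]

for zone, r in correlation_by_zone(rows).items():
    print(f"{zone}: r = {r:.2f}")
```

Stratifying first means a relationship that exists in only one zone is not diluted by counties where it is absent.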
Relevance to Freelance / Consulting Work
This project demonstrates my ability to:
Build forecasting models using Python and real-world environmental data
Segment and compare trends across geographies and time periods
Visualize uncertainty for decision support or policy design
Handle large, messy public datasets with scientific rigor