Objective: Improve the quality, granularity, and applicability of football player data for a range of modeling and analytical purposes.
Performed correlation analysis and feature engineering to identify and select the 30 most influential features for each player position.
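As a rough illustration of per-position selection, the sketch below ranks features by the absolute Pearson correlation with a target column and keeps the top k for each position. The column names (`position`, `score`), the function names, and the choice of Pearson correlation are assumptions made for this example; the project's actual ranking criterion is not specified here.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def top_features_by_position(rows, features, target="score", k=30):
    """Rank features per position by |correlation| with the target (hypothetical helper)."""
    by_position = defaultdict(list)
    for row in rows:
        by_position[row["position"]].append(row)

    selected = {}
    for position, group in by_position.items():
        ys = [r[target] for r in group]
        ranked = sorted(
            features,
            key=lambda f: abs(pearson([r[f] for r in group], ys)),
            reverse=True,
        )
        selected[position] = ranked[:k]  # keep the k strongest features
    return selected
```

Ranking within each position, rather than over all players at once, lets a feature that only matters for one role (for example, a goalkeeper-specific statistic) surface for that role.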
Developed a four-phase data engineering pipeline comprising initial cleaning, feature selection, multi-level imputation, and feature scaling based on score correlations.
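One plausible reading of "multi-level imputation" is a hierarchical fallback: fill a missing value from the most specific group that has observed data, then fall back to broader groups. The sketch below follows that reading. The grouping levels (league and position, then position, then all players), the use of group means, and every name in the code are assumptions for illustration, not the pipeline's documented behavior.

```python
from collections import defaultdict


def _group_means(rows, feature, key):
    """Mean of the observed (non-None) values of `feature` per group."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        value = row[feature]
        if value is not None:
            bucket = totals[key(row)]
            bucket[0] += value
            bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


def impute_multilevel(rows, feature, levels):
    """Fill None values using the first level whose group has observed data."""
    means_per_level = [(key, _group_means(rows, feature, key)) for key in levels]
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if new_row[feature] is None:
            for key, means in means_per_level:
                group = key(row)
                if group in means:
                    new_row[feature] = means[group]
                    break
        filled.append(new_row)
    return filled


# Hypothetical levels, ordered from most to least specific.
LEVELS = [
    lambda r: (r["league"], r["position"]),
    lambda r: r["position"],
    lambda r: "all",
]
```

Calling `impute_multilevel(rows, "pace", LEVELS)` returns new rows in which each missing `pace` is taken from the narrowest group that has at least one observed value.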
Implemented biannual data segmentation to integrate player data from leagues with different schedules. Deployed the pipeline on a GCP server with results stored in BigQuery, reprocessing three large data sources simultaneously in under one day.
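To show how half-year segments can put leagues with different calendars on a common timeline, here is a minimal sketch that buckets match records by player and calendar half. The split into January to June and July to December, the field names (`player_id`, `match_date`), and the function names are assumptions for this example; the project may define its segments differently.

```python
from collections import defaultdict
from datetime import date


def half_year_key(d):
    """Return a label such as '2023-H1' (Jan-Jun) or '2023-H2' (Jul-Dec)."""
    return f"{d.year}-H{1 if d.month <= 6 else 2}"


def segment_biannually(records):
    """Group records by (player, half-year), independent of league season dates."""
    segments = defaultdict(list)
    for record in records:
        key = (record["player_id"], half_year_key(record["match_date"]))
        segments[key].append(record)
    return dict(segments)
```

Because the key depends only on the calendar date, a league that runs from August to May and one that runs from March to November both contribute to the same half-year buckets, which keeps downstream time series features aligned.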
Outcome: Enhanced player data quality, improving model quality by approximately 75% and streamlining the generation of time series features.
Posted Jan 14, 2024
Improved football player data quality and boosted model quality by approximately 75% with an efficient data engineering pipeline.