Football Striker Performance Analysis

Chirag

Chirag Suri

โšฝ Football Striker Performance Analysis

In this project, I dove deep into understanding what separates an ordinary striker from a great one. Using Python, I explored a dataset of 500 strikers containing personal attributes and on-field performance stats โ€” all with the goal of answering one simple question:
๐Ÿ‘‰ What makes a football striker exceptional?

๐Ÿ› ๏ธ Tools & Technologies Used

Python (pandas, seaborn, matplotlib, scipy, sklearn, statsmodels) โ€“ For EDA, visualization, statistical testing, clustering, and machine learning.
Jupyter Notebook โ€“ For code execution and documentation.
Generative AI (ChatGPT) โ€“ Helped refine analysis steps, validate statistical logic, helped with ideation and structure the project cleanly.

๐Ÿ‘ค Author

Chirag Suri Data science enthusiast, passionate about sports analytics and turning messy data into valuable stories.
GitHub: Link
LinkedIn: Link
Portfolio: Link

๐Ÿ“ Dataset

I worked with a dataset of 500 football strikers, covering both demographic and performance-based variables.
Strikers_Performance.xlsx
๐Ÿ“Œ Note: I couldnโ€™t locate the original source, so itโ€™s being treated as synthetic/simulated data for educational purposes.

๐ŸŽฏ Project Objectives

This project was focused on segmenting and classifying strikers based on their:
โšฝ On-field performance (Goals, Assists, Dribbling, etc.)
๐Ÿง  Attributes (Footedness, Consistency, Game IQ, Conduct)
๐Ÿ“Š Team Impact & Match Influence
The broader goal was to develop a data-backed system for:
Identifying top-tier vs average strikers.
Helping scouts/coaches with player selection.
Answering questions even analysts may overlook.

๐Ÿง  Key Questions Solved

These are just a few of the many insights pulled from the dataset:
Whatโ€™s the maximum number of goals scored by a striker?
What percentage of strikers are right-footed?
Which nationality scores the most goals on average?
Whatโ€™s the average conversion rate of left-footed players?
Do hold-up play skills correlate with consistency?
Are consistency scores normally distributed?
Is there a statistical difference in performance between nationalities?
Can we predict striker types using logistic regression?

๐Ÿ”„ Project Workflow

๐Ÿงผ 1. Data Cleaning & Preparation

Handled null values using SimpleImputer:
Median for numeric
Most frequent for categorical
Typecasting of key performance metrics (e.g., Goals, Assists).
Used LabelEncoder for footedness and marital status.
Created dummy variables for nationality.

๐Ÿ“Š 2. Exploratory Data Analysis

Descriptive stats for all key metrics.
Pie chart for footedness distribution.
Countplot of footedness by nationality.

๐Ÿ“ˆ 3. Statistical Analysis

Used groupby + mean to analyze national scoring rates.
Performed Shapiro-Wilk for normality check.
Leveneโ€™s Test to validate homogeneity before ANOVA.
Correlation (Pearson) and regression to understand how "Hold-up Play" influences consistency.

๐Ÿงช 4. Feature Engineering

Created a Total Contribution Score from key fields:
Goals, Assists, Shots on Target, Dribbles, etc.
Used this score for clustering and ML input.

๐Ÿง  5. K-Means Clustering

Identified 2 clusters using elbow method.
Tagged strikers as:
Best Strikers
Regular Strikers

๐Ÿค– 6. Machine Learning (Logistic Regression)

Trained model to classify striker type.
Used StandardScaler for normalization.
Achieved solid accuracy and visualized predictions via confusion matrix.

๐Ÿ’ก Key Insights

๐Ÿฅ‡ The highest individual goal tally: 36 goals
๐Ÿฆต Right-footed strikers dominate the dataset (about 73%).
๐Ÿ‡ง๐Ÿ‡ท Brazilian strikers had the highest average goal count.
๐Ÿง  Hold-up Play shows positive correlation (0.55) with consistency.
๐Ÿ“Š Consistency scores were not normally distributed, but heteroscedasticity was not an issue.
๐Ÿงช The regression model showed hold-up play significantly predicts consistency.
๐Ÿงฎ Best strikers had an average contribution score of ~212.
โœ… The Logistic Regression model achieved X% accuracy (insert from results).

๐Ÿ“š Things I Learned

This project helped me improve in:
โœ… End-to-end structuring of ML-based projects.
๐Ÿ“Š Choosing the right test depending on data assumptions.
๐Ÿงน Proper preprocessing: imputation, encoding, scaling.
๐Ÿ’ฌ Explaining results with storytelling, not just numbers.
๐Ÿค Using generative AI to validate and speed up analysis.

๐Ÿš€ How to Explore

If you're looking to test the code:
Clone/download this repo.
Open Football-Striker.ipynb in Jupyter.
Run through sections, tweak assumptions, try different models.

โœ… What's Next?

Build an interactive Power BI dashboard based on this project.

THANK YOU! ๐Ÿ™Œ

Like this project

Posted Jul 27, 2025

Analyzed football strikers' performance using Python and ML to identify exceptional players.