Olympic Medal Count Visualization for Paris 2024

Lile Ati

Completed work

Data Analyst

Data Visualizer

Microsoft Power BI

Python

Analytics

Paris Olympics 2024 (Track & Field)

Project Overview

This project involves processing and analyzing athletics data from two datasets, events.csv and results.csv, to produce a summarized report of medal counts by country for final rounds of athletic events. The workflow includes cleaning and transforming the data using Python, followed by creating an interactive dashboard in Power BI to visualize the results. The report focuses on the Python activities for data cleaning, merging, and summarization, and the Power BI activities for dashboard creation and data visualization.

Files

1. Python Activities: Data Cleaning, Merging, and Summarization

Objective:

To clean, merge, and transform the athletics datasets (events.csv and results.csv) into a summarized format that captures medal counts by country for final rounds, focusing on the top 3 positions per event and gender.

Dataset Description:

events.csv: Contains event details with columns Local Time, Sex, Event, and Round.

results.csv: Contains athlete results with columns Local Time, Sex, Event, Round, Name, Country, and Position.

Key Python Activities:

1. Data Loading and Inspection:

Loaded events.csv and results.csv into pandas DataFrames.

Inspected the data using info() to understand the structure and identify issues like missing values or inconsistent data types.

2. Data Cleaning:

Handled missing values in the Position column by filling with "N/A".

Corrected typos (e.g., "Unpecified" to "Unspecified" in the Sex column).

Standardized capitalization for categorical columns (Sex, Event, Round).

Validated and fixed inconsistencies

Converted time format

3. Data Merging:

Merged the two datasets on Local Time, Sex, Event, and Round using an inner join to ensure only matching records were retained.

________________________________________

2. Power BI Activities: Dashboard Creation and Visualization

Objective: To create an interactive dashboard in Power BI using the summarized medal counts (country_medal_counts.csv) to visualize the distribution of medals by country and provide key insights.

Dataset Used:

The country_medal_counts.csv file, which contains medal counts by country with columns Country, Gold, Silver, Bronze, No Medal, and Total.

Power BI Workflow:

Data Import:

Opened Power BI Desktop.

Imported country_medal_counts.csv using Get Data > Text/CSV.

Verified that Power BI correctly recognized the data types (e.g., Gold, Silver, Bronze, Total as numbers, Country as text).

Data Preparation:

Ensured the dataset was clean and ready for visualization, as the Python script had already summarized the data.

No additional transformations were needed in Power BI, as the data was pre-processed.

3. Visualization Creation:

Created measures:

Total Countries: DISTINCTCOUNT(combined_paris_athletics_data[Country]).

Total Medals: COUNTROWS(FILTER('combined_paris_athletics_data', combined_paris_athletics_data[Medal] IN {"Gold", "Silver", "Bronze"} && 'combined_paris_athletics_data'[Round] = "Final")).

Total Gold: COUNTROWS(FILTER('combined_paris_athletics_data', combined_paris_athletics_data[Medal] = "Gold" && 'combined_paris_athletics_data'[Round] = "Final")).

Added each measure to a Card visual to display high-level metrics.

Created visuals

Pie Chart for Distribution of Medals:

Created a pie chart to show the overall distribution of medals.

Legend: Categorised as "Gold", "Silver", "Bronze".

Treemap for Events Participated by Country:

Group: Country.

This visual highlights which countries participated in the most track and field events.

Bar Chart for Medal Count per Country, by Gender:

X-Axis: Country.

Y-Axis: Gender. Men, Women(stacked).

This visual depicts gender across countries.

Bar Chart for Participant per Event:

X-Axis: Athletes.

Y-Axis: Event.

This visual shows the participants of each event.

Map Visual for Geographical Distribution:

Location: Country.

Values: Medals.

Fill colours represent the total number of medals per country.

Slicer: Added a slicer for Gender, Event, Round to allow filtering by specific parameters.

Cards: For Gold Medals, Number of Countries and Athletes.

4. Interactivity and Formatting:

Enabled cross-filtering so that selecting a country in one visual (e.g., the treemap) filters the other visuals.

Applied a consistent color scheme (e.g., gold for Gold, silver for Silver, bronze for Bronze).

Adjusted font sizes, labels, and titles for clarity.

Key Insights from Visualizations:

Medal Distribution: The pie chart shows the overall proportion of Gold, Silver, and Bronze medals, highlighting the balance of awards.

Top-Performing Countries: The treemap and bar charts identify countries with the highest medal counts, such as the USA and Kenya.

Geographical Spread: The map visual reveals the global distribution of medal-winning countries, emphasizing regions with strong athletic performance.

Detailed Breakdown: The table provides a clear, tabular view of medal counts per country, useful for precise data lookup.

Skills Demonstrated:

Proficiency in Power BI for data visualization and dashboard creation.

Ability to create meaningful visuals (cards, pie charts, treemaps, bar charts, tables, maps) to convey insights.

Use of interactivity features (slicers, cross-filtering) to enhance user experience.

Attention to design and formatting for a professional presentation.

________________________________________

Conclusion

Python: The athletics datasets were cleaned, merged, and summarized using Python and pandas. Key tasks included handling missing values, fixing inconsistencies, merging datasets, filtering for final rounds, assigning medals, and summarizing medal counts by country. The output (country_medal_counts.csv) provided a clean, aggregated dataset for visualization.

Power BI: The summarized data was imported into Power BI, where an interactive dashboard was created. The dashboard includes key metrics, a pie chart, a treemap, bar charts, a table, and a map, providing a comprehensive view of medal counts by country with interactive filtering.

Impact:

• The Python activities ensured the data was clean, consistent, and structured for analysis.

• The Power BI dashboard effectively communicates the results, allowing stakeholders to explore medal distributions, compare countries, and identify top performers through an intuitive and interactive interface.

Overview

This project demonstrates a complete data pipeline for analyzing and visualizing farmers’ market sales data using SQL and Power BI. The dataset, contains 300 rows across four tables: Vendors (10 rows), Products (20 rows), Customers (50 rows), and Sales (220 rows). The goal was to clean, transform, and query the data in SQL, export it for Power BI, and create an interactive dashboard to showcase key insights.

Files

SQL Activities

Database Creation and Population

Schema Design:

Created a relational database with four tables:

Vendors: Stores vendor details (e.g., vendorid, vendorname, marketlocation).

Products: Contains product information (e.g., productid, productname,category, unitprice).

Customers: Tracks customer data (e.g., customerid, firstname, loyaltymember).

Sales: Records transactions (e.g., saleid, saledate, quantity, totalamount).

Data Population:

Generated 300 rows of synthetic data, including 10 vendors, 20 products, 50 customers, and 220 sales transactions, reflecting a realistic farmers’ market scenario.

Data Cleaning and Transformation

Duplicate Removal: Identified and removed duplicate entries in Sales usingDISTINCT and GROUP BY to ensure unique transactions.

Foreign Key Validation: Fixed invalid foreign keys (e.g., non-existent VendorID orProductID) with LEFT JOIN and UPDATE statements to align with referential integrity.

Data Standardization: Standardized saledate formats using DATE_FORMAT andensured consistent totalamount calculations with a computed column(totalamountcomputed = quantity * unitprice).

Advanced SQL Queries

Aggregation: Created a view to categorize sales by season, using CASE

WHEN EXTRACT(MONTH FROM SaleDate) IN (12, 1, 2) THEN 'Winter'

WHEN EXTRACT(MONTH FROM SaleDate) IN (3, 4, 5) THEN 'Spring'

WHEN EXTRACT(MONTH FROM SaleDate) IN (6, 7, 8) THEN 'Summer'

Top-N Analysis: Identified the top 5 products by quantity sold using GROUP BY and LIMIT 5, and top sales by vendor using GROUP BY and ORDER BY.

Files were imported into Power BI for visualization, enabling both raw data modeling and pre-aggregated analysis.

Power BI Activities

Data Model

Imported Loaded CSV files (vendors.csv, products.csv, customers.csv, sales.csv) into Power BI Desktop.

Relationships: Established a star schema with:

Sales as the fact table, linked to Vendors, Products, and Customers via many-to-one relationships on vendorid, productid, and customerid, respectively.

Relationships configured with single-direction filtering for optimal performance.

Model View: Confirmed relationships visually, ensuring data integrity for calculations and visuals.

DAX Measures

Created measures:

– Total Sales Count: CALCULATE(COUNT('Sales'[saleid]),

ALLSELECTED('Vendors'))

– Total Revenue: CALCULATE(SUM('Sales'[totalamount]),

ALLSELECTED('Vendors'))

– Total Quantity Sold: CALCULATE(SUM('Sales'[quantity]),

ALLSELECTED('Products'))

– Total Revenue: CALCULATE(SUM('Sales'[totalamount]),

ALLSELECTED('Products'))

Additional measures: Total Customers, Loyal Customers (count of loyaltymember = 'Yes'), and %LoyalCustomers.

Dashboard Design

Visuals:– Cards: Displayed Loyal Customers (25), Total Customers (50), Total Number of Sales (477), Total Quantity Sold (1318), and Total Revenue (4.54K).

Bar Charts: Showed top vendors by organic status (e.g., Happy Hens with 1032 sales) and top products by category (e.g., Dairy with 387 sales).

Line Chart: Illustrated quantity sold per month, with a steep rise from 173 (January) to 821 (April).

Treemap: Highlighted top products sold (e.g., Free-Range Eggs with 258 units, Basil with 133 units).

Pie Chart: Illustrated sales by payment method. With each method (Mobile, Card, Cash) having 159 sales each.

Formatting: Used a green color scheme to reflect the farmers’ market theme, with clear titles and legends.

Insights

Vendor Performance: Happy Hens led with 1032 sales, indicating strong organic product demand.

Category Trends: Dairy (387 sales) outperformed other categories, suggesting a market preference.

Sales Growth: Quantity sold increased significantly from January to April, peaking in April (821 units).

Customer Loyalty: 25 out of 50 customers were loyalty members, representing 50% engagement.

This project showcases:

SQL Proficiency: Database design, data cleaning, advanced querying, and export.

Power BI Expertise: Data modeling, DAX measure creation, and interactive dashboard development.

Business Insight: Ability to derive actionable insights (e.g., vendor performance, sales trends) for stakeholders. The report, SQL scripts, exported CSVs, Power BI file (.pbix), and dashboard screenshots are available for review, demonstrating a fullstack data analysis workflow.

Like this project

Completed work

Posted May 16, 2025

Processed athletics data to visualize Olympic medal counts by country using Python and Power BI.

Likes

Views

Timeline

Apr 30, 2025 - May 4, 2025

Olympic Medal Count Visualization for Paris 2024

Overview

SQL Activities

Power BI Activities

Join 50k+ companies and 1M+ independents