Global Power Plant Database Analysis

Uchenna Ejike

This project performs an exploratory data analysis (EDA) on the Global Power Plant Database, which includes details on 28,000+ power plants across the world. The goal is to uncover patterns, identify anomalies, and understand the global distribution and characteristics of power generation infrastructure.

📊 Project Overview

The notebook explores various dimensions of the power plant dataset including:
Geographic distribution of power plants
Primary and secondary fuel types used
Installed capacity trends (in MW)
Commissioning year analysis
Power generation trends (2013–2016)
Handling of missing and outlier values
Estimated annual generation

Dataset Summary

The dataset contains 28,664 rows and 22 columns. Each record represents a unique power plant.
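A quick shape check along these lines confirms the row and column counts after loading. This is a minimal sketch, not the notebook's own code: the helper name `load_plants`, the file name mentioned in the comment, and the two sample rows are assumptions for illustration.

```python
import io

import pandas as pd


def load_plants(source):
    """Read the database CSV from a path or file-like object."""
    # low_memory=False makes pandas infer each column's dtype from the whole file
    return pd.read_csv(source, low_memory=False)


# Made-up two-row sample standing in for the real CSV. With the actual file
# you would pass its path instead, e.g. "global_power_plant_database.csv"
# (file name assumed).
sample_csv = io.StringIO(
    "country,country_long,name,capacity_mw,latitude,longitude,fuel1\n"
    "USA,United States of America,Plant A,120.0,35.1,-97.2,Gas\n"
    "IND,India,Plant B,15.5,22.3,78.9,Solar\n"
)
plants = load_plants(sample_csv)
print(plants.shape)  # (rows, columns)
```

On the full dataset the same call should report the 28,664 rows and 22 columns described above.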

🔑 Key Columns

| Column Name | Description |
| --- | --- |
| country / country_long | Three-letter ISO 3166-1 alpha-3 code and full name of the country |
| name | Name of the power plant |
| capacity_mw | Installed capacity (in megawatts) |
| latitude, longitude | Geographic coordinates |
| fuel1, fuel2, fuel3, fuel4 | Fuel types used |
| commissioning_year | Year the plant became operational |
| generation_gwh_2013 to generation_gwh_2016 | Actual power generated in GWh for each year |
| estimated_generation_gwh | Estimated generation if actuals are not available |

🧪 Key Findings

Capacity Insights

Capacities range from 1 MW to 22,500 MW, with a mean of 186 MW
75% of plants have capacities below 100 MW, so the mean sits well above the 75th percentile: the distribution is strongly right-skewed by a small number of very large plants
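The skew can be checked by comparing the mean with the quartiles. The sketch below assumes only the `capacity_mw` column name from the table above. The helper `capacity_summary` and the demo values are hypothetical, chosen to mimic a few small plants plus one very large one.

```python
import pandas as pd


def capacity_summary(capacity: pd.Series) -> dict:
    """Summary statistics that expose right skew in installed capacity."""
    cap = capacity.dropna()
    return {
        "min": cap.min(),
        "mean": cap.mean(),
        "median": cap.median(),
        "p75": cap.quantile(0.75),
        "max": cap.max(),
        "skew": cap.skew(),  # positive values indicate a long right tail
    }


# Made-up capacities (MW), not values from the real dataset
demo = pd.Series(
    [1.0, 5.0, 10.0, 20.0, 50.0, 80.0, 100.0, 22500.0], name="capacity_mw"
)
stats = capacity_summary(demo)
print(stats)
```

A mean above the 75th percentile together with a positive skew value is the pattern described in the findings.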

Geographical Anomalies

Some latitude/longitude values fall outside the valid ranges (latitude -90 to 90, longitude -180 to 180), signaling data entry issues
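One way to surface such rows is a simple range check on both coordinate columns. This is a sketch under the assumption that coordinates are stored in decimal degrees. The function name `flag_invalid_coordinates` and the demo rows are hypothetical.

```python
import pandas as pd


def flag_invalid_coordinates(df: pd.DataFrame) -> pd.Series:
    """Return True for rows whose coordinates are missing or out of range."""
    lat_ok = df["latitude"].between(-90, 90)
    lon_ok = df["longitude"].between(-180, 180)
    # between() yields False for NaN, so missing coordinates are flagged too
    return ~(lat_ok & lon_ok)


# Made-up rows: one valid, one bad latitude, one bad longitude, one missing
demo = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D"],
        "latitude": [35.1, 95.0, 10.0, float("nan")],
        "longitude": [-97.2, 20.0, -200.0, 30.0],
    }
)
flagged = flag_invalid_coordinates(demo)
print(demo[flagged])
```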

📅 Commissioning Trends

Commissioning years range from 1896 to 2018
Data is missing for over 40% of plants in the commissioning_year column
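The share of missing values per column can be computed directly from a null mask. The snippet is a minimal sketch: `missing_share` is a hypothetical helper, and the five demo rows are invented to show the calculation, not to reproduce the 40% figure.

```python
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


# Made-up rows: three of five commissioning years are missing
demo = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D", "E"],
        "commissioning_year": [1896, None, 2018, None, None],
    }
)
share = missing_share(demo)
print(share)
print(demo["commissioning_year"].min(), demo["commissioning_year"].max())
```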

⚡ Power Generation Data

Coverage is inconsistent across years: far fewer plants have generation records for 2013 and 2014 than for 2016
Negative and zero values detected in generation columns
Estimated generation shows extreme variance (0 to 92,268 GWh)
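Coverage and suspicious values in the yearly generation columns can be tallied in one small report. The column names follow the table above. The helper `generation_quality` and the demo values are hypothetical, so treat this as a sketch rather than the notebook's method.

```python
import pandas as pd

# Yearly generation columns as named in the dataset summary
GEN_COLS = [f"generation_gwh_{year}" for year in range(2013, 2017)]


def generation_quality(df: pd.DataFrame, cols=GEN_COLS) -> pd.DataFrame:
    """Count reported, negative, and zero values for each generation column."""
    present = [c for c in cols if c in df.columns]
    gen = df[present]
    return pd.DataFrame(
        {
            "non_null": gen.notna().sum(),
            "negative": (gen < 0).sum(),  # comparisons with NaN are False
            "zero": (gen == 0).sum(),
        }
    )


# Made-up values: sparse early years, one negative and one zero in 2016
nan = float("nan")
demo = pd.DataFrame(
    {
        "generation_gwh_2013": [nan, nan, 10.0, nan],
        "generation_gwh_2014": [nan, 5.0, nan, nan],
        "generation_gwh_2015": [1.0, 0.0, nan, 3.0],
        "generation_gwh_2016": [2.0, -4.0, 0.0, 7.0],
    }
)
report = generation_quality(demo)
print(report)
```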

🧼 Data Quality Notes

Missing values in crucial columns (fuel, generation)
Outliers and negatives in numeric columns like generation_gwh_2016
Manual or programmatic data cleaning is recommended
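As one possible programmatic approach (an assumption, not necessarily what the notebook does), negative generation values and out-of-range coordinates can be set to missing instead of being corrected by guesswork. The helper `clean_plants` and the demo rows are hypothetical.

```python
import pandas as pd


def clean_plants(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with implausible values replaced by NaN."""
    out = df.copy()

    # Yearly actuals only; "estimated_generation_gwh" does not match this prefix
    gen_cols = [c for c in out.columns if c.startswith("generation_gwh_")]
    if gen_cols:
        # Treat negative generation as missing rather than guessing a fix
        out[gen_cols] = out[gen_cols].mask(out[gen_cols] < 0)

    # Keep coordinates only where they fall inside valid ranges
    out["latitude"] = out["latitude"].where(out["latitude"].between(-90, 90))
    out["longitude"] = out["longitude"].where(out["longitude"].between(-180, 180))
    return out


# Made-up rows: the first has negative generation, the second bad coordinates
demo = pd.DataFrame(
    {
        "latitude": [10.0, 95.0],
        "longitude": [20.0, -200.0],
        "generation_gwh_2016": [-4.0, 7.0],
    }
)
cleaned = clean_plants(demo)
print(cleaned)
```

Working on a copy keeps the raw values available for comparison, which helps when deciding whether a flagged record should be repaired or dropped.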

📌 EDA Highlights (What You'll Find Inside)

Visualizations showing fuel type distribution and capacity by fuel
Temporal trends in commissioning and generation
Detection of missing values, outliers, and inconsistent coordinates
Use of summary statistics to uncover skewness in the data
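The fuel-type views rest on a grouped summary like the one sketched here. It assumes `fuel1` holds the primary fuel, as the column table suggests. The helper `fuel_summary` and the demo rows are hypothetical.

```python
import pandas as pd


def fuel_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Plant count and capacity statistics per primary fuel type."""
    return (
        df.groupby("fuel1")["capacity_mw"]
        # "count" tallies plants with a non-missing capacity value
        .agg(plants="count", total_mw="sum", median_mw="median")
        .sort_values("total_mw", ascending=False)
    )


# Made-up rows, not values from the real dataset
demo = pd.DataFrame(
    {
        "fuel1": ["Hydro", "Hydro", "Coal", "Solar"],
        "capacity_mw": [10.0, 30.0, 600.0, 5.0],
    }
)
summary = fuel_summary(demo)
print(summary)
```

The resulting table can be passed to any of the plotting libraries listed below, for example as a bar chart of `total_mw` per fuel.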

🛠 Technologies Used

Python 3
Pandas, NumPy for data manipulation
Matplotlib, Seaborn, Plotly for visualizations
Jupyter Notebook as the development environment
Posted Sep 19, 2025

Performed EDA on Global Power Plant Database to analyze power generation patterns and anomalies.