Data Exploration and Cleaning Project

Fajar Satria

📌 Project Overview

This project explores and cleans two datasets using Python (Pandas). It has two parts:
Exploratory Data Analysis (EDA) on a Coffee Bean Sales Dataset.
Data Cleaning & Transformation on a Students Dataset.

📂 Datasets Used

Coffee Bean Sales Dataset
Contains two years of coffee sales data from Saudi Arabia.
Includes information on customers, product categories, and sales amounts.
Students Dataset
Contains student performance data.
Includes details such as lunch category, reading score, and grade.

🔍 Analysis Performed

1️⃣ Coffee Bean Sales Analysis

Filtering data based on relevant columns.
Sorting by quantity to find best-selling products.
Grouping data by city to analyze total sales.
Additional insights: Average price per product & product category distribution.
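
The steps above can be sketched in Pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (customer_id, city, product, category, quantity, unit_price) and the four rows are hypothetical stand-ins, since the real dataset's schema is not listed here.

```python
import pandas as pd

# Toy rows for illustration only; column names and values are hypothetical,
# not taken from the actual Coffee Bean Sales Dataset.
sales = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "city": ["Riyadh", "Jeddah", "Riyadh", "Dammam"],
    "product": ["Colombian", "Ethiopian", "Brazilian", "Colombian"],
    "category": ["Arabica", "Arabica", "Robusta", "Arabica"],
    "quantity": [10, 25, 5, 15],
    "unit_price": [40.0, 45.0, 30.0, 40.0],
})
sales["sales_amount"] = sales["quantity"] * sales["unit_price"]

# Filter: keep only the columns relevant to the analysis.
relevant = ["city", "product", "category", "quantity", "unit_price", "sales_amount"]
subset = sales[relevant]

# Sort by quantity, descending, so the best-selling rows come first.
best_sellers = subset.sort_values("quantity", ascending=False)

# Group by city and total the sales amount per city.
sales_by_city = (
    subset.groupby("city")["sales_amount"].sum().sort_values(ascending=False)
)

# Additional insights: average price per product and category distribution.
avg_price = subset.groupby("product")["unit_price"].mean()
category_share = subset["category"].value_counts(normalize=True)

print(best_sellers.head())
print(sales_by_city)
print(avg_price)
print(category_share)
```

With the real data, only the column names in the selection, sort, and groupby calls should need to change.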

2️⃣ Students Data Cleaning

Checking missing values and handling them appropriately:
  Mode for categorical data (e.g., lunch category).
  Mean for numerical data (e.g., reading score).
  Median for ordinal data (e.g., grade).
Using alternative missing value handling techniques:
  Forward fill & Backward fill
  Interpolation
  Dropping missing values
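
A rough sketch of these cleaning techniques is shown below. The column names (lunch, reading_score, grade) and the sample values are assumptions made for illustration, and grade is assumed to be stored as a number so that a median can be taken. The actual Students Dataset may be laid out differently.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; names and values are hypothetical.
students = pd.DataFrame({
    "lunch": ["standard", None, "standard", "free/reduced", "standard", "free/reduced"],
    "reading_score": [70.0, 80.0, np.nan, 90.0, 60.0, 75.0],
    "grade": [3.0, np.nan, 1.0, 2.0, 5.0, 4.0],  # assumed numeric encoding
})

# Check how many values are missing in each column.
missing_counts = students.isna().sum()
print(missing_counts)

# Impute by data type: mode (categorical), mean (numerical), median (ordinal).
cleaned = students.copy()
cleaned["lunch"] = cleaned["lunch"].fillna(cleaned["lunch"].mode()[0])
cleaned["reading_score"] = cleaned["reading_score"].fillna(
    cleaned["reading_score"].mean()
)
cleaned["grade"] = cleaned["grade"].fillna(cleaned["grade"].median())

# Alternative techniques, each applied to the original data.
ffilled = students.ffill()                              # carry last valid value forward
bfilled = students.bfill()                              # pull next valid value backward
interpolated = students["reading_score"].interpolate()  # linear interpolation
dropped = students.dropna()                             # drop rows with any missing value

print(cleaned)
```

Forward and backward fill depend on row order, so they suit ordered data such as time series better than an unordered table of students. Dropping rows is the simplest option but discards the other values in those rows.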

🛠️ Technologies Used

Python (Pandas, NumPy)
Jupyter Notebook / Google Colab

🚀 How to Run

Clone the repository:
git clone https://github.com/fajarwiguna/data-analysis-and-cleaning.git
Install dependencies (if required):
pip install pandas numpy
Open the Jupyter Notebook and run the analysis.

📜 License

This project is for educational purposes only. Feel free to modify and explore!
Posted Jun 7, 2025

Explored and cleaned datasets using Python for data analysis.