joash omondi
Exploratory Data Analysis (EDA) for Healthcare Appointments
Overview
This repository contains Python code for Exploratory Data Analysis (EDA) on a healthcare dataset related to patient appointments. The analysis aims to explore various factors influencing patient attendance at appointments.
Dataset
The dataset used for this analysis is available in the file data set for eda.csv. It includes information about patient appointments, such as appointment dates, patient characteristics, and whether the patient showed up for the appointment (NoShow).
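As a minimal sketch of a first look at the file, the snippet below reads the CSV with pandas and reports its size, column names, and missing values. The helper names load_appointments and summarize are hypothetical and are not taken from the notebook.

```python
import pandas as pd


def load_appointments(source):
    """Read the appointments CSV into a DataFrame.

    `source` can be a path such as "data set for eda.csv" or any
    file-like object accepted by pandas.read_csv.
    """
    df = pd.read_csv(source)
    # Strip stray whitespace from column names so later lookups are predictable.
    df.columns = [str(c).strip() for c in df.columns]
    return df


def summarize(df):
    """Return row count, column names, and missing-value counts per column."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing_per_column": df.isna().sum().to_dict(),
    }
```

Typical use would be summarize(load_appointments("data set for eda.csv")), run from the repository root.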
Libraries Used
pandas
numpy
datetime
matplotlib
seaborn
Key Steps in the Analysis
Data Loading and Initial Exploration:
Data Preprocessing:
Data Visualization:
Data Cleaning:
Correlation Analysis:
Bivariate Analysis:
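The steps above are listed without detail, so the following is only a sketch of what the preprocessing, cleaning, and correlation steps might look like. It rests on several assumptions: the column names AppointmentDay and Age, the values "Yes" and "No" in NoShow (with "Yes" meaning the patient missed the appointment), and the derived columns Weekday and Showed are all hypothetical and may differ from the notebook.

```python
import pandas as pd


def preprocess(df):
    """Parse dates, drop impossible ages, and derive helper columns.

    Assumed (hypothetical) input columns: AppointmentDay, Age, NoShow.
    """
    # Cleaning: keep only rows with a non-negative age.
    out = df.loc[df["Age"] >= 0].copy()
    # Preprocessing: convert the appointment date to a datetime and
    # derive the day of the week from it.
    out["AppointmentDay"] = pd.to_datetime(out["AppointmentDay"])
    out["Weekday"] = out["AppointmentDay"].dt.day_name()
    # Assumption: NoShow == "No" means the patient attended.
    out["Showed"] = out["NoShow"].map({"No": True, "Yes": False})
    return out.reset_index(drop=True)


def correlation_matrix(df):
    """Pearson correlation between the numeric and boolean columns."""
    numeric = df.select_dtypes(include=["number", "bool"]).astype(float)
    return numeric.corr()
```

Deriving a boolean Showed column keeps later group-wise show rates simple, since the mean of a boolean column is the share of attended appointments.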
Findings
Female patients have taken more appointments than male patients.
The show rate is roughly the same across ages, at about 80% for each age, except for ages 0 and 1.
Each neighborhood has a show rate of almost 80%.
Patients without a scholarship have a higher show rate (around 80%) than those with a scholarship (around 75%).
Patients without hypertension have a show rate of around 78%, while patients with hypertension have a higher show rate of around 85%.
Patients without diabetes have a show rate of around 80%, while patients with diabetes have a higher show rate of around 83%.
Patients who did not receive an SMS have a higher show rate (around 84%) than those who did (around 72%).
There are no appointments on Sundays, and there are far fewer appointments on Saturdays than on weekdays.
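Show rates like those listed above can be computed with a group-wise mean. The sketch below assumes a boolean column named Showed (hypothetical, True when the patient attended) and a grouping column such as SMS_received, whose name is also an assumption. The example data in the usage note is synthetic and does not reproduce the findings.

```python
import pandas as pd


def show_rate_by(df, column):
    """Percentage of attended appointments for each value of `column`.

    Assumes `df` has a boolean column "Showed" (hypothetical name).
    """
    return df.groupby(column)["Showed"].mean().mul(100).round(1)


def appointment_share_by(df, column):
    """Percentage of all appointments that fall in each value of `column`."""
    return df[column].value_counts(normalize=True).mul(100).round(1)
```

For example, with a synthetic frame where four patients received no SMS (three attended) and two received one (one attended), show_rate_by(df, "SMS_received") gives 75.0 for the first group and 50.0 for the second.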
Suggestions for Further Analysis
Time-series analysis to identify patterns in appointment attendance over time.
Feature engineering and additional data exploration for more insights.
Building predictive models to forecast appointment no-shows.
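As one possible starting point for the time-series suggestion, the sketch below aggregates attendance per calendar day. It assumes a datetime column AppointmentDay and a boolean column Showed. Both names are hypothetical, and this approach is not part of the existing notebook.

```python
import pandas as pd


def daily_show_rate(df):
    """Number of appointments and share attended for each calendar day.

    Assumes `df` has a datetime column "AppointmentDay" and a boolean
    column "Showed" (both hypothetical names).
    """
    # Normalize timestamps to midnight so all appointments on the same
    # date fall into one group.
    by_day = df.groupby(df["AppointmentDay"].dt.normalize())["Showed"]
    return pd.DataFrame({
        "appointments": by_day.size(),
        "show_rate": by_day.mean(),
    })
```

The resulting frame is indexed by date, so it can be plotted directly or smoothed with a rolling mean to look for trends.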
Feel free to explore the Jupyter notebook for a detailed step-by-step analysis.