Angelin
Business Understanding
- Analyze COVID-19 cases in DKI Jakarta to identify geographic and demographic patterns affecting infection rates.
- Evaluate compliance with health protocols, including isolation, hospitalization, and contact tracing.
- Assess the effectiveness of prevention and treatment efforts at the neighborhood level.
- Evaluate the capacity of the healthcare system to handle the surge in cases.
Data Understanding
- The dataset includes COVID-19 cases in DKI Jakarta from October 2020.
- It includes variables related to health conditions, such as suspected, probable, and confirmed cases.
- The dataset also includes information on the outcomes of these cases, such as recovered, hospitalized, or deceased.
- It includes location information, including province, city, district, and neighborhood.
Data Preparation
- Inspect the data with Pandas to understand its structure, column types, and value ranges.
- Check for missing values using heatmap visualization with Seaborn and Matplotlib.
- Clean the dataset by removing irrelevant columns.
- Ensure data uniqueness by removing duplicate rows.
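
The preparation steps above can be sketched roughly as follows. This is a minimal illustration on a toy table, not the project's actual code: the column names (`neighborhood_id`, `neighborhood`, `confirmed`, `notes`) and the helpers `prepare` and `plot_missing_heatmap` are hypothetical stand-ins for whatever the real dataset and notebook use.

```python
import pandas as pd

# Hypothetical toy data; the real dataset's column names will differ.
raw = pd.DataFrame({
    "neighborhood_id": [1, 2, 2, 3],
    "neighborhood": ["A", "B", "B", "C"],
    "confirmed": [10.0, 5.0, 5.0, None],
    "notes": [None, None, None, None],  # stands in for an irrelevant column
})

# Initial inspection: count missing values per column.
missing_per_column = raw.isna().sum()


def plot_missing_heatmap(df):
    """Draw a heatmap of missing cells (True = missing) with Seaborn."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(df.isna(), cbar=False)
    ax.set_title("Missing values")
    plt.show()


def prepare(df, drop_cols):
    """Drop irrelevant columns, then remove exact duplicate rows."""
    cleaned = df.drop(columns=drop_cols)
    return cleaned.drop_duplicates().reset_index(drop=True)


cleaned = prepare(raw, drop_cols=["notes"])
print(missing_per_column)
print(cleaned)
```

Calling `plot_missing_heatmap(raw)` would show the missing-value pattern visually; it is kept out of the main flow here so the sketch also runs without a display.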
Outlier Handling and Feature Scaling
- Identify and handle outliers in numerical columns using Interquartile Range (IQR).
- Normalize data using StandardScaler from Scikit-Learn to make it suitable for machine learning algorithms.
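
A minimal sketch of these two steps is shown below. The source does not say how outliers are handled once identified, so this example assumes they are capped at the usual 1.5 x IQR fences; removing the affected rows would be an equally plausible reading. The helper `clip_iqr` and the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def clip_iqr(df, cols, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] for each listed column.

    Capping (rather than dropping rows) is an assumption of this sketch.
    """
    out = df.copy()
    for col in cols:
        q1 = out[col].quantile(0.25)
        q3 = out[col].quantile(0.75)
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


# Hypothetical toy data with one obvious outlier row.
data = pd.DataFrame({
    "confirmed": [10.0, 12.0, 11.0, 13.0, 12.0, 200.0],
    "recovered": [8.0, 9.0, 9.0, 10.0, 9.0, 150.0],
})

clipped = clip_iqr(data, ["confirmed", "recovered"])

# StandardScaler rescales each column to zero mean and unit variance.
scaled = StandardScaler().fit_transform(clipped)
print(clipped)
print(np.round(scaled, 3))
```

Scaling after outlier handling keeps a few extreme neighborhoods from dominating the mean and standard deviation that `StandardScaler` estimates.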
Modeling
- Use K-Means and Hierarchical Clustering algorithms to identify patterns in the data.
- K-Means is run over a range of cluster counts to determine the optimal number of clusters.
- Hierarchical Clustering is used to visualize the relationships between clusters.
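
The modeling step might look roughly like the sketch below. It runs on synthetic blobs rather than the Jakarta data, and several choices are assumptions made for illustration: the candidate range of 2 to 6 clusters, the use of inertia as the quantity compared across cluster counts, and Ward linkage from SciPy for the hierarchical part.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled feature matrix (three well-separated groups).
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [6, 6], [-6, 6]],
    cluster_std=0.6,
    random_state=0,
)

# K-Means over a range of cluster counts; inertia is the within-cluster
# sum of squared distances and is what an elbow plot would display.
inertias = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

# Fit the chosen number of clusters (3 is known here only because the data is synthetic).
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical clustering: build the merge tree, then cut it into 3 clusters.
Z = linkage(X, method="ward")
hier_labels = fcluster(Z, t=3, criterion="maxclust")

print(inertias)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the merge tree with Matplotlib, which is one common way to visualize how clusters relate to each other.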
Validation and Evaluation
- Validate the clustering model using silhouette score and Davies-Bouldin score.
- Compare these scores across the candidate clusterings to evaluate the model's performance.
- Confirm that the selected clustering is cohesive and well separated, since a clustering model groups the data rather than predicting labels.
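
As an illustration of this evaluation, the sketch below computes both scores for several candidate cluster counts on synthetic data. The candidate range and the rule of picking the count with the highest silhouette score are assumptions of the example, not something the source specifies. A higher silhouette score (maximum 1) and a lower Davies-Bouldin score (minimum 0) both indicate better-separated clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Synthetic stand-in for the scaled feature matrix.
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [6, 6], [-6, 6]],
    cluster_std=0.6,
    random_state=0,
)

# Score each candidate number of clusters with both metrics.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = (
        silhouette_score(X, labels),      # higher is better
        davies_bouldin_score(X, labels),  # lower is better
    )

# Assumed selection rule: keep the cluster count with the highest silhouette.
best_k = max(scores, key=lambda k: scores[k][0])

for k, (sil, dbi) in scores.items():
    print(f"k={k}: silhouette={sil:.3f}, davies_bouldin={dbi:.3f}")
print("best k by silhouette:", best_k)
```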
Deployment
- Implement the clustering model in a system to support decision-making.
- Integrate the model with healthcare monitoring systems to identify patterns in COVID-19 spread.
- Use the model to allocate medical resources more effectively and inform public health policies.
- Continuously evaluate the model to ensure it remains accurate and relevant.
Here are the images of the results and evaluation of the model: