Data Modeling: Clustering COVID-19 Cases in Jakarta

Angelin

Roles: Data Modelling Analyst, Data Visualizer, Data Analyst
Tools: Jupyter, Matplotlib, Python

Business Understanding

- Analyze COVID-19 cases in DKI Jakarta to identify geographic and demographic patterns affecting infection rates.

- Evaluate compliance with health protocols, including isolation, hospitalization, and contact tracing.

- Assess the effectiveness of prevention and treatment efforts at the neighborhood level.

- Evaluate the capacity of the healthcare system to handle the surge in cases.

Data Understanding

- The dataset contains COVID-19 case data for DKI Jakarta from October 2020.

- It records case-status categories such as suspected, probable, and confirmed cases.

- It also records case outcomes, such as recovered, hospitalized, or deceased.

- Location is recorded at the province, city, district, and neighborhood levels.

Data Preparation

- Inspect the dataset with Pandas to understand its structure and contents.

- Check for missing values by visualizing them as a heatmap with Seaborn and Matplotlib.

- Clean the dataset by removing irrelevant columns and dropping duplicate rows so that every record is unique.
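
A minimal Pandas sketch of these preparation steps follows. The original dataset is not reproduced here, so the code builds a small synthetic frame. Every column name (`neighborhood`, `confirmed`, `hospitalized`, `recovered`, `deceased`, `source_note`) is a hypothetical stand-in, and `source_note` plays the role of an irrelevant column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the Jakarta dataset; all column names are assumptions.
df = pd.DataFrame({
    "neighborhood": ["A", "B", "B", "C", "D"],
    "confirmed": [120, 85, 85, 40, np.nan],
    "hospitalized": [10, 7, 7, 3, 2],
    "recovered": [100, 70, 70, 35, 20],
    "deceased": [2, 1, 1, 0, 0],
    "source_note": ["x", "y", "y", "z", "w"],  # hypothetical irrelevant column
})

# Initial inspection: shape, column dtypes, and summary statistics.
print(df.shape)
df.info()
print(df.describe())

# Missing-value heatmap: each highlighted cell marks a missing entry.
fig, ax = plt.subplots(figsize=(6, 3))
sns.heatmap(df.isnull(), cbar=False, ax=ax)
ax.set_title("Missing values")
fig.savefig("missing_values.png")  # hypothetical output path
plt.close(fig)

# Drop the irrelevant column, then remove duplicate rows.
clean = df.drop(columns=["source_note"]).drop_duplicates().reset_index(drop=True)
print(clean)
```

In this synthetic frame, the two identical rows for neighborhood `B` collapse into one, which leaves four rows. The missing `confirmed` value is only visualized here, because the project description does not say how missing values were treated.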

Outlier Handling and Feature Scaling

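- Identify and handle outliers in numerical columns using the interquartile range (IQR) method.

- Standardize the numerical features with Scikit-Learn's StandardScaler (zero mean and unit variance per feature) so that no single feature dominates the distance calculations used by the clustering algorithms.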
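
The outlier and scaling steps could be sketched as follows. The description does not say whether outliers were removed or capped. This sketch assumes capping, which clips values to the IQR fences using the conventional multiplier of 1.5. The helper `cap_outliers_iqr` and the feature values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def cap_outliers_iqr(frame: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those fences.

    Capping (rather than dropping rows) is an assumption of this sketch.
    """
    out = frame.copy()
    for col in columns:
        q1 = out[col].quantile(0.25)
        q3 = out[col].quantile(0.75)
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


# Hypothetical numeric features, one row per area; the last row is an outlier.
features = pd.DataFrame({
    "confirmed": [12.0, 15.0, 14.0, 13.0, 16.0, 200.0],
    "recovered": [10.0, 12.0, 11.0, 12.0, 14.0, 150.0],
})
numeric_cols = ["confirmed", "recovered"]

capped = cap_outliers_iqr(features, numeric_cols)

# StandardScaler rescales each column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(capped[numeric_cols])
print(X_scaled.round(2))
```

Dropping rows outside the fences is the common alternative. Capping keeps every row in the dataset, which may matter if every row must later receive a cluster label.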

Modeling

- Apply K-Means and hierarchical clustering to identify groups of records with similar characteristics.

- For K-Means, compare results across several values of k to determine the optimal number of clusters.

- Use hierarchical clustering to visualize the relationships between clusters, for example with a dendrogram.
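
The following sketch shows both modeling steps with Scikit-Learn and SciPy. It uses synthetic blobs in place of the scaled Jakarta features. It also assumes two choices that the description leaves open. First, k is selected by the highest silhouette score over k = 2 to 8, and inertia is also recorded for an elbow plot. Second, the hierarchical step uses Ward linkage.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the standardized feature matrix (assumption).
X_scaled, _ = make_blobs(n_samples=150, centers=3, n_features=2, random_state=42)

# K-Means over a range of k: inertia (for an elbow plot) and silhouette per k.
inertias, silhouettes = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = km.fit_predict(X_scaled)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X_scaled, labels)

# Assumed selection rule: the k with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
kmeans_labels = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X_scaled)

# Hierarchical clustering with Ward linkage, drawn as a truncated dendrogram.
Z = linkage(X_scaled, method="ward")
fig, ax = plt.subplots(figsize=(8, 4))
dendrogram(Z, truncate_mode="lastp", p=20, ax=ax)
ax.set_title("Ward linkage dendrogram")
fig.savefig("dendrogram.png")  # hypothetical output path
plt.close(fig)

# Flat hierarchical labels with the same k, for comparison with K-Means.
hier_labels = AgglomerativeClustering(n_clusters=best_k, linkage="ward").fit_predict(X_scaled)
print(best_k, inertias)
```

Ward linkage pairs naturally with K-Means. At each merge it minimizes the increase in within-cluster variance, which is the same quantity K-Means minimizes, so the two sets of assignments are easier to compare.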

Validation and Evaluation

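- Validate the clustering with the silhouette score (higher is better, range -1 to 1) and the Davies-Bouldin index (lower is better, minimum 0).

- Compare these scores across clustering results, such as different numbers of clusters or the two algorithms, to evaluate model performance.

- Select the configuration that produces compact, well-separated clusters. Because clustering is unsupervised, this step checks cluster quality rather than predictive accuracy.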
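
The validation step could be sketched with Scikit-Learn's `silhouette_score` and `davies_bouldin_score`, again on synthetic data with an assumed k of 3. The Davies-Bouldin index is defined for a whole clustering, so the per-cluster view here instead averages `silhouette_samples` within each cluster. That breakdown is an illustrative addition and is not specified in the description.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_samples, silhouette_score

# Synthetic stand-in for the scaled features; k = 3 is an assumption.
X_scaled, _ = make_blobs(n_samples=150, centers=3, n_features=2, random_state=0)

models = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "hierarchical": AgglomerativeClustering(n_clusters=3, linkage="ward"),
}

results = {}
for name, model in models.items():
    labels = model.fit_predict(X_scaled)
    results[name] = {
        "silhouette": silhouette_score(X_scaled, labels),          # higher is better
        "davies_bouldin": davies_bouldin_score(X_scaled, labels),  # lower is better
    }
    # Mean silhouette of the points in each cluster, to spot weak clusters.
    per_sample = silhouette_samples(X_scaled, labels)
    results[name]["per_cluster_silhouette"] = {
        int(c): float(per_sample[labels == c].mean()) for c in np.unique(labels)
    }

for name, scores in results.items():
    print(name, round(scores["silhouette"], 3), round(scores["davies_bouldin"], 3))
```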

Deployment

- Implement the clustering model in a system to support decision-making.

- Integrate the model with healthcare monitoring systems to identify patterns in COVID-19 spread.

- Use the model to allocate medical resources more effectively and inform public health policies.

- Re-evaluate the model regularly as new case data become available so that it remains accurate and relevant.

Here are the images of the results and evaluation of the model:
