Diabetes Prediction Using Machine Learning

Arun Kumar

Data Analyst

ML Engineer

Python

scikit-learn

Visual Studio Code

This project uses machine learning to build a diabetes detection model. The dataset is loaded from a CSV file and preprocessed: duplicate rows are removed, descriptive statistics are generated, and pairplots are drawn to inspect the features. Zero values in the features are replaced with the feature mean, outliers are removed, and unwanted features are dropped. SMOTE then oversamples the dataset to address class imbalance.
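The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `preprocess`, the target column name `Outcome`, the choice to compute each mean over the non-zero entries, and the 1.5 x IQR outlier rule are all assumptions, since the write-up does not specify them.

```python
import numpy as np
import pandas as pd


def preprocess(df, zero_as_missing, target="Outcome", drop_cols=()):
    """Hypothetical cleaning helper: dedupe, impute zeros, trim outliers, drop columns."""
    df = df.drop_duplicates().copy()

    # Treat zeros in the listed columns as missing and fill them with the
    # mean of the remaining (non-zero) values. Assumption: the original
    # project may have computed the mean differently.
    for col in zero_as_missing:
        df[col] = df[col].astype(float).replace(0, np.nan)
        df[col] = df[col].fillna(df[col].mean())

    # Assumed outlier rule: keep rows whose features all lie within
    # 1.5 * IQR of the first and third quartiles.
    features = df.columns.drop(target)
    q1 = df[features].quantile(0.25)
    q3 = df[features].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = ((df[features] >= lower) & (df[features] <= upper)).all(axis=1)
    df = df[inside]

    # Drop features that are not wanted for modelling.
    return df.drop(columns=list(drop_cols)).reset_index(drop=True)
```

A call might look like `preprocess(df, zero_as_missing=["Glucose", "BMI"], drop_cols=["SkinThickness"])`, where the column names are placeholders for whatever the CSV actually contains.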

Stratified K-Fold splits the data into training and testing sets while preserving the class proportions in each. A RandomForestClassifier trained on the preprocessed, oversampled data reaches 100% accuracy on the testing set. Feature importances are analyzed, and further pairplots and correlation matrices give additional insight into the features. The ROC curve and AUC score are used to assess the model's performance.
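The evaluation loop described above could look something like the sketch below. It runs on synthetic data from `make_classification` as a stand-in for the diabetes CSV, and it averages metrics over all five folds, which may differ from how the project reported its single testing-set score. The fold count, tree count, and random seeds are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the cleaned dataset (assumption: 8 features, binary target).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           weights=[0.65, 0.35], random_state=0)

# Stratification keeps the class ratio roughly equal in every fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
accs, aucs = [], []
importances = np.zeros(X.shape[1])

for train_idx, test_idx in skf.split(X, y):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])

    # Accuracy uses hard predictions; AUC uses the positive-class probability.
    accs.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    proba = model.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], proba))

    # Average the impurity-based feature importances across folds.
    importances += model.feature_importances_ / skf.get_n_splits()

print(f"mean accuracy: {np.mean(accs):.3f}, mean AUC: {np.mean(aucs):.3f}")
print("mean feature importances:", np.round(importances, 3))
```

Reporting the spread across folds alongside the mean gives a better sense of stability than a single split does.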

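Perfect accuracy is a warning sign as much as a result, and it should be treated with caution. Overfitting and data leakage need to be ruled out before the score can be trusted. One candidate is the order of the steps: if SMOTE is applied before the split, as the description above suggests, synthetic samples interpolated from points that land in the testing set can end up in the training data. Dataset size and characteristics, and a solid understanding of how the data and the model behave, also belong in that investigation.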
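One way to check for this kind of leakage is to oversample only the training portion of each fold and score on untouched test data. The sketch below does that on synthetic data. Note that `smote_like` is a simplified, hand-written stand-in for SMOTE (in practice the `SMOTE` class from imbalanced-learn would be the usual choice); the helper name, the assumption that class 1 is the minority, and all parameter values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors


def smote_like(X, y, minority=1, k=5, seed=0):
    """Simplified SMOTE-style oversampling (hypothetical helper): interpolate
    between minority points and their nearest minority neighbours until the
    two classes are the same size."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    k = min(k, len(X_min) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    # Column 0 is normally the query point itself, so skip it.
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neigh[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


# Synthetic, imbalanced stand-in for the diabetes data.
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           weights=[0.7, 0.3], random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = []
for train_idx, test_idx in skf.split(X, y):
    # Oversample the training fold only; the test fold stays untouched.
    X_tr, y_tr = smote_like(X[train_idx], y[train_idx])
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    aucs.append(roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1]))

print(f"leak-free mean AUC: {np.mean(aucs):.3f}")
```

If the score drops noticeably once oversampling is moved inside the folds, the original result was probably inflated by leakage rather than by genuine predictive power.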

Posted Feb 4, 2024
