Diabetes Prediction Using Machine Learning

Arun Kumar

Data Analyst
ML Engineer
Python
scikit-learn
Visual Studio Code
This project builds a diabetes detection model with machine learning. The dataset is loaded from a CSV file and preprocessed: duplicate rows are removed, descriptive statistics are generated, and visualizations such as pairplots are created. Zero values in features are replaced with the feature mean, outliers are removed, and unwanted features are dropped. The dataset is then oversampled with SMOTE to address class imbalance.
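The preprocessing steps described above could be sketched roughly as follows. This is an illustration rather than the project's actual code: the column names (`Glucose`, `BMI`, `Outcome`) are hypothetical, and the 1.5 × IQR rule for outliers is an assumption, since the write-up does not say which outlier criterion was used.

```python
import numpy as np
import pandas as pd


def preprocess(df, zero_as_missing, target="Outcome"):
    """Deduplicate, impute zeros with column means, and drop IQR outliers.

    `zero_as_missing` lists the (hypothetical) columns where 0 means "missing".
    """
    df = df.drop_duplicates().copy()

    for col in zero_as_missing:
        # Treat 0 as missing, then fill with the mean of the non-zero values.
        df[col] = df[col].replace(0, np.nan).astype(float)
        df[col] = df[col].fillna(df[col].mean())

    # Assumed outlier rule: keep rows whose features all lie within
    # 1.5 * IQR of the first and third quartiles.
    features = df.drop(columns=[target])
    q1, q3 = features.quantile(0.25), features.quantile(0.75)
    iqr = q3 - q1
    inside = ((features >= q1 - 1.5 * iqr) & (features <= q3 + 1.5 * iqr)).all(axis=1)
    return df[inside].reset_index(drop=True)
```

A call such as `preprocess(df, zero_as_missing=["Glucose", "BMI"])` would return the cleaned frame; dropping unwanted features and applying SMOTE would follow as separate steps.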
Stratified K-Fold splits the data into training and testing sets while preserving the class proportions in each. A RandomForestClassifier is trained on the preprocessed and oversampled data and reaches 100% accuracy on the testing set. Feature importance is analyzed, and further visualizations such as pairplots and correlation matrices provide additional insight. The ROC curve and AUC score are used to assess the model's performance.
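A minimal sketch of the training and evaluation loop, assuming scikit-learn and using a synthetic dataset from `make_classification` in place of the project's CSV. The function name `cross_validate_forest` and all hyperparameters are illustrative choices, not taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold


def cross_validate_forest(X, y, n_splits=5, seed=0):
    """Train a random forest on each stratified fold and score the held-out part."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = []
    for train_idx, test_idx in skf.split(X, y):
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        # Probability of the positive class, used for the ROC/AUC score.
        proba = model.predict_proba(X[test_idx])[:, 1]
        results.append({
            "accuracy": accuracy_score(y[test_idx], model.predict(X[test_idx])),
            "auc": roc_auc_score(y[test_idx], proba),
            "importances": model.feature_importances_,
        })
    return results


# Synthetic, mildly imbalanced stand-in data (an assumption, not the real dataset).
X, y = make_classification(n_samples=300, n_features=8, weights=[0.65, 0.35], random_state=0)
results = cross_validate_forest(X, y)
```

Reporting the per-fold accuracy and AUC, rather than a single split, gives a sense of how much the score varies with the choice of test rows.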
A perfect score should be treated with caution. Overfitting and data leakage need to be ruled out before trusting it. In particular, if SMOTE is applied before the train/test split, synthetic samples interpolated from test rows can end up in the training data and inflate the score. The dataset's size and characteristics also matter when judging whether the result would hold on new data.