Diabetes Prediction Using Machine Learning

Arun Kumar


Tags: Data Analyst, ML Engineer, Python, scikit-learn, Visual Studio Code

This project uses machine learning to build a diabetes detection model. The dataset, loaded from a CSV file, goes through several preprocessing steps: duplicate rows are removed, descriptive statistics are generated, and visualizations such as pairplots are created. Zero values in the features are replaced with the feature means, outliers are removed, and unwanted features are dropped. Finally, the dataset is oversampled with SMOTE to address class imbalance.
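The cleaning steps above can be sketched in pandas as follows. This is a minimal illustration, not the project's code: the column names (`Glucose`, `BMI`, `SkinThickness`, `Outcome`) are hypothetical, the zeros are assumed to mean "not recorded", the mean is taken over the non-zero entries only (the project may have used the plain column mean), and the 1.5 × IQR rule is one common outlier criterion chosen for illustration, since the description does not say which was used.

```python
import pandas as pd

# Hypothetical column names for illustration; the project's CSV schema is not stated.
ZERO_AS_MISSING = ["Glucose", "BMI"]   # assumption: a 0 in these columns means "not recorded"
DROP_COLUMNS = ["SkinThickness"]       # assumption: stands in for the "unwanted" features
TARGET = "Outcome"                     # assumption: binary diabetes label


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Remove duplicates, mean-impute zeros, drop IQR outliers, then drop unwanted columns."""
    df = df.drop_duplicates().copy()

    # Replace zeros with the mean of the column's non-zero values.
    for col in ZERO_AS_MISSING:
        nonzero_mean = df.loc[df[col] != 0, col].mean()
        df[col] = df[col].astype(float).replace(0.0, nonzero_mean)

    # Keep only rows where every feature lies within [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    features = [c for c in df.columns if c != TARGET]
    q1 = df[features].quantile(0.25)
    q3 = df[features].quantile(0.75)
    iqr = q3 - q1
    in_range = (df[features] >= q1 - 1.5 * iqr) & (df[features] <= q3 + 1.5 * iqr)
    df = df[in_range.all(axis=1)]

    return df.drop(columns=DROP_COLUMNS)
```

On the loaded frame, `df.describe()` produces the descriptive statistics and `seaborn.pairplot(df, hue="Outcome")` draws a class-colored pairplot (again assuming the label column is called `Outcome`).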
Stratified K-Fold splits the data into training and testing sets while preserving the class proportions in each fold. A RandomForestClassifier trained on the preprocessed and oversampled data reaches 100% accuracy on the testing set. Feature importances are analyzed, and further visualizations, including pairplots and a correlation matrix, help explore the relationships between features. The ROC curve and AUC score are used to assess the model's performance.
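A minimal sketch of this evaluation loop with scikit-learn follows, assuming a numeric feature matrix `X` and binary labels `y`. It uses `make_classification` as stand-in data (not the project's dataset), and the fold count, tree count, and seeds are arbitrary choices. Each fold trains a fresh random forest, records test accuracy and ROC AUC, and the feature importances are averaged across folds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold


def evaluate_rf(X, y, n_splits=5, seed=42):
    """Train a random forest per stratified fold; return fold accuracies, fold AUCs, mean importances."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accs, aucs, importances = [], [], []
    for train_idx, test_idx in skf.split(X, y):
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(X[train_idx], y[train_idx])
        proba = clf.predict_proba(X[test_idx])[:, 1]  # probability of class 1
        accs.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
        aucs.append(roc_auc_score(y[test_idx], proba))
        importances.append(clf.feature_importances_)
    return np.array(accs), np.array(aucs), np.mean(importances, axis=0)


# Stand-in data with a roughly 65/35 class split (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           weights=[0.65, 0.35], random_state=0)
accs, aucs, imp = evaluate_rf(X, y)
print(f"accuracy {accs.mean():.3f} ± {accs.std():.3f}, AUC {aucs.mean():.3f} ± {aucs.std():.3f}")
```

Reporting the mean and spread across folds is more informative than a single test score. For the plots, `sklearn.metrics.roc_curve(y_true, scores)` returns the false- and true-positive rates needed to draw the ROC curve, and `pd.DataFrame(X).corr()` gives the correlation matrix.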
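While perfect accuracy is noteworthy, it calls for caution. Potential issues such as overfitting and data leakage should be investigated thoroughly to confirm that the model's performance is robust. In particular, if SMOTE is applied before the data are split, synthetic test samples are interpolated from real samples that may also be in the training set, which can inflate test scores. Other considerations include the dataset's characteristics and size, and a thorough understanding of both the data and the model's behavior.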
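One way to check for leakage is to resample only inside each training fold, so every test fold contains nothing but real, untouched rows. The sketch below hand-rolls a minimal SMOTE (interpolating between a minority sample and one of its k nearest minority neighbors) so it runs with only NumPy and scikit-learn; in practice, imbalanced-learn's `SMOTE` inside an `imblearn.pipeline.Pipeline` does this job. The function names are hypothetical, and the code assumes binary labels and at least `k + 1` minority samples per training fold.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors


def smote_samples(X_min, n_new, k=5, seed=None):
    """Minimal SMOTE: interpolate between minority points and one of their k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)                   # column 0 is (normally) the point itself
    base = rng.integers(0, len(X_min), size=n_new)  # random minority "parents"
    neighbor = idx[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))                    # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neighbor] - X_min[base])


def oversample(X, y, k=5, seed=None):
    """Balance a binary training set by appending synthetic minority samples."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    if n_new == 0:
        return X, y
    X_new = smote_samples(X[y == minority], n_new, k=k, seed=seed)
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def leak_free_cv_auc(X, y, n_splits=5, seed=0):
    """Stratified CV in which the resampling touches only the training fold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for train_idx, test_idx in skf.split(X, y):
        X_tr, y_tr = oversample(X[train_idx], y[train_idx], seed=seed)
        clf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_tr, y_tr)
        # The test fold is scored on real samples only.
        aucs.append(roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1]))
    return np.array(aucs)
```

If the score from this loop drops noticeably compared with oversampling the full dataset first, leakage is a likely contributor to the perfect result.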

Built a diabetes detection model using ML with preprocessing, SMOTE, and RandomForest. It achieved 100% accuracy, but caution is advised regarding overfitting and data leakage.



