Breast Cancer Detection - Machine Learning

Isaac Somuah


Business Analyst

ML Engineer

Data Analyst

GitHub

pandas

Python

Breast Cancer Detection

Project Overview

Breast cancer is a significant health concern, with early detection being crucial for improving patient outcomes and reducing treatment costs. This project demonstrates the development and evaluation of three machine learning models aimed at accurately predicting whether a breast tumor is benign or malignant. By leveraging these models, healthcare providers can potentially make more informed decisions, leading to earlier interventions and better patient prognosis.

Business Value

The primary goal of this project is to provide a reliable, automated tool for breast cancer detection that can assist radiologists and oncologists in making quicker and more accurate diagnoses. Implementing this tool could lead to:
Reduced Diagnostic Time: Automating the initial screening process, allowing healthcare professionals to focus on more complex cases.
Improved Accuracy: Enhancing diagnostic precision, reducing the likelihood of false positives and negatives, and consequently improving patient trust and treatment outcomes.
Cost Savings: Identifying malignant tumors earlier can make treatment less invasive and more cost-effective, reducing the financial burden on patients and healthcare systems.

Dataset

The dataset used is the Wisconsin Breast Cancer Dataset, which contains 569 samples and 32 columns: a sample ID, the diagnosis label, and 30 numeric features.
Link to Dataset - Wisconsin Breast Cancer Dataset
The target variable, diagnosis, indicates whether the tumor is benign (B) or malignant (M).
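As a minimal sketch of what loading this data can look like, the snippet below uses the copy of the Wisconsin Diagnostic dataset that ships with scikit-learn rather than the project's own file in the data directory (the file name and loading code used in the notebook are not shown here, so this is an assumption). Note that scikit-learn encodes the target as integers (0 = malignant, 1 = benign) and omits the ID column, so the B/M labels are rebuilt by hand.

```python
from sklearn.datasets import load_breast_cancer

# Load the bundled Wisconsin Diagnostic dataset as pandas objects.
# Assumption: this stands in for the CSV stored in the project's data folder.
data = load_breast_cancer(as_frame=True)
X = data.data  # 30 numeric features, no ID column

# scikit-learn uses 0 = malignant, 1 = benign; map back to the M/B labels.
diagnosis = data.target.map({0: "M", 1: "B"})

print(X.shape)
print(diagnosis.value_counts())
```

The class counts printed here show that the dataset is moderately imbalanced toward benign cases, which is one reason to prefer stratified splits during evaluation.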

Key Data Preprocessing steps

Project Directory

data: Contains the dataset file.
notebooks: Jupyter Notebooks for data exploration, modeling, and evaluation.
scripts: Reusable Python code for model training and evaluation.
results: Exported images and results from model evaluations.

Dependencies

This project was built using Python and the following key libraries:
pandas, numpy, matplotlib, seaborn, scikit-learn

Usage

Clone the repository.
Install required dependencies using pip install -r requirements.txt.
Run the Jupyter Notebook named notebooks/breast_cancer_detection.ipynb.

Key Models Employed

Logistic Regression
Random Forest
Support Vector Machine
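The three models above could be set up in scikit-learn roughly as follows. This is a sketch under assumptions: the helper name build_models, the hyperparameter values, and the use of StandardScaler for the two scale-sensitive models are illustrative choices, not settings confirmed by the project.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(random_state=42):
    """Return the three candidate classifiers keyed by display name.

    Hypothetical helper: names and hyperparameters are assumptions.
    """
    return {
        # Logistic regression and SVM are sensitive to feature scale,
        # so both are wrapped in a pipeline that standardizes first.
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        # Tree ensembles do not need scaling.
        "Random Forest": RandomForestClassifier(
            n_estimators=200, random_state=random_state
        ),
        "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    }


models = build_models()
for name in models:
    print(name)
```

Keeping the scaler inside each pipeline means it is refit on the training portion of every split, which avoids leaking information from held-out data.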

Results and Validation

Under stratified cross-validation, the Support Vector Classifier (SVC) outperformed the other two models with a score of 97.80%, making it the most reliable of the three on this dataset. This high accuracy suggests that the model is well suited to initial breast cancer screening in a clinical setting.
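For reference, a stratified cross-validation score of this kind can be computed along the following lines. The fold count, the random seed, the scaling step, and the use of scikit-learn's bundled copy of the dataset are all assumptions, so the number this sketch prints will not necessarily match the figure reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Standardize features inside the pipeline so each fold is scaled
# using statistics from its own training portion only.
model = make_pipeline(StandardScaler(), SVC())

# Stratification keeps the benign/malignant ratio similar in every fold.
# Assumption: 5 folds with shuffling; the project's settings are not stated.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"Mean accuracy: {scores.mean():.4f} (+/- {scores.std():.4f})")
```

In a screening context it may also be worth passing scoring="recall" to see how many malignant cases are missed, since accuracy alone does not distinguish false negatives from false positives.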

Future Work and Recommendations

Hyperparameter Tuning: Experiment with different model hyperparameters to improve predictive accuracy.
Feature Engineering: Remove or combine features that are highly correlated with one another or only weakly correlated with the target variable, to reduce overfitting and improve model interpretability.
Model Deployment: Integrate the model into a web-based application for real-time predictions, enabling broader access for healthcare providers.
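The first two recommendations could be prototyped as sketched below. This is not part of the current project: the helper name drop_correlated_features, the 0.95 correlation threshold, and the parameter grid are hypothetical choices made for illustration, and the data again comes from scikit-learn's bundled copy of the dataset.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def drop_correlated_features(X, threshold=0.95):
    """Drop one feature from every pair whose absolute correlation
    exceeds the threshold (hypothetical helper)."""
    corr = X.corr().abs()
    # Keep only the strict upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop), to_drop


data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

X_reduced, dropped = drop_correlated_features(X)
print(f"Dropped {len(dropped)} of {X.shape[1]} features")

# Grid search over the SVC regularization strength and kernel width.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
search.fit(X_reduced, y)

print(search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.4f}")
```

Because the best score is selected on the same folds used for tuning, it is optimistic; a held-out test set or nested cross-validation would give a fairer estimate before any deployment.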

Contributing

Contributions are welcome! Please follow these steps to contribute:
Fork the repository and create a new branch.
Implement your changes and commit them with a clear message.
Push to the branch and open a Pull Request for review.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Key Resources Used


Posted Sep 9, 2024

Breast Cancer Detection - Machine Learning Project | Predicting the malignancy of a tumor based on certain features
