This project compares several machine learning models for heart disease prediction. The focus is on data cleaning and preprocessing, model training, and evaluation.
In this section, we load the dataset and perform necessary preprocessing steps such as handling missing values and encoding categorical variables:
import pandas as pd
import numpy as np
df = pd.read_csv('/kaggle/input/heart-failure-prediction/heart.csv')
df.head()
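The snippet above only loads the file, so here is a minimal sketch of what the missing-value handling and categorical encoding could look like. It runs on a tiny stand-in frame rather than on heart.csv; the column names, the `HeartDisease` target name, and the `preprocess` helper are assumptions made for illustration, not taken from the original notebook.

```python
import numpy as np
import pandas as pd

# Tiny stand-in frame; column names are assumed, not read from heart.csv.
sample = pd.DataFrame({
    "Age": [54, 61, np.nan, 47],
    "Sex": ["M", "F", "M", "F"],
    "ChestPainType": ["ATA", "NAP", "ASY", "ATA"],
    "Cholesterol": [230.0, np.nan, 180.0, 210.0],
    "HeartDisease": [0, 1, 1, 0],
})

def preprocess(frame, target="HeartDisease"):
    """Hypothetical helper: impute gaps, one-hot encode, split off the target."""
    frame = frame.copy()
    numeric_cols = frame.select_dtypes(include="number").columns.drop(target)
    # Fill numeric gaps with each column's median.
    frame[numeric_cols] = frame[numeric_cols].fillna(frame[numeric_cols].median())
    categorical_cols = frame.select_dtypes(exclude="number").columns
    # Fill categorical gaps with each column's most frequent value.
    for col in categorical_cols:
        frame[col] = frame[col].fillna(frame[col].mode().iloc[0])
    # One-hot encode the categorical columns, dropping one level per column.
    features = pd.get_dummies(
        frame.drop(columns=target), columns=list(categorical_cols), drop_first=True
    )
    return features, frame[target]

X, y = preprocess(sample)
print(X.head())
```

In a stricter workflow the imputation statistics would be computed on the training split only, so that no information from the test set leaks into preprocessing.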
Model Training
We use several models including K-Nearest Neighbors, Decision Trees, and Random Forests for training:
K-Nearest Neighbors (KNN)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# X is the preprocessed feature matrix and y the target column from the previous step
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
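Because KNN classifies by distance, features measured on larger scales can dominate the neighbor search. The sketch below shows one common way to guard against that by standardizing features inside a pipeline. It is not part of the original workflow: the data comes from `make_classification` as a stand-in for the heart dataset, and the names `X_demo`, `y_demo`, and `scaled_knn` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: the real features would come from heart.csv).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The scaler is fitted on the training split only, then applied before KNN.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_knn.fit(X_tr, y_tr)

scaled_acc = scaled_knn.score(X_te, y_te)
print(f"Accuracy of scaled KNN on synthetic data: {scaled_acc:.3f}")
```

Keeping the scaler inside the pipeline means the test split is transformed with statistics learned from the training split alone.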
Decision Tree
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
Random Forest
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
Evaluation
After training the models, we evaluate their performance on the test set using accuracy and the confusion matrix:
KNN Evaluation
from sklearn.metrics import accuracy_score, confusion_matrix
knn_pred = knn.predict(X_test)
print(f"Accuracy of KNN: {accuracy_score(y_test, knn_pred)}")
# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, knn_pred))
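Only the KNN evaluation is shown above. A comparison of all three models might be sketched as a single loop like the one below. It uses synthetic data from `make_classification` as a stand-in for the heart dataset, so the printed numbers say nothing about the real results; the `models` and `results` dictionaries are illustrative names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: replace with the preprocessed heart data).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Same three model families and settings as in the training section.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

results = {}
matrices = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results[name] = accuracy_score(y_te, pred)
    matrices[name] = confusion_matrix(y_te, pred)
    print(f"Accuracy of {name}: {results[name]:.3f}")
    print(matrices[name])
```

Evaluating every model on the same split with the same metrics keeps the comparison like for like.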
Conclusion
Comparing the models on the same held-out test set shows which of them performs best for heart disease prediction, which helps practitioners choose the most suitable model for real-world use.
Posted May 11, 2026
Heart disease prediction analysis using various ML models. Evaluated multiple algorithms for optimal results.