Fraud Detection on Credit Card Transactions

hussein hafez

A machine learning pipeline to detect fraudulent credit card transactions using imbalanced classification techniques.
Built with scikit-learn, this project demonstrates a complete workflow from preprocessing to model tuning and evaluation, with a focus on real-world constraints such as class imbalance, interpretability, and the choice of performance metrics.

Problem Overview

Goal: Classify transactions as fraudulent or legitimate.
Challenge: Only ~0.17% of transactions are fraudulent, which makes this a highly imbalanced binary classification problem.
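
To make the imbalance concrete, the short sketch below works through the arithmetic for a trivial "always legitimate" classifier. The 0.17% rate is taken from the text above; the 100,000-transaction sample size is a made-up round number chosen only for illustration.

```python
# Illustrative arithmetic only. The 0.17% fraud rate comes from the README;
# the sample size of 100,000 is a hypothetical round number.
n_transactions = 100_000
fraud_rate = 0.0017

n_fraud = round(n_transactions * fraud_rate)
n_legit = n_transactions - n_fraud

# A "model" that labels every transaction as legitimate.
baseline_accuracy = n_legit / n_transactions
baseline_recall = 0 / n_fraud  # it never catches a single fraud case

print(f"fraud cases:           {n_fraud}")
print(f"baseline accuracy:     {baseline_accuracy:.4f}")
print(f"baseline fraud recall: {baseline_recall:.1f}")
```

Accuracy looks excellent for this baseline even though it detects no fraud at all, which is why the project leans on ROC-AUC and precision-recall metrics instead.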

Technologies & Libraries

Python 3.9
scikit-learn
imbalanced-learn (SMOTE)
matplotlib / seaborn
pandas / numpy

ML Pipeline Summary

Data Preprocessing

Standardized features using StandardScaler
Stratified train/test split to preserve the fraud-to-legitimate ratio in both sets
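
A minimal sketch of these two preprocessing steps might look like the following. It uses synthetic data from `make_classification` as a stand-in for `creditcard.csv`, and it fits the scaler on the training split only; the notebook may order the steps differently, so treat the ordering, the 80/20 split, and the random seed as assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for creditcard.csv (assumption): roughly 2% positives.
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=42
)

# stratify=y keeps the positive-class ratio nearly identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training split, then reuse its statistics on the test split.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"train positive ratio: {y_train.mean():.4f}")
print(f"test positive ratio:  {y_test.mean():.4f}")
```

Fitting the scaler on the training split alone keeps test-set statistics out of the model, which is a common precaution rather than something the README states.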

Handling Class Imbalance

Applied SMOTE (Synthetic Minority Over-sampling Technique) to balance the classes before training

Model & Training

Used Random Forest as the primary classifier
Performed hyperparameter tuning with GridSearchCV (5-fold, ROC-AUC scoring)
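
The tuning step could be sketched roughly as follows. The 5-fold setting and ROC-AUC scoring come from the description above; the parameter grid, the synthetic data, and the random seed are placeholders, since the README does not list the values that were actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic training set so the search runs quickly (assumption).
X_train, y_train = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42
)

# Hypothetical grid: the real project may search different values.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [4, None],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    scoring="roc_auc",  # rank candidates by area under the ROC curve
    cv=5,               # 5-fold cross-validation, stratified for classifiers
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print(f"best CV ROC-AUC: {search.best_score_:.3f}")
```

After fitting, `search.best_estimator_` holds a Random Forest refit on the full training data with the winning parameters, which is the model to carry into evaluation.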

Evaluation

ROC-AUC Score: Measured on the held-out test set
Precision-Recall Curve: Emphasizes performance on rare fraud cases
Confusion Matrix: Visualizes false positives/negatives
Classification Report: Full precision, recall, F1 breakdown
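
The four evaluation outputs listed above can be computed with scikit-learn along these lines. This is a sketch on synthetic data with an untuned Random Forest, so the numbers it prints say nothing about the project's actual results; `average_precision_score` is used here as one common summary of the precision-recall curve.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    average_precision_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): roughly 5% positives.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_train, y_train)

# Ranking metrics need scores (probabilities); the matrix and report need hard labels.
y_score = model.predict_proba(X_test)[:, 1]
y_pred = model.predict(X_test)

roc_auc = roc_auc_score(y_test, y_score)
pr_auc = average_precision_score(y_test, y_score)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
report = classification_report(
    y_test, y_pred, target_names=["legitimate", "fraud"], zero_division=0
)

print(f"ROC-AUC: {roc_auc:.3f}")
print(f"PR AUC (average precision): {pr_auc:.3f}")
print(cm)
print(report)
```

In the confusion matrix, the top-right cell counts false positives (legitimate flagged as fraud) and the bottom-left cell counts false negatives (missed fraud).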

Feature Importance

Visualized key fraud indicators using feature_importances_ from the trained model
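
One way to rank features with `feature_importances_` is sketched below. The column names `V1` to `V10` only mimic the anonymized naming of the Kaggle dataset; the data is synthetic, so the ranking it prints has no relation to the project's findings.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder column names in the style of the dataset's anonymized features.
feature_names = [f"V{i}" for i in range(1, 11)]

# Synthetic stand-in data (assumption).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across features.
importances = pd.Series(
    model.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(importances.head(4))
```

With matplotlib installed, `importances.head(10).plot.barh()` would be one way to turn the ranking into a bar chart similar to the visualization described above.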

Results

| Metric        | Value                           |
|---------------|---------------------------------|
| ROC-AUC Score | ~0.97                           |
| PR AUC        | High separation between classes |
| Top Features  | V14, V10, V17, V12              |

What This Project Demonstrates

Ability to build real-world ML pipelines end-to-end
Knowledge of class imbalance handling using SMOTE
Comfort with evaluation beyond accuracy (PR, ROC, Confusion Matrix)
Awareness of model interpretability with feature importance

How to Run

Clone the repo:
    git clone https://github.com/husseinhafez1/Fraud-Detection-on-Credit-Card-Transactions
    cd Fraud-Detection-on-Credit-Card-Transactions
Install dependencies:
    pip install -r requirements.txt
Launch the notebook server:
    jupyter notebook
Open and run notebooks/fraud_detection_pipeline.ipynb

File Structure


fraud-detection-ml/
├── data/
│   └── creditcard.csv                  # Dataset (Kaggle)
├── notebooks/
│   └── fraud_detection_pipeline.ipynb  # Main notebook
├── src/
│   └── preprocessing.py                # (Optional) data processing logic
├── fraud_detection_pipeline.html       # HTML export of notebook
├── requirements.txt                    # Dependencies
└── README.md                           # Project description


Posted Aug 16, 2025

Developed a fraud detection ML pipeline for credit card transactions.