The increasing prevalence of credit card fraud poses a significant threat to
financial institutions and their customers. This project aims to develop a
machine learning-based credit card fraud detection system that identifies and
prevents fraudulent transactions, thereby protecting the interests of both.
### Introduction
This is a simulated credit card transaction dataset containing legitimate and fraudulent transactions from 1 Jan 2019 to 31 Dec 2020. It covers the credit cards of 1000 customers transacting with a pool of 800 merchants.
#### Source of Simulation
The dataset was generated using the Sparkov Data Generation tool, created by Brandon Harris and available on GitHub. The simulation was run for the period 1 Jan 2019 to 31 Dec 2020, and the resulting files were combined and converted into a standard format.
It contains the following columns:

- `trans_date_trans_time`: Date and time of the transaction.
- `cc_num`: Credit card number.
- `merchant`: Name or identifier of the merchant involved in the transaction.
- `category`: Category or type of the transaction.
- `amt`: Transaction amount.
- `first`: First name of the cardholder.
- `last`: Last name of the cardholder.
- `gender`: Gender of the cardholder.
- `street`: Street address of the cardholder.
- `city`: City of the cardholder's address.
- `state`: State or region of the cardholder's address.
- `zip`: ZIP code of the cardholder's address.
- `lat`: Latitude of the cardholder's location.
- `long`: Longitude of the cardholder's location.
- `city_pop`: Population of the city where the cardholder resides.
- `job`: Occupation or job title of the cardholder.
- `dob`: Date of birth of the cardholder.
- `trans_num`: Transaction number or identifier.
- `unix_time`: Transaction time in Unix timestamp format.
- `merch_lat`: Latitude of the merchant's location.
- `merch_long`: Longitude of the merchant's location.
- `is_fraud`: Binary indicator (0 or 1) denoting whether the transaction is fraudulent (1) or not (0).

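
As a rough illustration of how the `is_fraud` column can be used to check how imbalanced the classes are, the sketch below counts fraudulent rows with Python's standard `csv` module. The sample rows, the `SAMPLE_CSV` name, and the `fraud_rate` helper are made up for this example and are not part of the dataset; a real file would carry all 22 columns listed above.

```python
import csv
import io

# Hypothetical sample rows for illustration only (not real dataset values).
# Only three of the 22 columns are included to keep the example short.
SAMPLE_CSV = """trans_num,amt,is_fraud
t1,4.97,0
t2,107.23,0
t3,220.11,1
t4,45.00,0
"""


def fraud_rate(csv_file):
    """Return (row count, fraud count, fraud fraction) from the is_fraud column."""
    reader = csv.DictReader(csv_file)
    total = 0
    fraud = 0
    for row in reader:
        total += 1
        fraud += int(row["is_fraud"])  # is_fraud is 0 or 1
    rate = fraud / total if total else 0.0
    return total, fraud, rate


total, fraud, rate = fraud_rate(io.StringIO(SAMPLE_CSV))
print(f"{fraud} of {total} transactions are fraudulent ({rate:.1%})")
```

To run this against an actual file, an open file handle can be passed in place of the `io.StringIO` object.
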
Given the class imbalance, we recommend evaluating models with the Area Under the Precision-Recall Curve (AUPRC). Accuracy computed from a confusion matrix is not meaningful for imbalanced classification, because a model that labels every transaction as legitimate would still score highly.
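
To make the recommendation above concrete, here is a minimal sketch that compares plain accuracy with AUPRC on a tiny made-up example. AUPRC is approximated here as average precision (the mean of the precision values at each rank where a fraudulent transaction is found), which is one common way to summarise the precision-recall curve. The labels, scores, and the helper names `average_precision` and `accuracy` are assumptions for illustration; in practice a library routine such as scikit-learn's `average_precision_score` would likely be used instead.

```python
def average_precision(y_true, scores):
    """Approximate AUPRC as average precision.

    Transactions are ranked by descending score; precision is recorded at
    every rank that holds a fraudulent transaction and then averaged.
    Tied scores are not treated specially in this sketch.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one fraudulent (positive) label")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / n_pos


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels: 2 fraudulent transactions out of 10.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# A "model" that calls everything legitimate still looks accurate.
all_legit = [0] * len(y_true)
print("accuracy of always-legitimate model:", accuracy(y_true, all_legit))

# Made-up fraud scores from a hypothetical model; two legitimate
# transactions are ranked above both fraudulent ones.
scores = [0.05, 0.10, 0.90, 0.30, 0.15, 0.80, 0.20, 0.25, 0.70, 0.40]
print("average precision:", round(average_precision(y_true, scores), 4))
```

In this toy case the always-legitimate model reaches 80% accuracy while catching no fraud at all, whereas average precision penalises the scoring model for ranking legitimate transactions above the fraudulent ones.
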