

*Posted Mar 25, 2026*

## Shared infrastructure

Two modules carry the whole project:

- **`metrics.py`** — a reusable evaluation module with three functions: `compute_core_metrics(y_true, y_pred, y_prob)`, `evaluate_threshold_range(y_true, y_prob)`, and `find_optimal_threshold(y_true, y_prob, metric='f1')`.
- **`data_loader.py`** — a single function, `load_and_prepare_data()`, that handles the Time column (a recording artifact, not a transaction property) and rescales the Amount column to match the scale of the PCA-transformed features V1–V28.

## Models

- **Logistic regression** with `class_weight='balanced'`. This establishes the performance floor; every subsequent model must beat it.
- **Random forest** with `max_depth=10`.
- **XGBoost** with `scale_pos_weight=577` (the legit-to-fraud ratio) instead of SMOTE, handling class imbalance natively. Configuration: 200 trees, `max_depth=6`, `learning_rate=0.1`.

## Threshold selection

The optimal decision threshold differs by model (1.00 for logistic regression, 0.85 for random forest, 0.35 for XGBoost) and by business priority. The `find_optimal_threshold()` function supports both by parameterising the target metric.

## Reproducibility

Everything is seeded with `random_state=42`, the stratified split preserves class ratios, and the same data pipeline runs identically every time. Results are reproducible without notebooks or manual steps.

The `metrics.py` and `data_loader.py` pattern carries over to other projects as well.
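The three `metrics.py` signatures above could be sketched as follows. The function names and parameters come from the post; the exact metric set and the candidate-threshold grid are assumptions.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

def compute_core_metrics(y_true, y_pred, y_prob):
    """Headline metrics in one dict (metric set is an assumption)."""
    return {
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_true, y_prob),
    }

def evaluate_threshold_range(y_true, y_prob, thresholds=None):
    """Score every candidate threshold; returns [(threshold, metrics), ...]."""
    if thresholds is None:
        thresholds = np.linspace(0.05, 0.95, 19)  # assumed grid
    results = []
    for t in thresholds:
        y_pred = (np.asarray(y_prob) >= t).astype(int)
        results.append((float(t), compute_core_metrics(y_true, y_pred, y_prob)))
    return results

def find_optimal_threshold(y_true, y_prob, metric="f1"):
    """Pick the threshold maximising the requested metric."""
    results = evaluate_threshold_range(y_true, y_prob)
    return max(results, key=lambda r: r[1][metric])[0]
```

Parameterising `metric` is what lets the same function serve both model comparison and business-priority tuning (e.g. `metric="recall"` when missed fraud is costlier than false alarms).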
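A minimal sketch of `load_and_prepare_data()`, assuming the standard credit-card-fraud CSV schema (`Time`, `V1`–`V28`, `Amount`, `Class`); the file path and the choice of `StandardScaler` are assumptions, while dropping Time and rescaling Amount are stated in the post.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def load_and_prepare_data(path="creditcard.csv"):
    """Single-function loader: drop Time, scale Amount, split X/y."""
    df = pd.read_csv(path)
    # Time is a recording artifact, not a transaction property.
    df = df.drop(columns=["Time"])
    # Rescale Amount to the same scale as the PCA features V1-V28.
    df["Amount"] = StandardScaler().fit_transform(df[["Amount"]]).ravel()
    X = df.drop(columns=["Class"])
    y = df["Class"]
    return X, y
```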
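The `scale_pos_weight=577` value is just the legit-to-fraud ratio, so it can be derived from the labels rather than hard-coded. The helper name below is hypothetical; the hyperparameter values are the ones the post states.

```python
import numpy as np

def xgb_imbalance_params(y):
    """Build the XGBoost config from the label vector.

    scale_pos_weight = (# legit) / (# fraud) upweights the positive
    class natively, instead of oversampling with SMOTE.
    """
    y = np.asarray(y)
    n_neg = int((y == 0).sum())
    n_pos = int((y == 1).sum())
    return {
        "n_estimators": 200,    # 200 trees, per the post
        "max_depth": 6,
        "learning_rate": 0.1,
        "scale_pos_weight": n_neg / n_pos,
    }

# Usage (assuming xgboost is installed):
#   model = xgboost.XGBClassifier(**xgb_imbalance_params(y_train))
```

Deriving the ratio keeps the config correct if the training split changes, whereas a hard-coded 577 silently drifts.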
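The baseline and the reproducibility guarantees could look like this sketch: `class_weight='balanced'`, a stratified split, and `random_state=42` are from the post; the 80/20 split ratio and `max_iter` are assumptions.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def train_baseline(X, y):
    """Performance-floor model: balanced logistic regression on a
    stratified, seeded split so every run is identical."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    model.fit(X_tr, y_tr)
    return model, (X_te, y_te)
```

Because the split is stratified and seeded, the held-out class ratio matches the full dataset on every run, which is what makes the "every subsequent model must beat it" comparison fair.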
Developed ML evaluation infrastructure for improved fraud detection in fintech, boosting model performance by 21%.
Mar 15, 2026 - Mar 22, 2026