Executive Summary
Key fraud detection performance metrics and model recommendations
Random Forest achieved the best balance, with an F1 score of 0.934 and AUC of 0.973 on the held-out test set of 599 transactions. Fraud recall (sensitivity) of 89.1% means the model caught 89.1% of fraudulent transactions, which is critical for minimizing fraud loss. The original dataset has a 10.80% fraud rate; after SMOTE oversampling of the minority class in the training set, all models were less biased toward predicting legitimate transactions.
Fraud Class Imbalance
Distribution of fraudulent vs legitimate transactions in the original dataset
The original dataset contains 2000 transactions with only 216 frauds (10.80% fraud rate), illustrating severe class imbalance typical of real-world fraud detection problems. This imbalance is why SMOTE oversampling is essential during training—without it, models would achieve high accuracy by simply predicting 'legitimate' for every transaction.
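The SMOTE idea can be sketched in plain NumPy: each synthetic fraud sample is a random interpolation between a minority sample and one of its k nearest minority neighbours. This is a minimal illustration only; in practice a library implementation such as `imblearn.over_sampling.SMOTE` would be used, and the 2-D fraud points below are toy data, not the real transactions.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating
    between each sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # exclude self-matches
    neighbours = np.argsort(d, axis=1)[:, :k]    # k nearest neighbours per sample
    synth = []
    for _ in range(n_new):
        i = rng.integers(n)                                  # pick a minority sample
        j = neighbours[i, rng.integers(min(k, n - 1))]       # and one of its neighbours
        lam = rng.random()                                   # interpolation factor in [0, 1]
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(synth)

# toy minority class: 6 "fraud" samples in a 2-D feature space
X_fraud = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5], [0.2, 0.8]])
X_new = smote_oversample(X_fraud, n_new=10)
print(X_new.shape)  # (10, 2)
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class fills in the minority region rather than duplicating exact rows.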
Feature Correlations with Fraud
Top 10 principal components ranked by correlation strength with fraud label
The strongest correlation with fraud is pc_14 at -0.805, making it the single most predictive component. Correlations range from -0.805 to 0.693, showing that the fraud signal is spread across multiple components rather than concentrated in one. This supports the use of ensemble methods that can capture non-linear combinations of these features.
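Ranking components by correlation with the fraud label can be reproduced with a short NumPy sketch. The data here is synthetic, with a negative signal injected into a stand-in for pc_14 purely for illustration; the real correlations come from the actual PCA-transformed transactions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
y = (rng.random(n) < 0.108).astype(float)   # ~10.8% fraud labels, as in the dataset
X = rng.standard_normal((n, 15))            # hypothetical pc_1 .. pc_15
X[:, 13] -= 2.0 * y                         # inject a negative fraud signal into "pc_14"

# Pearson correlation of each component with the fraud label
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
ranked = np.argsort(-np.abs(corrs))         # rank by correlation magnitude
print(f"top component: pc_{ranked[0] + 1}, r = {corrs[ranked[0]]:.3f}")
```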
Model Performance Comparison
Accuracy, precision, recall, and F1 scores for three classification algorithms on the test set
| Model Name | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Logistic Regression | 0.98 | 0.9333 | 0.875 | 0.9032 |
| Random Forest | 0.9866 | 0.9828 | 0.8906 | 0.9344 |
| XGBoost | 0.9866 | 0.9828 | 0.8906 | 0.9344 |
Random Forest and XGBoost produced identical test-set metrics, sharing the best F1 score of 0.9344 and balancing precision (0.9828) against recall (0.8906). Recall is critical for fraud detection because a missed fraud (false negative) is more costly than a false alarm. All three models perform well, with Random Forest (the final model) offering a strong trade-off between catching fraud and limiting false positives.
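A comparison along these lines can be reproduced with scikit-learn. This sketch uses synthetic data in place of the real transactions and `GradientBoostingClassifier` as a stand-in for XGBoost (which requires the separate `xgboost` package), so its numbers will differ from the table above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# synthetic stand-in for the PCA-transformed transactions (~10.8% fraud)
X, y = make_classification(n_samples=2000, n_features=15, weights=[0.892],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),  # XGBoost stand-in
}
results = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    results[name] = {"precision": precision_score(y_te, pred),
                     "recall": recall_score(y_te, pred),
                     "f1": f1_score(y_te, pred)}
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```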
Feature Importance (Top Features)
Mean Decrease in Gini importance for top 10 principal components from Random Forest model
pc_14 is the most important feature (importance 49.706), followed by pc_10 (36.574). These top features should be prioritized in future data collection and model monitoring. The importance distribution shows that fraud detection relies on an ensemble of many features rather than a single dominant predictor, justifying the use of non-linear models.
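With scikit-learn's Random Forest, per-feature importances (mean decrease in impurity) can be read off directly; note that scikit-learn normalizes them to sum to 1, unlike the raw Gini values quoted above. Feature names and data here are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in: 15 "principal components", a few of them informative
X, y = make_classification(n_samples=1000, n_features=15, n_informative=4,
                           random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)

# impurity-based (mean decrease in Gini) importances, one per component
names = [f"pc_{i + 1}" for i in range(X.shape[1])]
order = np.argsort(rf.feature_importances_)[::-1]   # most important first
for i in order[:3]:
    print(f"{names[i]}: {rf.feature_importances_[i]:.3f}")
```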
ROC Curve (Best Model)
Trade-off between true positive rate (fraud detection) and false positive rate across all classification thresholds
The ROC curve shows AUC of 0.973 for Random Forest, indicating strong discrimination ability between fraudulent and legitimate transactions. At a false positive rate of 5%, the model achieves ~93.8% fraud detection (TPR). This curve enables threshold selection based on business priorities: lower threshold for aggressive fraud prevention (high recall), higher threshold to minimize false alarms.
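AUC and the best TPR achievable under a 5% false-positive cap can both be computed from the output of `roc_curve`. This sketch uses synthetic stand-in data, so its values will differ from the 0.973 / ~93.8% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# synthetic stand-in data (~10.8% fraud), as in the other sketches
X, y = make_classification(n_samples=2000, n_features=15, weights=[0.892],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)
proba = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

fpr, tpr, thresholds = roc_curve(y_te, proba)
auc = roc_auc_score(y_te, proba)
# best fraud-detection rate while keeping false positives at or below 5%
tpr_at_5pct_fpr = tpr[fpr <= 0.05].max()
print(f"AUC = {auc:.3f}, TPR at FPR<=5% = {tpr_at_5pct_fpr:.3f}")
```

Scanning the `(fpr, tpr, thresholds)` triples is exactly the threshold-selection exercise the report describes: pick the threshold whose operating point matches the business tolerance for false alarms.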
Confusion Matrix
Breakdown of correct and incorrect predictions for fraudulent and legitimate transactions
Out of 599 test transactions, the model correctly classified 534 legitimate transactions (specificity 99.8%) and 57 frauds (sensitivity 89.1%). There was 1 false positive (a legitimate transaction flagged as fraud) and 7 false negatives (frauds missed). The 7 missed frauds are the most costly errors in practice.
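The reported cells and rates are internally consistent, as a quick check with `confusion_matrix` on toy labels reproducing the four counts shows:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# toy labels reproducing the reported cells: 534 TN, 1 FP, 7 FN, 57 TP
y_true = np.array([0] * 535 + [1] * 64)
y_pred = np.array([0] * 534 + [1] + [0] * 7 + [1] * 57)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # fraud recall: 57 / 64
specificity = tn / (tn + fp)   # 534 / 535
print(tn, fp, fn, tp, f"sens={sensitivity:.3f}", f"spec={specificity:.4f}")
```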
Final Model Metrics
Summary of key performance metrics and deployment-ready threshold recommendations
| Metric Name | Metric Value |
|---|---|
| AUC-ROC | 0.9734 |
| Accuracy | 0.9866 |
| Balanced Accuracy | 0.9444 |
| Sensitivity (Recall) | 0.8906 |
| Specificity | 0.9981 |
| Precision | 0.9828 |
| F1 Score | 0.9344 |
| Deployment Threshold (80% Fraud Recall) | 0.785 |
The Random Forest model achieves AUC 0.973, indicating strong ability to discriminate frauds from legitimate transactions. At the deployment threshold of 0.785 (chosen to guarantee at least 80% fraud recall), sensitivity is 89.1% and specificity is 99.8%. Balanced accuracy of 94.4% accounts for both fraud detection and false alarm minimization. These metrics qualify the model for production deployment.
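Choosing a threshold that guarantees a minimum fraud recall amounts to sweeping the score distribution from the top down until the target recall is reached. A minimal sketch, assuming hypothetical scores in which frauds tend to score higher than legitimate transactions:

```python
import numpy as np

def threshold_for_recall(y_true, scores, target_recall=0.80):
    """Highest threshold at which recall on the positive class first
    reaches target_recall (classify as fraud when score >= threshold)."""
    order = np.argsort(scores)[::-1]                  # descending by fraud score
    hits = np.asarray(y_true)[order]
    recall_curve = np.cumsum(hits) / hits.sum()       # recall after each cut point
    idx = np.argmax(recall_curve >= target_recall)    # first point meeting the target
    return scores[order][idx]

# hypothetical scores: ~10.8% frauds, scoring well above legitimates
rng = np.random.default_rng(0)
y = rng.random(1000) < 0.108
scores = 0.7 * y + 0.2 * rng.random(1000)

t = threshold_for_recall(y, scores, target_recall=0.80)
achieved = ((scores >= t) & y).sum() / y.sum()
print(f"threshold={t:.3f}, achieved recall={achieved:.3f}")
```

Because the cut includes every transaction scoring at or above the returned threshold, the achieved recall is always at least the target, mirroring how the 0.785 threshold targets 80% recall while the model actually delivers 89.1%.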