Finance · Transactions · Fraud Eda P1778698833
Executive Summary

Key findings from fraud vs legitimate transaction analysis

Transactions Analyzed: 2000
Fraudulent Cases: 216
Fraud Rate (%): 10.8
Mean Fraud Amount ($): 115.3
Mean Legitimate Amount ($): 73.82
Interpretation

The analysis covers 2000 transactions, of which 216 are fraudulent (10.8% fraud rate). Fraudulent transactions have a mean amount of $115.30 versus $73.82 for legitimate ones, so amount patterns differ by class. PCA features and temporal signals also differ between fraudulent and legitimate activity, which warrants further investigation of high-variance features and time-based anomalies.

Overview

Analysis Overview

Dataset overview and analysis scope

N Observations: 2000
N Fraudulent: 216
N Legitimate: 1784
Fraud Rate Pct: 10.8
Mean Amount Fraud: 115.3
Mean Amount Legitimate: 73.82
Interpretation

This analysis examined 2000 credit card transactions with 216 fraudulent cases (10.8% fraud rate). Data includes 28 PCA-transformed features, transaction amounts, timestamps, and fraud labels for comprehensive class-wise comparison.

Data Preparation

Data Quality

Data quality assessment and preprocessing summary

Interpretation

No missing values detected in the 2000 analyzed transactions. All 28 PCA features are numeric and properly scaled. Transaction amounts and timestamps are complete. Data quality is suitable for exploratory analysis without additional preprocessing.
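A missing-value check of the kind summarized above could look like the following minimal sketch. The column names (`amount`, `time`, `is_fraud`) and the row-of-dicts layout are assumptions for illustration, not the report's actual schema or method.

```python
import math

def missing_report(rows, columns):
    """Count missing entries (None, absent key, or NaN) per column."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)  # absent keys are treated as missing
            if value is None or (isinstance(value, float) and math.isnan(value)):
                counts[col] += 1
    return counts

# Hypothetical toy rows; real data would hold 2000 transactions.
sample = [
    {"amount": 12.5, "time": 100.0, "is_fraud": 0},
    {"amount": None, "time": 160.0, "is_fraud": 1},
    {"amount": 40.0, "time": float("nan"), "is_fraud": 0},
]
report = missing_report(sample, ["amount", "time", "is_fraud"])
print(report)
```

A report with all counts equal to zero would correspond to the "no missing values" finding.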

Visualization

Fraud vs Legitimate Count

Distribution of fraudulent and legitimate transactions

Interpretation

Out of 2000 total transactions, 1784 are legitimate (89.2%) and 216 are fraudulent (10.8%). Legitimate transactions outnumber fraudulent ones by roughly 8 to 1. An imbalance toward the legitimate class is typical of fraud detection datasets.
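The class shares can be re-derived from the two reported counts. This is a small arithmetic sketch; the function name is hypothetical.

```python
def class_shares(n_fraud: int, n_legit: int) -> dict:
    """Return class percentages and the legitimate-to-fraud ratio."""
    total = n_fraud + n_legit
    return {
        "total": total,
        "fraud_pct": round(100 * n_fraud / total, 1),
        "legit_pct": round(100 * n_legit / total, 1),
        "legit_per_fraud": round(n_legit / n_fraud, 2),
    }

# Counts taken from the report: 216 fraudulent, 1784 legitimate.
shares = class_shares(n_fraud=216, n_legit=1784)
print(shares)
```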

Visualization

Transaction Amount by Class

Box plot showing distribution of transaction amounts for each fraud class

Interpretation

Fraudulent transactions have a median amount of $11.86 (IQR $104.92) versus $22.00 (IQR $67.89) for legitimate transactions. The lower median indicates that a typical fraudulent transaction is smaller, while the wider interquartile range shows that fraudulent amounts are more spread out.
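The median and IQR per class could be computed as in the sketch below. The quantile method ("inclusive") is an assumption; the report does not say which quartile definition it uses, and the toy amounts are made up.

```python
from statistics import median, quantiles

def median_and_iqr(values):
    """Return (median, interquartile range) for a list of amounts."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q3 - q1

# Hypothetical toy amounts with one large value in the upper tail.
toy_amounts = [1.0, 2.0, 3.0, 4.0, 100.0]
toy_median, toy_iqr = median_and_iqr(toy_amounts)
print(toy_median, toy_iqr)
```

In the toy data the single large amount leaves the median and IQR unchanged, which is why box plots are often read alongside the mean for skewed amounts.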

Visualization

Amount Distribution (Density)

Violin plot showing the density and shape of transaction amount distributions

Interpretation

The density distributions show that both classes are right-skewed: the mean exceeds the median by 103.45 for fraudulent transactions and by 51.82 for legitimate ones, so the skew is stronger for fraud. Most fraudulent amounts concentrate at low values, with a long tail of larger amounts pulling the mean up. This may indicate that fraudsters favor particular amount ranges.
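The mean-minus-median gaps can be checked against the rounded means and medians reported elsewhere in this report. The sketch below uses those rounded figures, so the fraud gap comes out one cent below the reported 103.45, presumably because the report used unrounded inputs (an assumption).

```python
def mean_median_gap(mean: float, median: float) -> float:
    """Positive gap suggests right skew; negative gap suggests left skew."""
    return round(mean - median, 2)

# Rounded values as reported: means 115.30 / 73.82, medians 11.86 / 22.00.
fraud_gap = mean_median_gap(115.30, 11.86)
legit_gap = mean_median_gap(73.82, 22.00)
print(fraud_gap, legit_gap)
```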

Visualization

Mean PCA Features by Class

Comparison of mean values for top PCA features by fraud class

Interpretation

The chart shows the 10 PCA features with the largest differences in class means. pca_feature_3 has the largest absolute difference between the fraudulent and legitimate means (7.366), indicating strong separation between the two classes on this feature. These features are candidates for fraud detection models.
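The ranking by absolute mean difference can be reproduced from the class means listed in the summary statistics table of this report. The helper name is hypothetical; the numbers are the reported (fraudulent, legitimate) means.

```python
# (fraudulent mean, legitimate mean) per feature, as reported in the table.
class_means = {
    "pca_feature_3": (-7.382, -0.0162),
    "pca_feature_14": (-7.085, -0.0184),
    "pca_feature_17": (-7.02, 0.0369),
    "pca_feature_12": (-6.432, 0.027),
    "pca_feature_7": (-6.109, -0.0253),
    "pca_feature_10": (-5.949, 0.0166),
    "pca_feature_1": (-5.168, 0.0386),
    "pca_feature_4": (4.64, -0.0157),
    "pca_feature_16": (-4.316, 0.0006),
    "pca_feature_11": (3.936, -0.0093),
}

def rank_by_mean_gap(means):
    """Sort features by |fraud mean - legitimate mean|, largest first."""
    gaps = {name: abs(fraud - legit) for name, (fraud, legit) in means.items()}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)

ranking = rank_by_mean_gap(class_means)
top_name, top_gap = ranking[0]
print(top_name, round(top_gap, 3))
```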

Visualization

Fraud Rate Over Time

Heatmap showing fraud rates across time periods (Early, Middle, Late phases)

Interpretation

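The heatmap compares fraud rates across the early, middle, and late phases of the dataset. The reported peak fraud rate of 10.8% occurs in the early phase. Because 10.8% is also the overall fraud rate, and the overall rate is a weighted average of the phase rates, the other phases cannot sit far below this peak, so the evidence for temporal clustering is weak. Any link to periods of reduced detection or transaction monitoring is speculative and would need per-phase counts to support it.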
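Per-phase fraud rates could be computed along the following lines. This sketch assumes three equal-width time bins; the report does not state how its phases were defined, and the toy timestamps and labels are made up.

```python
def fraud_rate_by_phase(times, labels, phases=("Early", "Middle", "Late")):
    """Split the time range into equal-width bins and return fraud % per bin."""
    lo, hi = min(times), max(times)
    width = (hi - lo) / len(phases) or 1.0  # avoid zero width if all times match
    counts = {phase: [0, 0] for phase in phases}  # [n_fraud, n_total]
    for t, y in zip(times, labels):
        idx = min(int((t - lo) / width), len(phases) - 1)  # clamp the max time
        counts[phases[idx]][0] += y
        counts[phases[idx]][1] += 1
    return {
        phase: (100 * n_fraud / n_total if n_total else float("nan"))
        for phase, (n_fraud, n_total) in counts.items()
    }

# Hypothetical toy data: timestamps in seconds, labels 1 = fraud.
toy_times = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
toy_labels = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
rates = fraud_rate_by_phase(toy_times, toy_labels)
print(rates)
```

Comparing each phase rate with the overall rate, together with the number of transactions in each phase, gives a quick read on whether any phase really stands out.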

Visualization

Feature Correlations

Heatmap of Pearson correlations between all features and fraud label

Interpretation

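The correlation matrix shows which features are most strongly associated with fraud. pca_feature_14 has the strongest correlation with the fraud label (r = -0.805), meaning lower values of this feature are associated with fraud, which makes it a key discriminative feature. PCA components are uncorrelated on the data used to fit the PCA, so multicollinearity among them is expected to be low; in a subsample such as this one, some correlation between components can still appear.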
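The Pearson correlation between a numeric feature and a 0/1 fraud label can be computed directly, as in this minimal sketch. The toy feature values are made up and are not taken from the dataset.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: fraud rows (label 1) have strongly negative values.
toy_feature = [0.1, -0.2, 0.3, -6.5, -7.1]
toy_is_fraud = [0, 0, 0, 1, 1]
r = pearson_r(toy_feature, toy_is_fraud)
print(round(r, 3))
```

A negative r here has the same reading as in the report: lower feature values go with the fraud label.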

Data Table

Summary Statistics by Class

Mean, median, and standard deviation for top PCA features by fraud class

Statistic  Feature Name     Fraudulent Value  Legitimate Value
Mean       pca_feature_3    -7.382            -0.0162
Mean       pca_feature_14   -7.085            -0.0184
Mean       pca_feature_17   -7.02             0.0369
Mean       pca_feature_12   -6.432            0.027
Mean       pca_feature_7    -6.109            -0.0253
Mean       pca_feature_10   -5.949            0.0166
Mean       pca_feature_1    -5.168            0.0386
Mean       pca_feature_4    4.64              -0.0157
Mean       pca_feature_16   -4.316            0.0006
Mean       pca_feature_11   3.936             -0.0093
Median     pca_feature_3    -5.139            0.1602
Median     pca_feature_14   -6.797            0.0386
Median     pca_feature_17   -5.756            -0.0486
Median     pca_feature_12   -5.503            0.1526
Median     pca_feature_7    -3.161            0.0363
Median     pca_feature_10   -4.698            -0.097
Median     pca_feature_1    -2.493            0.0738
Median     pca_feature_4    4.223             -0.0143
Median     pca_feature_16   -3.838            0.0424
Median     pca_feature_11   3.712             -0.0255
Std Dev    pca_feature_3    7.377             1.405
Std Dev    pca_feature_14   4.13              0.936
Std Dev    pca_feature_17   7.056             0.7564
Std Dev    pca_feature_12   4.639             0.9249
Std Dev    pca_feature_7    7.62              0.9975
Std Dev    pca_feature_10   5.061             1.044
Std Dev    pca_feature_1    7.198             1.895
Std Dev    pca_feature_4    2.869             1.346
Std Dev    pca_feature_16   3.884             0.8168
Std Dev    pca_feature_11   2.603             1.001
Interpretation

Summary statistics for the 10 most discriminative PCA features show systematic differences between fraudulent and legitimate transactions. Fraudulent cases have a higher standard deviation in all ten features. Their means and medians are lower than the legitimate values in eight features and higher in two (pca_feature_4 and pca_feature_11). These differences form the basis for anomaly detection approaches.
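One simple way to express these differences on a common scale is to measure how many legitimate-class standard deviations the fraud mean sits from the legitimate mean. This is an illustrative sketch, not the report's method. It uses the reported means and standard deviations for three features, and the function name is hypothetical.

```python
def standardized_shift(fraud_mean, legit_mean, legit_std):
    """Fraud mean expressed in legitimate-class standard deviations."""
    return (fraud_mean - legit_mean) / legit_std

# (fraud mean, legitimate mean, legitimate std dev), as reported in the table.
reported = {
    "pca_feature_3": (-7.382, -0.0162, 1.405),
    "pca_feature_14": (-7.085, -0.0184, 0.936),
    "pca_feature_17": (-7.02, 0.0369, 0.7564),
}
shifts = {name: round(standardized_shift(*vals), 2) for name, vals in reported.items()}
print(shifts)
```

Values far from zero mark features where a typical fraudulent transaction would look unusual relative to legitimate behavior, which is the intuition behind threshold-based anomaly flags.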
