Executive Summary
Overall model accuracy and which signal characteristics best identify anomalies
A Random Forest classifier achieved 96.4% accuracy (AUC 0.990) on a holdout test set of 501 segments; the full dataset contains 2000 telemetry segments, 413 of them anomalous. Specificity is high (99.0%) and sensitivity is moderate (87.0%), giving an F1-score of 0.913. The strongest anomaly discriminator is peak_count (importance 97.023), suggesting that peak and structural characteristics of the signal are the primary indicators.
Anomaly Distribution
Class balance between anomalous and normal segments
Of the 2000 telemetry segments, 413 (20.65%) are anomalies and 1587 (79.35%) are normal operations. Anomalies are the minority class at roughly a 1:4 ratio, so accuracy alone overstates performance: a classifier that always predicts "normal" would already score 79.35% on this dataset. The imbalance should inform both model training and the choice of inference threshold.
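The class shares can be checked directly from the two counts. The sketch below also computes inverse-frequency class weights using the common "balanced" heuristic, n_samples / (n_classes * class_count). The report does not say whether the model was trained with class weighting, so the weights are an illustrative assumption, not a description of the actual training setup.

```python
# Class counts as stated in the report.
n_total = 2000
counts = {"normal": 1587, "anomaly": 413}

# Share of each class in the full dataset.
shares = {label: n / n_total for label, n in counts.items()}

# "Balanced" heuristic: n_samples / (n_classes * class_count).
# Assumption: the report does not state that any weighting was applied.
n_classes = len(counts)
class_weights = {label: n_total / (n_classes * n) for label, n in counts.items()}

for label in counts:
    print(f"{label}: share={shares[label]:.4f}, weight={class_weights[label]:.3f}")
```

Under this heuristic an anomalous segment would count roughly four times as much as a normal one during training, which is one way to offset the imbalance.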
Feature Correlation Matrix
Pairwise feature correlations; strongest relationships indicate redundant information
Feature correlations range from -0.785 to 0.998. The strongest pairwise relationship (r = 0.998) indicates that two features carry nearly identical information, so one of them is likely redundant. Most feature pairs show weak to moderate correlation (|r| < 0.5), suggesting that the features capture largely distinct signal properties. Features that correlate strongly with the anomaly label are candidate predictors.
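A minimal sketch of how near-duplicate features might be flagged from pairwise Pearson correlations. The feature values are toy numbers invented for illustration, the column names smoothed_peak_count and mean are hypothetical stand-ins, and the 0.95 cut-off is an assumption rather than a value taken from this analysis.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def redundant_pairs(features, threshold=0.95):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(features, 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged

# Toy values, invented for illustration only.
toy = {
    "peak_count": [3, 5, 2, 8, 7, 4],
    "smoothed_peak_count": [3, 5, 2, 7, 7, 4],
    "mean": [0.1, -0.2, 0.4, 0.0, 0.3, -0.1],
}
pairs = redundant_pairs(toy)
for a, b, r in pairs:
    print(f"{a} ~ {b}: r = {r:.3f}")
```

On the toy data only the two peak-based columns exceed the cut-off. Whether to drop one member of such a pair depends on the model: tree ensembles tolerate redundant features, but redundancy can split importance between the correlated columns.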
Feature Importance Ranking
Top 10 features ranked by importance (mean decrease in Gini); shows which signal characteristics best discriminate anomalies
The top 10 features contributing to anomaly detection span statistical moments (mean, variance, kurtosis), structural patterns (peak counts, smoothed peaks), and normalized energy metrics. Peak/structural features dominate: the single most important feature is 'Peak Count' (peak_count), with an importance of 97.023. This suggests that peak and structural characteristics are the primary anomaly discriminators.
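Mean decrease in Gini credits a feature with the impurity reduction achieved at every tree node that splits on it, averaged over the forest. The sketch below computes that reduction for a single split on toy data. The values and the threshold of 6 are invented for illustration, and the scaling that yields a figure such as 97.023 is not specified in the report.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting the samples at value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted_children

# Toy data, invented for illustration: anomalies have many peaks.
peak_count = [2, 3, 3, 4, 9, 10, 4, 11]
is_anomaly = [0, 0, 0, 0, 1, 1, 0, 1]

decrease = gini_decrease(peak_count, is_anomaly, threshold=6)
print(f"Gini decrease for peak_count <= 6: {decrease:.5f}")
```

Here the split separates the classes perfectly, so the decrease equals the parent impurity. A feature that produces many such clean splits across the forest accumulates a high importance.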
ROC Curve Analysis
True positive rate vs false positive rate across all classification thresholds (AUC measures discriminative power)
The ROC curve shows strong discriminative power with AUC = 0.990: a randomly chosen anomaly receives a higher score than a randomly chosen normal segment about 99.0% of the time. The curve bows well above the random baseline (diagonal). Each point on the curve corresponds to one classification threshold, so operators can choose an operating point that balances missed anomalies against false alarms.
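The ranking interpretation of AUC can be made concrete by counting pairs: AUC equals the fraction of (anomaly, normal) pairs in which the anomaly scores higher, with ties counted as half. The scores below are toy values invented for illustration, not output from the reported model.

```python
def pairwise_auc(scores, labels):
    """Fraction of (anomaly, normal) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy anomaly scores (hypothetical); label 1 = anomaly, 0 = normal.
scores = [0.95, 0.80, 0.40, 0.70, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

auc = pairwise_auc(scores, labels)
print(f"AUC = {auc:.4f}")
```

In the toy data one anomaly (score 0.40) ranks below one normal segment (score 0.70), so 14 of the 15 pairs are ordered correctly.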
Confusion Matrix
Actual vs predicted labels; diagonal (correct predictions) vs off-diagonal (errors)
The confusion matrix shows 4 false positives (1.0% of normal segments incorrectly flagged) and 14 false negatives (13.0% of true anomalies missed). At the current threshold the model therefore favors precision and specificity over sensitivity: false alarms are rare, but roughly one anomaly in eight goes undetected. For safety-critical anomaly detection, where a missed anomaly carries higher risk than a false alarm, a lower decision threshold may be preferable.
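The balance between missed anomalies and false alarms depends on the decision threshold applied to the classifier's scores. The sketch below tallies the confusion matrix at two thresholds. The scores and both thresholds are toy values invented for illustration; the threshold used in the reported evaluation is not stated.

```python
def confusion_at(scores, labels, threshold):
    """Return (tp, fp, fn, tn) when score >= threshold is flagged as an anomaly."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Toy anomaly scores (hypothetical); label 1 = anomaly, 0 = normal.
toy_scores = [0.92, 0.81, 0.55, 0.35, 0.48, 0.30, 0.22, 0.15, 0.10, 0.05]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

for t in (0.5, 0.3):
    tp, fp, fn, tn = confusion_at(toy_scores, toy_labels, t)
    print(f"threshold={t}: missed anomalies={fn}, false alarms={fp}")
```

Lowering the threshold from 0.5 to 0.3 recovers the missed anomaly in the toy data at the cost of two false alarms, which is the kind of trade an operator would weigh against the relative cost of each error.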
Model Performance Metrics
Summary of key classification metrics on test set
| Metric | Value |
|---|---|
| Test Set Size | 501 |
| Accuracy | 96.41% |
| Precision | 95.92% |
| Recall (Sensitivity) | 87.04% |
| F1-Score | 0.9126 |
| Specificity | 98.98% |
| AUC (ROC) | 0.9897 |
The Random Forest classifier achieves 96.41% accuracy on the holdout test set with an F1-score of 0.9126. Precision (95.92%) exceeds recall (87.04%) by about nine percentage points: flagged segments are rarely false alarms, while about 13% of anomalies go undetected. The model demonstrates strong discriminative power with a ROC AUC of 0.9897.
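The tabulated metrics can be recomputed from confusion-matrix counts. The false-positive and false-negative counts (4 and 14) come from the confusion matrix section. The true-positive and true-negative counts (94 and 389) are not stated anywhere in the report; they are inferred here, as an assumption, from the reported recall, specificity, and test-set size of 501.

```python
# fp and fn are reported; tp and tn are inferred (assumption) so that
# recall, specificity and the test-set size match the table.
tp, fp, fn, tn = 94, 4, 14, 389

total = tp + fp + fn + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)          # sensitivity
specificity = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)

print(f"Test set size: {total}")
print(f"Accuracy:      {accuracy:.2%}")
print(f"Precision:     {precision:.2%}")
print(f"Recall:        {recall:.2%}")
print(f"Specificity:   {specificity:.2%}")
print(f"F1-score:      {f1:.4f}")
```

With these counts every value in the table is reproduced to the printed precision, which suggests the table and the confusion matrix are internally consistent.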