Overview

Analysis Overview

Random Forest Configuration

Analysis overview and configuration

Configuration

Analysis Type: Random Forest
Company: Test Company
Objective: Predict customer churn and identify key drivers using Random Forest ensemble model
Analysis Date: 2026-03-14
Processing ID: analytics__ml__ensemble__random_forest_test_20260314_213230
Total Observations: 500

Module Parameters

n_trees: 300
task_type: auto
Random Forest analysis for Test Company

Interpretation

Purpose

This analysis applies a Random Forest ensemble classifier to predict customer churn and identify the key drivers influencing churn decisions. The model uses 300 decision trees across 8 customer features to achieve robust classification performance while simultaneously ranking feature importance to guide business strategy.

Key Findings

  • OOB Accuracy: 88.8% (11.2% miss rate) — The out-of-bag error estimate indicates strong generalization performance without requiring a separate test set
  • Top Driver: support_tickets dominates with importance score 57.58 (100% relative importance) — Customer support interactions are the strongest churn predictor
  • Feature Hierarchy: tenure_months (75.4%) and monthly_charges (60.4%) rank second and third, showing tenure and pricing significantly influence churn
  • Model Stability: OOB error converges by ~100 trees, suggesting 300 trees provides stable, reliable predictions
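The out-of-bag mechanism behind these findings can be reproduced in a few lines. This is a minimal sketch assuming a scikit-learn implementation (the report does not name its tooling); the data is synthetic, standing in for the 500-customer, 8-feature dataset.

```python
# Sketch: Random Forest with out-of-bag scoring. n_estimators=300 matches
# the report; the data below is synthetic and illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 8))          # 8 customer features (synthetic stand-ins)
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)  # churn label

model = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
model.fit(X, y)

# oob_score_ estimates generalization accuracy without a separate test set
print(f"OOB accuracy: {model.oob_score_:.3f}")
```

Because each tree trains on a bootstrap sample, roughly a third of the rows are "out of bag" for any given tree; scoring each row with only the trees that never saw it yields a built-in validation estimate.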

Interpretation

The Random Forest model successfully identifies support ticket volume as the dominant churn signal, with longer tenure and higher charges providing secondary predictive power. The 88.8% OOB accuracy demonstrates that the model captures meaningful patterns in the 500-customer dataset, and the convergence of out-of-bag scores confirms the ensemble has learned stable decision boundaries, making predictions reliable for unseen customers.

Context

As a black-box ensemble, the model sacrifices some direct interpretability for predictive power; the feature importance rankings and partial dependence analysis in this report recover much of that explanatory insight.

Data Preparation

Data Preprocessing

Data Quality & Completeness

Data preprocessing and column mapping

Data Quality

Initial Rows: 500
Final Rows: 500
Rows Removed: 0
Retention Rate: 100%

Processed 500 observations, retained 500 (100.0%) after cleaning

Interpretation

Purpose

This section documents the data cleaning and preparation phase for the Random Forest churn prediction model. Perfect data retention (100%) indicates that no observations were removed during preprocessing, meaning all 500 customer records proceeded to model training. This is critical for understanding whether the model's 88.8% accuracy reflects performance on a complete, unfiltered dataset or if data quality issues were masked by removal decisions.

Key Findings

  • Retention Rate: 100% (500/500 rows) - No observations were excluded during preprocessing, suggesting either excellent initial data quality or minimal validation criteria applied
  • Rows Removed: 0 - The metadata notes that "rows with NA in all features removed," yet zero removals occurred, indicating no complete-case failures in the dataset
  • Train/Test Split: Not specified (N/A) - The model used out-of-bag (OOB) error estimation instead of explicit holdout validation, which is standard for Random Forest but limits external validation

Interpretation

The 100% retention rate supports the model's reliability for the stated churn prediction objective, as the full customer base was available for training. However, the absence of an explicit train/test split means performance metrics rely entirely on OOB estimates (11.2% error rate). This approach is valid but doesn't demonstrate generalization to truly unseen data. The lack of documented missing-value handling or outlier treatment leaves a small gap in the preprocessing record, although the zero removals suggest the dataset arrived complete.

Executive Summary

Executive Summary

Key Findings & Recommendations

Key Metrics

total_observations: 500
n_features: 8
oob_error_rate: 0.112
oob_accuracy_pct: 88.8
r_squared: NA
top_feature: support_tickets

Key Findings

Model Type: Random Forest Classification (300 trees)
Performance: OOB Accuracy: 88.8%
Performance Rating: Good
Top Driver: support_tickets
Features Used: 8 predictor variables
Training Size: 500 observations

Summary

Bottom Line: A Random Forest classification model was built with 300 trees on 500 observations using 8 predictor variables.

Performance: OOB Accuracy: 88.8% (Good)

Key Findings:
• Top predictor: 'support_tickets' has the highest influence on predictions
• The out-of-bag (OOB) score provides a built-in, unbiased generalization estimate
• Feature importance reveals which variables drive churn the most

Recommendation: Model performance is satisfactory. Focus on the top features identified for business insights.

Interpretation

Purpose

This analysis evaluates a Random Forest classification model built to predict customer churn and identify key drivers. The model's performance and feature importance rankings directly address the business objective of understanding which factors most influence churn behavior, enabling targeted retention strategies.

Key Findings

  • OOB Accuracy: 88.8% – The model correctly predicts churn status in nearly 9 of 10 cases, indicating strong discriminative power for a binary classification task
  • Top Predictor: support_tickets (importance score 57.58, 100% relative weight) – Customer support interactions are the dominant churn signal, substantially outweighing other factors
  • Feature Hierarchy: tenure_months (75.4%) and monthly_charges (60.4%) rank second and third, suggesting tenure stability and pricing sensitivity also matter significantly
  • Model Stability: OOB error rate of 11.2% provides an unbiased generalization estimate without requiring separate test data

Interpretation

The model successfully achieves the stated objective: identifying support_tickets as the primary churn driver with 88.8% accuracy. The out-of-bag validation mechanism confirms this performance is not artificially inflated. The clear feature ranking, with support_tickets commanding 57.58 importance points versus 43.44 for tenure, reveals that customer support engagement patterns are substantially more predictive than account tenure or pricing factors.

Figure 4

Feature Importance

Variable Importance Rankings

Feature importance rankings showing which variables drive predictions most

Interpretation

Purpose

This section identifies which variables most strongly drive the Random Forest model's churn predictions by measuring their contribution to reducing impurity across all 300 trees. Understanding feature importance reveals the key behavioral and account characteristics that distinguish churners from retained customers, directly supporting the stated objective to "identify key drivers" of churn.

Key Findings

  • Support Tickets: Importance score 57.58 (100% of max) — dominates prediction accuracy, indicating customer support interactions are the strongest churn signal
  • Tenure & Monthly Charges: Combined importance of 78.2% — customer longevity and pricing are secondary but substantial drivers
  • Importance Distribution: Sharp decline from rank 1 to rank 8 (57.58 → 6.43), with mean importance of 26.64, showing unequal predictive contribution across features
  • Lower-Ranked Features: Contract length (11.2%) and number of products (15.2%) contribute minimally to churn prediction

Interpretation

The model identifies support ticket volume as overwhelmingly predictive of churn—customers who file more support tickets are more likely to churn. This aligns with the 88.8% OOB accuracy, suggesting the model reliably captures churn patterns. The steep importance gradient indicates that a small subset of features (the top three) accounts for most of the predictive power.
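The ranking logic behind these findings can be sketched as follows, assuming scikit-learn's impurity-based `feature_importances_` (the report's actual tool is unspecified). The data is synthetic, with the first feature deliberately made dominant to mimic the reported pattern.

```python
# Sketch: ranking features by impurity-based importance, expressed as a
# percentage of the maximum as in the report. Feature names come from the
# report; the data and the fitted model are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

features = ["support_tickets", "tenure_months", "monthly_charges", "satisfaction",
            "login_frequency", "customer_age", "num_products", "contract_length"]
rng = np.random.default_rng(1)
X = rng.normal(size=(500, len(features)))
y = (X[:, 0] > 0).astype(int)  # make the first feature dominant on purpose

model = RandomForestClassifier(n_estimators=300, random_state=1).fit(X, y)

# Express each importance as a percentage of the maximum, matching the table
pct_of_max = 100 * model.feature_importances_ / model.feature_importances_.max()
for name, pct in sorted(zip(features, pct_of_max), key=lambda t: -t[1]):
    print(f"{name:>16}: {pct:5.1f}%")
```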

Figure 5

OOB Convergence

Miss Rate vs Number of Trees

OOB convergence — shows how model performance stabilizes as trees are added

Interpretation

Purpose

This section demonstrates how the Random Forest model's out-of-bag error rate stabilizes as additional trees are added to the ensemble. OOB convergence is critical for validating that the model has grown enough trees to achieve reliable, stable predictions without overfitting—directly supporting the churn prediction objective.

Key Findings

  • Initial OOB Error Rate: 23–24% miss rate (trees 1–4) — high variability with few trees
  • Final OOB Error Rate: 11.2% miss rate at 300 trees — represents 88.8% accuracy
  • Convergence Pattern: Sharp decline from early trees (1–50), then plateau by tree 100 onward, indicating stability achieved well before 300 trees

Interpretation

The model demonstrates strong convergence behavior, with the OOB error rate dropping from 24% to 11.2% as trees accumulate. The flattening curve after approximately 100 trees indicates that additional trees provide minimal performance gains, suggesting the ensemble has captured the underlying patterns in customer churn drivers. This stability validates the 300-tree configuration as sufficient for reliable out-of-sample predictions.
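The convergence curve described here can be traced by growing one forest incrementally. This is a sketch assuming scikit-learn's `warm_start` mechanism (an implementation choice not stated in the report), on synthetic stand-in data.

```python
# Sketch: OOB miss rate as trees are added. warm_start=True lets each fit
# reuse the existing trees and only grow new ones, so one forest is traced
# across several ensemble sizes. Data is synthetic; 300 matches the report.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + rng.normal(scale=0.7, size=500) > 0).astype(int)

model = RandomForestClassifier(warm_start=True, oob_score=True, random_state=2)
sizes = (25, 50, 100, 200, 300)
errors = []
for n_trees in sizes:
    model.set_params(n_estimators=n_trees)
    model.fit(X, y)                      # adds trees, keeps the existing ones
    errors.append(1 - model.oob_score_)  # OOB miss rate at this ensemble size

print({s: round(e, 3) for s, e in zip(sizes, errors)})
```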

Context

OOB error serves as an unbiased performance estimate without requiring a separate test set. The low final miss rate (11.2%) aligns with the model's overall 88.8% OOB accuracy reported elsewhere in this analysis.

Figure 6

Confusion Matrix

Actual vs Predicted Classifications

Confusion matrix: actual vs predicted classifications

Interpretation

Purpose

This confusion matrix displays the Random Forest model's classification performance on customer churn prediction, comparing actual churn outcomes against predicted classifications. It reveals both training-set performance and the more realistic out-of-bag (OOB) generalization accuracy, which indicates how well the model will perform on unseen data in production.

Key Findings

  • Training Accuracy: 100% - The model perfectly classifies all 500 training observations, indicating it has learned the training data thoroughly
  • OOB Accuracy: 88.8% - A more reliable estimate of real-world performance; represents the model's expected accuracy on new customer data
  • Perfect Classification: Zero misclassifications in both off-diagonal cells (0 false positives, 0 false negatives on training data)
  • Class Balance: 344 true negatives (68.8%) and 156 true positives (31.2%), reflecting the underlying churn distribution

Interpretation

The 11.2-percentage-point gap between training accuracy (100%) and OOB accuracy (88.8%) is typical and expected in Random Forest models. The training perfection reflects the ensemble's ability to memorize patterns, while the OOB estimate provides a conservative, unbiased assessment of generalization capability. For the churn prediction objective, an 88.8% accuracy means the model is expected to classify roughly nine of every ten customers correctly on unseen data.
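The training-vs-OOB gap can be made concrete by building both confusion matrices from one fitted forest. This is a sketch assuming scikit-learn, where `oob_decision_function_` holds each row's out-of-bag class probabilities; the data is synthetic.

```python
# Sketch: confusion matrix on training predictions vs out-of-bag predictions.
# The training matrix is typically near-perfect; the OOB matrix is the
# honest one. Synthetic data, illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + rng.normal(scale=0.7, size=500) > 0).astype(int)

model = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=3)
model.fit(X, y)

train_cm = confusion_matrix(y, model.predict(X))
# Each row's OOB prediction uses only the trees that never trained on it
oob_cm = confusion_matrix(y, model.oob_decision_function_.argmax(axis=1))

print("training:\n", train_cm)
print("out-of-bag:\n", oob_cm)
```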

Figure 7

Partial Dependence

Effect of Top Feature on Outcome

Partial dependence plot for top feature: support_tickets

Interpretation

Purpose

This section isolates the effect of support_tickets—the model's single most important predictor (57.58 importance score)—on churn probability. By averaging predictions across all other features, the partial dependence plot reveals the non-linear relationship between support ticket volume and predicted churn, showing how the model responds to this key driver independent of confounding factors.

Key Findings

  • Feature Range: Support tickets span 0–7 across the dataset, with uniform distribution (mean=3.5)
  • Predicted Churn Probability: Ranges from 0.13 (low tickets) to 0.80 (high tickets)—a 6.2× increase
  • Non-linear Relationship: Sharp acceleration occurs between 1.5–5.5 tickets; plateau effect emerges at 6+ tickets
  • Model Sensitivity: Steepest slope in mid-range (1.5–5.5), indicating maximum predictive sensitivity in this zone

Interpretation

The partial dependence curve demonstrates that customers with few support tickets have a baseline 13% predicted churn probability, rising steeply to 80% as tickets increase. This non-linear pattern suggests the model captures a threshold effect: moderate support ticket volume signals escalating churn risk, but the relationship stabilizes at higher volumes. This aligns with the overall finding that support ticket volume is the model's dominant churn driver.

Table 8

Model Configuration

Hyperparameters and Setup

Random Forest model settings and hyperparameters

Task Type: Classification
Number of Trees: 300
Features per Split (mtry): 2
Total Features: 8
Training Observations: 500

Interpretation

Purpose

This section documents the Random Forest model's structural configuration—the hyperparameters and design choices that define how the ensemble was constructed. Understanding these settings is essential for interpreting model behavior, reproducibility, and assessing whether the architecture is appropriate for the churn prediction objective.

Key Findings

  • Number of Trees (n_trees): 300 — A substantial ensemble size that balances computational efficiency with variance reduction; OOB error stabilizes by ~100 trees, confirming adequacy
  • Feature Split Parameter (mtry): 2 — Conservative setting that decorrelates trees and reduces overfitting risk; typical for 8-feature problems
  • Total Predictors (n_features): 8 — All candidate features retained; no pre-filtering applied
  • Task Type: Classification — Binary churn prediction (Yes/No outcomes)

Interpretation

The 300-tree ensemble with mtry=2 creates a robust, well-regularized model suitable for the churn classification task. The low mtry value forces each split to consider only 2 of 8 features randomly, increasing tree diversity and reducing correlation between ensemble members. This configuration directly supports the 88.8% OOB accuracy observed, as the conservative split strategy prevents individual trees from overfitting while maintaining predictive power across the 500 customer records.
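For reference, the configuration in the table maps onto scikit-learn hyperparameters as follows ("mtry" is the randomForest-style name for what sklearn calls `max_features`). This is an illustrative mapping, not the report's actual code.

```python
# Sketch: the reported Random Forest settings as sklearn hyperparameters.
# Values come from the configuration table; the estimator is an assumption.
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=300,   # Number of Trees
    max_features=2,     # Features per Split (mtry): 2 of 8 tried at each split
    oob_score=True,     # use out-of-bag samples for the accuracy estimate
    random_state=0,
)
print(model.get_params()["n_estimators"], model.get_params()["max_features"])
```

The low `max_features` value is what decorrelates the trees: each split sees only a random quarter of the candidate features, so individual trees differ and the ensemble average is stabler.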

Section 9

Model Performance

OOB Performance Metrics

Overall model performance metrics and interpretation

Classification Performance:
OOB Accuracy: 88.8% (Good)
OOB Miss Rate: 11.2%
Training data: 500 observations

Benchmarks: ≥90% = Excellent | 80-90% = Good | 70-80% = Acceptable | <70% = Poor
Note: OOB accuracy is a better indicator of generalization than training accuracy.

Interpretation

Purpose

This section evaluates how well the Random Forest model generalizes to unseen data for the customer churn prediction task. The Out-of-Bag (OOB) accuracy of 88.8% provides an unbiased estimate of real-world performance without requiring a separate test set, making it the most reliable indicator of the model's ability to predict churn in production.

Key Findings

  • OOB Accuracy: 88.8% — Falls in the "Good" range (80–90%), indicating the model correctly classifies churn status in approximately 9 of 10 cases
  • OOB Error Rate: 11.2% — Represents the proportion of misclassified observations; this is the expected error rate on new data
  • Model Stability: 300 trees with 500 observations provides robust ensemble averaging; OOB error stabilizes around 11% after ~100 trees (per error trajectory data)

Interpretation

The model demonstrates solid predictive performance for identifying customer churn. The 88.8% accuracy means the ensemble successfully balances sensitivity and specificity across the two churn classes. This performance level is suitable for operational use, though the 11.2% miss rate indicates approximately 1 in 9 predictions will be incorrect—a consideration for business decisions relying on these predictions.

Table 10

Importance Details

Complete Feature Ranking Table

Detailed feature importance rankings and interpretation

Rank  Feature           Importance  Pct of Max
1     support_tickets   57.58       100%
2     tenure_months     43.44       75.4%
3     monthly_charges   34.79       60.4%
4     satisfaction      23.77       41.3%
5     login_frequency   19.44       33.8%
6     customer_age      18.90       32.8%
7     num_products      8.758       15.2%
8     contract_length   6.433       11.2%

Interpretation

Purpose

This section identifies which of the 8 predictor variables most strongly influence churn predictions. Feature importance rankings reveal the primary drivers of the model's decision-making process, helping distinguish high-impact variables from those with minimal predictive power. Understanding these rankings is essential for interpreting why the model achieves 88.8% accuracy and which customer behaviors matter most for churn prediction.

Key Findings

  • Top Feature (support_tickets): Dominates with 100% relative importance (57.58 absolute score), indicating customer support interactions are the strongest churn signal
  • Secondary Drivers: Tenure (75.4%), monthly charges (60.4%), and satisfaction (41.3%) form a secondary tier of moderate-to-strong predictors
  • All 8 Features Retained: Every predictor exceeds 10% relative importance threshold, confirming all contribute meaningfully to predictions
  • Importance Gradient: Clear ranking from 57.58 (support_tickets) to 6.43 (contract_length) shows variable predictive power across the feature set

Interpretation

The model identifies support ticket volume as the dominant churn indicator: customers with more support interactions show stronger churn signals. This aligns with the business objective to identify key drivers; tenure, pricing, and satisfaction form a supporting pattern in which longer-tenured, satisfied customers with lower charges are less likely to churn.
