Analysis overview and configuration
| Parameter | Value |
|---|---|
| n_trees | 300 |
| task_type | auto |
This analysis applies a Random Forest ensemble classifier to predict customer churn and identify the key drivers influencing churn decisions. The model uses 300 decision trees across 8 customer features to achieve robust classification performance while simultaneously ranking feature importance to guide business strategy.
The Random Forest model identifies support ticket volume as the dominant churn signal, with longer tenure and higher charges providing secondary predictive power. The 88.8% accuracy demonstrates that the model captures meaningful patterns in the 500-customer dataset, and the convergence of out-of-bag scores indicates the ensemble has learned stable decision boundaries, making predictions reliable for unseen customers.
As a black-box ensemble, the model sacrifices direct interpretability for predictive power; the feature importance rankings and partial dependence analysis presented later in this report partially offset that trade-off.
Data preprocessing and column mapping
| Metric | Value |
|---|---|
| Initial Rows | 500 |
| Final Rows | 500 |
| Rows Removed | 0 |
| Retention Rate | 100% |
This section documents the data cleaning and preparation phase for the Random Forest churn prediction model. Perfect data retention (100%) indicates that no observations were removed during preprocessing, meaning all 500 customer records proceeded to model training. This is critical for understanding whether the model's 88.8% accuracy reflects performance on a complete, unfiltered dataset or whether data-quality issues were masked by removal decisions.
The 100% retention rate supports the model's reliability for the stated churn prediction objective, as the full customer base was available for training. However, the absence of an explicit train/test split means performance metrics rely entirely on OOB estimates (11.2% error rate). This approach is valid but doesn't demonstrate generalization to truly unseen data. The lack of documented missing-value handling or outlier treatment leaves it unclear whether the raw data required no cleaning or whether those steps simply went unrecorded.
| Finding | Value |
|---|---|
| Model Type | Random Forest Classification (300 trees) |
| Performance | OOB Accuracy: 88.8% |
| Performance Rating | Good |
| Top Driver | support_tickets |
| Features Used | 8 predictor variables |
| Training Size | 500 observations |
This analysis evaluates a Random Forest classification model built to predict customer churn and identify key drivers. The model's performance and feature importance rankings directly address the business objective of understanding which factors most influence churn behavior, enabling targeted retention strategies.
The model successfully achieves the stated objective: identifying support_tickets as the primary churn driver with 88.8% accuracy. The out-of-bag validation mechanism confirms this performance is not artificially inflated. The clear feature ranking—with support_tickets commanding 57.58 importance points versus 43.44 for tenure—reveals that customer support engagement patterns are substantially more predictive than account tenure, pricing, or demographic attributes.
Feature importance rankings showing which variables drive predictions most
This section identifies which variables most strongly drive the Random Forest model's churn predictions by measuring their contribution to reducing impurity across all 300 trees. Understanding feature importance reveals the key behavioral and account characteristics that distinguish churners from retained customers, directly supporting the stated objective to "identify key drivers" of churn.
The model identifies support ticket volume as overwhelmingly predictive of churn—customers who file more support tickets are more likely to churn. This aligns with the 88.8% OOB accuracy, suggesting the model reliably captures churn patterns. The steep importance gradient indicates that a small subset of features (the top 3) accounts for most of the predictive power, while the remaining five contribute comparatively little.
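That concentration can be checked arithmetically from the reported importance scores (values copied from the detailed rankings table later in this report); a minimal stdlib sketch:

```python
# Importance scores as reported in the detailed rankings table.
importances = {
    "support_tickets": 57.58,
    "tenure_months": 43.44,
    "monthly_charges": 34.79,
    "satisfaction": 23.77,
    "login_frequency": 19.44,
    "customer_age": 18.9,
    "num_products": 8.758,
    "contract_length": 6.433,
}

total = sum(importances.values())
top3 = sum(sorted(importances.values(), reverse=True)[:3])
share = top3 / total

print(f"Top-3 share of total importance: {share:.1%}")  # about 63.7%
```

So the top three features carry roughly two-thirds of the total importance mass, consistent with the "steep gradient" description above.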
OOB convergence — shows how model performance stabilizes as trees are added
This section demonstrates how the Random Forest model's out-of-bag error rate stabilizes as additional trees are added to the ensemble. OOB convergence is critical for validating that the model has grown enough trees to achieve reliable, stable predictions without overfitting—directly supporting the churn prediction objective.
The model demonstrates strong convergence behavior, with the OOB error rate dropping from 24% to 11.2% as trees accumulate. The flattening curve after approximately 100 trees indicates that additional trees provide minimal performance gains, suggesting the ensemble has captured the underlying patterns in customer churn drivers. This stability validates the 300-tree configuration as sufficient for reliable out-of-sample predictions.
OOB error serves as an unbiased performance estimate without requiring a separate test set. The low final miss rate (11.2%) aligns with the model's overall OOB accuracy of 88.8%.
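The mechanics behind OOB estimation can be sketched with a short calculation: each bootstrap sample omits roughly a third of the observations, so every customer is scored by a sizeable subset of the 300 trees. Assuming the standard bootstrap over n = 500 rows:

```python
import math

n = 500      # training observations
trees = 300  # ensemble size

# Probability that a given row is left out of one bootstrap sample;
# (1 - 1/n)^n approaches 1/e ≈ 0.368 for large n.
p_oob = (1 - 1 / n) ** n
oob_trees = p_oob * trees  # expected number of trees voting OOB per row

print(f"P(out-of-bag per tree) = {p_oob:.3f}")        # ≈ 0.368
print(f"Expected OOB trees per row ≈ {oob_trees:.0f}")  # ≈ 110
```

With roughly 110 trees voting on each row it never trained on, the OOB error is a reasonably stable proxy for held-out performance.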
Confusion matrix: actual vs predicted classifications
This confusion matrix displays the Random Forest model's classification performance on customer churn prediction, comparing actual churn outcomes against predicted classifications. It reveals both training-set performance and the more realistic out-of-bag (OOB) generalization accuracy, which indicates how well the model will perform on unseen data in production.
The 11.2-percentage-point gap between training accuracy (100%) and OOB accuracy (88.8%) is typical and expected in Random Forest models. The training perfection reflects the ensemble's ability to memorize patterns, while the OOB estimate provides a conservative, unbiased assessment of generalization capability. For the churn prediction objective, an 88.8% accuracy means the model correctly classifies roughly 444 of the 500 customers under out-of-bag evaluation.
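The individual cell counts are not reproduced in this report, but any matrix consistent with 500 observations and 88.8% accuracy behaves like the following sketch; the tn/fp/fn/tp values below are hypothetical placeholders chosen to sum correctly, not the model's actual counts:

```python
# Hypothetical OOB confusion-matrix cells consistent with the reported
# figures (500 observations, 444 correct = 88.8% accuracy). The true
# per-cell counts are not reproduced in this report.
tn, fp = 350, 30  # actual retained: predicted retained / predicted churned
fn, tp = 26, 94   # actual churned:  predicted retained / predicted churned

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
recall = tp / (tp + fn)     # share of actual churners the model catches
precision = tp / (tp + fp)  # share of churn flags that are correct

print(f"accuracy  = {accuracy:.3f}")  # 0.888
print(f"recall    = {recall:.3f}")
print(f"precision = {precision:.3f}")
```

The point of the sketch: accuracy alone hides the recall/precision split, which matters for retention campaigns that target predicted churners.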
Partial dependence plot for top feature: support_tickets
This section isolates the effect of support_tickets—the model's single most important predictor (57.58 importance score)—on churn probability. By averaging predictions across all other features, the partial dependence plot reveals the non-linear relationship between support ticket volume and predicted churn, showing how the model responds to this key driver independent of confounding factors.
The partial dependence curve demonstrates that customers with few support tickets have a baseline 13% predicted churn probability, rising steeply to 80% as tickets increase. This non-linear pattern suggests the model captures a threshold effect: moderate support ticket volume signals escalating churn risk, but the relationship stabilizes at higher volumes. This aligns with the overall importance rankings, which place support_tickets well ahead of every other predictor.
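The underlying computation can be illustrated in a few lines: fix support_tickets at each grid value and average the model's predictions over the remaining features. The `predict_churn` function below is a toy stand-in for the fitted forest, not the actual model from this analysis:

```python
# Sketch of the partial-dependence computation for one feature.
# `predict_churn` is a hypothetical stand-in for the fitted forest.
def predict_churn(support_tickets, tenure_months):
    # Toy rule: churn risk rises with tickets, falls with tenure,
    # clipped to a valid probability.
    risk = 0.13 + 0.09 * support_tickets - 0.005 * tenure_months
    return min(max(risk, 0.0), 1.0)

# Background sample of the "other" feature(s); here just tenure values.
tenure_sample = [3, 12, 24, 36, 60]

def partial_dependence(ticket_value):
    # Fix the feature of interest, average predictions over the sample.
    preds = [predict_churn(ticket_value, t) for t in tenure_sample]
    return sum(preds) / len(preds)

curve = {k: round(partial_dependence(k), 3) for k in range(0, 9)}
print(curve)
```

Replacing `predict_churn` with the real forest's predict method yields the curve plotted in this section.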
Random Forest model settings and hyperparameters
| Parameter | Value |
|---|---|
| Task Type | Classification |
| Number of Trees | 300 |
| Features per Split (mtry) | 2 |
| Total Features | 8 |
| Training Observations | 500 |
This section documents the Random Forest model's structural configuration—the hyperparameters and design choices that define how the ensemble was constructed. Understanding these settings is essential for interpreting model behavior, reproducibility, and assessing whether the architecture is appropriate for the churn prediction objective.
The 300-tree ensemble with mtry=2 creates a robust, well-regularized model suitable for the churn classification task. The low mtry value forces each split to consider only 2 of 8 features randomly, increasing tree diversity and reducing correlation between ensemble members. This configuration directly supports the 88.8% OOB accuracy observed, as the conservative split strategy prevents individual trees from overfitting while maintaining predictive power across the 500 customer records.
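The mtry = 2 setting matches the common classification heuristic of ⌊√p⌋ features per split; a one-line check:

```python
import math

p = 8  # total predictor features in this model
mtry_default = math.floor(math.sqrt(p))  # common classification default

print(mtry_default)  # 2, matching the configured value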
Overall model performance metrics and interpretation
This section evaluates how well the Random Forest model generalizes to unseen data for the customer churn prediction task. The Out-of-Bag (OOB) accuracy of 88.8% provides an unbiased estimate of real-world performance without requiring a separate test set, making it the most reliable indicator of the model's ability to predict churn in production.
The model demonstrates solid predictive performance for identifying customer churn. The 88.8% accuracy means the ensemble successfully balances sensitivity and specificity across the two churn classes. This performance level is suitable for operational use, though the 11.2% miss rate indicates approximately 1 in 9 predictions will be incorrect—a consideration for business decisions relying on these predictions.
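Translating the 11.2% miss rate into operational terms, using the 500-customer base documented above:

```python
oob_accuracy = 0.888
n_customers = 500

expected_errors = (1 - oob_accuracy) * n_customers
errors_per = 1 / (1 - oob_accuracy)  # the "1 wrong in N" framing

print(f"Expected misclassifications: {expected_errors:.0f}")    # 56
print(f"Roughly 1 wrong prediction in every {errors_per:.0f}")  # 9
```

This is where the "1 in 9" figure in the paragraph above comes from.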
Detailed feature importance rankings and interpretation
| Rank | Feature | Importance | % of Max |
|---|---|---|---|
| 1 | support_tickets | 57.58 | 100% |
| 2 | tenure_months | 43.44 | 75.4% |
| 3 | monthly_charges | 34.79 | 60.4% |
| 4 | satisfaction | 23.77 | 41.3% |
| 5 | login_frequency | 19.44 | 33.8% |
| 6 | customer_age | 18.9 | 32.8% |
| 7 | num_products | 8.758 | 15.2% |
| 8 | contract_length | 6.433 | 11.2% |
This section identifies which of the 8 predictor variables most strongly influence churn predictions. Feature importance rankings reveal the primary drivers of the model's decision-making process, helping distinguish high-impact variables from those with minimal predictive power. Understanding these rankings is essential for interpreting why the model achieves 88.8% accuracy and which customer behaviors matter most for churn prediction.
The model identifies support ticket volume as the dominant churn indicator—customers with more support interactions show stronger churn signals. This aligns with the business objective to identify key drivers: tenure, pricing, and satisfaction form a supporting pattern where longer-tenured, satisfied customers with lower charges are the least likely to churn.
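The "% of Max" column in the table above can be recomputed directly from the raw importance scores; a quick verification sketch:

```python
# Raw importance scores from the rankings table above.
importances = [
    ("support_tickets", 57.58),
    ("tenure_months", 43.44),
    ("monthly_charges", 34.79),
    ("satisfaction", 23.77),
    ("login_frequency", 19.44),
    ("customer_age", 18.9),
    ("num_products", 8.758),
    ("contract_length", 6.433),
]

top = max(v for _, v in importances)  # 57.58 (support_tickets)
for name, v in importances:
    # Each feature's importance as a share of the top feature's score.
    print(f"{name:16s} {v:6.2f}  {v / top:6.1%}")
```

The printed percentages (75.4%, 60.4%, 41.3%, ...) reproduce the table's "% of Max" column, confirming it is simply each score normalized by the support_tickets maximum.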