Analysis overview and configuration
| Parameter | Value |
|---|---|
| prediction_horizon_days | 365 |
| discount_rate | 0.15 |
| holdout_days | 90 |
| min_transactions | 1 |
This analysis applies the BG/NBD (Beta-Geometric/Negative Binomial Distribution) and Gamma-Gamma models to predict customer lifetime value (CLV) for Test Company. The framework combines purchase frequency/recency patterns with spending behavior to segment customers and forecast future value, enabling data-driven retention and resource allocation strategies.
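As a minimal sketch, the four configuration values above could be held in a small immutable structure. The class name `ClvConfig`, the helper `calibration_cutoff`, the convention that the holdout window is the last `holdout_days` of the data, and the example end date are all assumptions for illustration; the report does not state how the split is made.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ClvConfig:
    """Configuration values copied from the parameter table."""
    prediction_horizon_days: int = 365
    discount_rate: float = 0.15
    holdout_days: int = 90
    min_transactions: int = 1


def calibration_cutoff(last_observed: date, cfg: ClvConfig) -> date:
    """Last day of the calibration window (assumed convention:
    the holdout period is the final `holdout_days` of the data)."""
    return last_observed - timedelta(days=cfg.holdout_days)


cfg = ClvConfig()
# Hypothetical end-of-data date, used only to illustrate the split.
cutoff = calibration_cutoff(date(2024, 12, 31), cfg)
print(cfg)
print("calibration ends:", cutoff)
```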
Data preprocessing and column mapping
| Metric | Value |
|---|---|
| Initial Rows | 80 |
| Final Rows | 41 |
| Rows Removed | 39 |
| Retention Rate | 51.2% |
This section documents the data cleaning and filtering process applied before modeling. The 51.2% retention rate indicates substantial data reduction, which is critical to understand when evaluating model reliability and the representativeness of downstream CLV, P(Alive), and segmentation analyses.
The substantial data loss raises concerns about sample bias and model generalizability. With only 41 of the 80 initial rows retained, the BG/NBD and Gamma-Gamma models are fitted on a subset that may not reflect the broader customer population. Because the filtering criteria are not documented, it is impossible to determine whether the removed records were outliers, incomplete cases, or systematically different customer segments. This directly reduces confidence in the CLV predictions and P(Alive) estimates presented in the analysis.
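The preprocessing figures can be reproduced from the two row counts alone. The sketch below only restates the arithmetic; the variable names are illustrative.

```python
initial_rows = 80
final_rows = 41

rows_removed = initial_rows - final_rows            # rows dropped by cleaning
retention_pct = 100 * final_rows / initial_rows     # share of rows kept

print(rows_removed, f"{retention_pct:.2f}%")
```

The unrounded retention rate is 51.25%, which the table reports as 51.2%.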
The validation data shows extremely high
| Finding | Value |
|---|---|
| Active customers (P(Alive) ≥ 0.5) | 85.4% |
| Avg predicted CLV (next 365d) | $100,098.53 |
| Total predicted future revenue | $1,201,182 |
| Top 20% revenue concentration | 25.1% |
| Repeat purchase rate | 83.3% |
| Model fitted | BG/NBD + Gamma-Gamma |
This executive summary synthesizes a customer lifetime value (CLV) analysis that uses the BG/NBD probabilistic model to assess the predicted revenue potential of a 12-customer cohort over the next 365 days. The analysis directly addresses the business objective of quantifying customer value and identifying retention priorities.
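The headline figures in the table are consistent with one another: the total predicted revenue is the average predicted CLV multiplied by the cohort size. A quick check, using only values reported above (variable names are illustrative):

```python
avg_clv = 100098.53      # average predicted CLV from the findings table
n_customers = 12         # cohort size stated in the summary

total_revenue = avg_clv * n_customers
print(f"${total_revenue:,.0f}")
```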
The model demonstrates a healthy, stable customer base with strong predictive signals. The high repeat rate and elevated P(Alive) probability suggest the cohort exhibits low churn risk. However, the validation data reveals substantial prediction errors (4,605–49,096% error rates), indicating the model's transaction-level forecasts are unreliable despite producing reasonable aggregate
Key customer lifetime value metrics from the BG/NBD model
This section quantifies the predicted financial value of the customer base over the next 365 days using the BG/NBD probabilistic model. It serves as the core output of the CLV analysis, enabling assessment of total revenue potential, customer health, and value concentration across the portfolio.
The customer base exhibits healthy fundamentals with high repeat engagement and strong predicted revenue. The alignment between mean and median CLV ($100,098.5) suggests a stable, homogeneous customer value profile without significant outliers. The 85.4% alive probability reflects recent transaction activity and engagement patterns captured in the BG/NBD model, while the 83.3% repeat rate validates the model's ability to
Expected future transactions by customer frequency and recency
This frequency-recency matrix quantifies expected future transaction volume across customer segments defined by their purchase history and engagement recency. It serves as a predictive lens for identifying which customers are most likely to remain active, enabling data-driven segmentation for retention and engagement strategies within the broader CLV and customer lifecycle analysis.
The near-uniform expected transaction predictions (mean=328.1) across diverse frequency-recency combinations are counterintuitive and suggest the BG/NBD model may be producing undifferentiated forecasts. Typically, high-frequency recent purchasers should show substantially higher expected transactions than dormant or infrequent buyers. The concentration of data in the 0–30 day recency band reflects a customer base dominated
Probability each customer segment is still active
This section quantifies the probability that each customer remains active and engaged in their purchase cycle using the BG/NBD probabilistic model. P(Alive) is essential for distinguishing genuinely churned customers from those temporarily inactive, enabling targeted retention strategies and accurate lifetime value predictions.
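For reference, the standard BG/NBD expression for P(Alive) can be evaluated directly from the fitted parameters. The sketch below uses the r, alpha, a, and b estimates reported in the parameter table of this analysis; the two example customers, and the assumption that recency and age are measured in days, are hypothetical and not taken from the data.

```python
def p_alive(x, t_x, T, r, alpha, a, b):
    """BG/NBD probability that a customer is still active.

    x   : number of repeat purchases
    t_x : time of the last purchase (recency)
    T   : time since the first purchase (customer age)
    """
    if x == 0:
        # Under BG/NBD, dropout can only occur right after a repeat purchase,
        # so a customer with no repeat purchases is alive by construction.
        return 1.0
    ratio = (alpha + T) / (alpha + t_x)
    return 1.0 / (1.0 + (a / (b + x - 1)) * ratio ** (r + x))


params = dict(r=1.009, alpha=2.759, a=0.0025, b=0.0121)

# Hypothetical customers: same frequency and age, different recency.
recent = p_alive(x=5, t_x=80, T=90, **params)
lapsed = p_alive(x=5, t_x=20, T=90, **params)
print(f"recent buyer: {recent:.3f}, lapsed buyer: {lapsed:.3f}")
```

Holding frequency and age fixed, the customer whose last purchase is further in the past receives the lower probability, which is the pattern a P(Alive) heatmap is meant to expose.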
The heatmap reveals that rec
Distribution of predicted customer lifetime values
This section quantifies how predicted customer lifetime value distributes across the customer base, revealing concentration patterns in revenue potential. Understanding CLV distribution is essential for resource allocation, as it identifies whether value is concentrated among a few high-value customers or spread evenly across the base.
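A revenue-concentration figure of this kind can be sketched as the share of total predicted CLV held by the top 20% of customers. The example below is an approximation under two assumptions: per-customer CLVs are replaced by the segment averages from the segmentation table, and "top 20%" of 12 customers is rounded up to 3 customers. The function name `top_share` is illustrative.

```python
import math

# Proxy CLVs: each customer is assigned the average CLV of their segment
# (Champions x2, At Risk x1, Loyal x6, Lost x1, Potential x2).
clv = [100_500] * 2 + [100_300] + [100_100] * 6 + [99_850] + [99_730] * 2


def top_share(values, fraction=0.20):
    """Share of the total held by the top `fraction` of values (count rounded up)."""
    k = math.ceil(fraction * len(values))
    ranked = sorted(values, reverse=True)
    return sum(ranked[:k]) / sum(values)


share = top_share(clv)
print(f"top 20% share: {share:.1%}")
```

With values this homogeneous, the top 3 of 12 customers hold only slightly more than 3/12 of the total, which is why the concentration figure sits so close to 25%.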
The near-identical mean and median CLV contradict typical e-commerce patterns, where high-value customers drive disproportionate revenue. This dataset exhibits unusual homogeneity in predicted lifetime value, suggesting either a mature, stable customer base with consistent purchasing behavior
Highest-value customers ranked by predicted CLV
This section identifies the 12 highest-value customers by predicted lifetime value (CLV) over the next 365 days, ranked to guide resource allocation for retention and engagement strategies. The visualization combines CLV predictions with P(Alive), a measure of churn risk, to highlight which customers warrant priority investment in loyalty programs, early access campaigns, and dedicated support.
The BG/NBD model predicts similar CLV across all top customers despite divergent behavioral profiles
Customer segmentation by CLV tier and activity status
| segment_name | customer_count | avg_clv | total_clv | avg_palive | pct_customers |
|---|---|---|---|---|---|
| At Risk | 1 | 100,300 | 100,300 | 0.372 | 8.3 |
| Champions | 2 | 100,500 | 201,000 | 0.999 | 16.7 |
| Lost | 1 | 99,850 | 99,850 | 0.089 | 8.3 |
| Loyal | 6 | 100,100 | 600,500 | 0.966 | 50.0 |
| Potential | 2 | 99,730 | 199,500 | 0.996 | 16.7 |
This section segments the customer base into five behavioral groups based on predicted Customer Lifetime Value (CLV) and probability of being alive (P(Alive)), enabling targeted retention and growth strategies. Understanding segment composition reveals where value is concentrated and which customers face churn risk, directly supporting the BG/NBD and Gamma-Gamma modeling objectives.
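The derived columns of the segment table can be recomputed from the counts and average P(Alive) values. The sketch below is a consistency check only; it does not reproduce the segmentation rules, which the report does not state.

```python
# (customer_count, avg_palive) per segment, copied from the table.
segments = {
    "At Risk": (1, 0.372),
    "Champions": (2, 0.999),
    "Lost": (1, 0.089),
    "Loyal": (6, 0.966),
    "Potential": (2, 0.996),
}

n_total = sum(n for n, _ in segments.values())

# Share of customers per segment, in percent.
pct_customers = {name: round(100 * n / n_total, 1) for name, (n, _) in segments.items()}

# Customer-weighted mean of the segment-level P(Alive) values.
mean_palive = sum(n * p for n, p in segments.values()) / n_total

print(n_total, pct_customers)
print(f"weighted mean P(Alive): {mean_palive:.3f}")
```

The weighted mean P(Alive) comes to roughly 0.854, in line with the 85.4% figure reported in the key findings.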
The segmentation reveals a healthy portfolio skewed toward active
Model calibration vs holdout validation
| frequency_group | actual_transactions | predicted_transactions | n_customers | error_pct |
|---|---|---|---|---|
| 0 (one-time) | 1 | 328 | 2 | 32,700 |
| 1 repeat | 0.667 | 328 | 3 | 49,100 |
| 2 repeats | 3 | 328.2 | 2 | 10,840 |
| 3-4 repeats | 4.333 | 327.1 | 3 | 7,448 |
| 5+ repeats | 7 | 329.4 | 2 | 4,605 |
This section validates the BG/NBD model's predictive accuracy by comparing forecasted transactions against actual holdout-period observations across customer frequency groups. Close alignment between predicted and actual values would confirm that the model reliably captures customer purchase behavior, which is essential for accurate CLV estimation and segmentation.
The holdout comparison does not show that alignment. The model predicts roughly 327 to 329 transactions for every frequency group, while actual holdout transactions range from 0.667 to 7. Percentage errors fall as frequency rises, from about 49,100% for one-repeat customers to about 4,605% for customers with five or more repeats, but they remain extreme in every group. These errors reflect absolute gaps of more than 320 transactions in each group rather than small differences magnified by low baseline actuals, which suggests a problem with the transaction forecasts themselves and not only with sparse data.
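The `error_pct` column appears to follow the usual absolute percentage error, 100 × |predicted − actual| / actual. The sketch below recomputes it from the rounded values in the validation table, so the results can differ slightly from the reported figures; the function name is illustrative.

```python
# (frequency_group, actual_transactions, predicted_transactions) from the table.
rows = [
    ("0 (one-time)", 1.0, 328.0),
    ("1 repeat", 0.667, 328.0),
    ("2 repeats", 3.0, 328.2),
    ("3-4 repeats", 4.333, 327.1),
    ("5+ repeats", 7.0, 329.4),
]


def error_pct(actual, predicted):
    """Absolute percentage error of the prediction relative to the actual value."""
    return 100 * abs(predicted - actual) / actual


for group, actual, predicted in rows:
    gap = predicted - actual
    print(f"{group:<13} gap={gap:7.1f}  error={error_pct(actual, predicted):10,.0f}%")
```

Printing the absolute gap next to the percentage makes it clear that the gap exceeds 320 transactions in every group, including the highest-frequency one.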
Validation results align with CLV distribution and segment summary findings, where Loyal and Champions segments (higher frequency) show stable CLV estimates. The small sample sizes per group suggest
BG/NBD + Gamma-Gamma model parameter estimates
| model | parameter_name | estimate | interpretation |
|---|---|---|---|
| BG/NBD | r (purchase rate shape) | 1.009 | Shape of purchase rate Gamma distribution |
| BG/NBD | alpha (purchase rate scale) | 2.759 | Scale of purchase rate Gamma distribution |
| BG/NBD | a (dropout shape) | 0.0025 | First shape parameter of dropout Beta distribution |
| BG/NBD | b (dropout shape) | 0.0121 | Second shape parameter of dropout Beta distribution |
| Gamma-Gamma | p (spend shape) | 4.733 | Individual spend variability shape |
| Gamma-Gamma | q (pop spend shape) | 1203 | Population spend heterogeneity shape |
| Gamma-Gamma | v (spend scale) | | Scale of spending distribution |
This section presents the estimated parameters of the BG/NBD + Gamma-Gamma probabilistic model, which quantifies customer purchase behavior and spending patterns across the entire customer base. These parameters form the foundation for predicting customer lifetime value (CLV), probability of being alive (P(Alive)), and expected transaction counts, all of which appear in the heatmaps and customer segments throughout the analysis.
The model captures two distinct customer dimensions: purchase frequency (governed by BG/NBD) and monetary value (governed by Gamma-Gamma).
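Two summary quantities follow directly from the BG/NBD estimates: the population mean purchase rate, r / alpha, and the population mean dropout probability after a purchase, a / (a + b). The sketch below computes both from the table values. The time unit of the purchase rate is not stated in the report, and the Gamma-Gamma mean spend is left out because the estimate for v is missing.

```python
# BG/NBD estimates from the parameter table.
r, alpha = 1.009, 2.759
a, b = 0.0025, 0.0121

# Mean of the Gamma(r, alpha) purchase-rate distribution:
# expected purchases per time unit for a randomly chosen active customer.
mean_purchase_rate = r / alpha

# Mean of the Beta(a, b) dropout distribution:
# expected probability of becoming inactive after any given purchase.
mean_dropout_prob = a / (a + b)

print(f"mean purchase rate: {mean_purchase_rate:.4f} per time unit")
print(f"mean dropout probability: {mean_dropout_prob:.4f}")
```

Because a and b are both far below 1, the Beta distribution is strongly U-shaped: most customers are modeled as either almost never dropping out or dropping out almost immediately, so the mean alone is a coarse summary.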