Analysis Overview

BG/NBD Customer Lifetime Value

Analysis overview and configuration

Configuration

Analysis Type: BG/NBD
Company: Test Company
Objective: Predict customer lifetime value using BG/NBD and Gamma-Gamma models
Analysis Date: 2026-03-14
Processing ID: analytics__customer__ltv__bg_nbd_test_20260314_220027
Total Observations: 41

Module Parameters

Parameter                 Value
prediction_horizon_days   365
discount_rate             0.15
holdout_days              90
min_transactions          1
BG/NBD analysis for Test Company
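As a rough illustration of how these four parameters fit together, the sketch below holds them in a small configuration object. The class name `LtvConfig` and the `calibration_days` helper are hypothetical, and the code assumes the holdout window is taken from the end of the observation span (the report states a 205-day span elsewhere but does not document the split).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LtvConfig:
    """Hypothetical container for the four module parameters listed above."""

    prediction_horizon_days: int = 365
    discount_rate: float = 0.15
    holdout_days: int = 90
    min_transactions: int = 1

    def calibration_days(self, observation_days: int) -> int:
        """Days left for model fitting once the holdout window is set aside."""
        if self.holdout_days >= observation_days:
            raise ValueError("holdout window must be shorter than the observation span")
        return observation_days - self.holdout_days


config = LtvConfig()
# 205 days is the transaction span quoted in the CLV Key Metrics section.
calibration = config.calibration_days(205)
print(f"calibration window: {calibration} days, holdout: {config.holdout_days} days")
```

Under that assumption, the model would be fitted on 115 days of history and evaluated on the final 90.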

Interpretation

Purpose

This analysis applies the BG/NBD (Beta-Geometric/Negative Binomial Distribution) and Gamma-Gamma models to predict customer lifetime value (CLV) for Test Company. The framework combines purchase frequency/recency patterns with spending behavior to segment customers and forecast future value, enabling data-driven retention and resource allocation strategies.

Key Findings

  • Model Architecture: BG/NBD captures purchase dynamics (r=1.01, alpha=2.76) while Gamma-Gamma models spending variability (p=4.73, q=1202.75)
  • Customer Distribution: 50% classified as "Loyal" (6 customers, avg_palive=0.97), with 12 total customers spanning CLV range of ~$99,718–$100,759
  • Recency Dominance: 75% of observations fall in the 0-30 day recency window; expected transactions stay near 328 across frequency bins, so the forecasts barely distinguish customers by purchase history
  • Segment Concentration: Champions and Potential segments each represent 16.7% of customers; "Lost" segment shows critically low palive (0.09)
  • Validation Gap: Prediction errors range 4,605–49,096%, with one-time buyers showing 32,702% error, indicating model struggles with
Data Preparation

Data Pipeline

Transaction Aggregation

Data preprocessing and column mapping

Data Quality

Metric           Value
Initial Rows     80
Final Rows       41
Rows Removed     39
Retention Rate   51.2%
Processed 80 observations, retained 41 (51.2%) after cleaning
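The figures above follow from two row counts. A minimal sketch of the arithmetic (the function name is illustrative, not part of the report's pipeline):

```python
def retention_summary(initial_rows: int, final_rows: int) -> dict:
    """Return rows removed and the retention rate in percent."""
    if initial_rows <= 0 or not 0 <= final_rows <= initial_rows:
        raise ValueError("row counts must satisfy 0 <= final_rows <= initial_rows, initial_rows > 0")
    return {
        "rows_removed": initial_rows - final_rows,
        "retention_rate_pct": 100.0 * final_rows / initial_rows,
    }


summary = retention_summary(initial_rows=80, final_rows=41)
print(summary)
```

The unrounded rate is 51.25%, which the report displays as 51.2%.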

Interpretation

Purpose

This section documents the data cleaning and filtering process applied before modeling. The 51.2% retention rate indicates substantial data reduction, which is critical to understand when evaluating model reliability and the representativeness of downstream CLV, P(Alive), and segmentation analyses.

Key Findings

  • Retention Rate: 51.2% (41 of 80 rows retained) - Nearly half the initial observations were removed during preprocessing, suggesting either aggressive filtering criteria or significant data quality issues in the raw dataset
  • Rows Removed: 39 observations eliminated without documented justification or filter specifications
  • Train/Test Split: Not specified - No explicit train/test allocation is documented, limiting transparency on model validation methodology
  • Transformation Details: Filters applied are not enumerated, preventing assessment of whether removals were systematic or arbitrary

Interpretation

The substantial data loss raises concerns about sample bias and model generalizability. Only 41 of the 80 initial rows were retained, and those 41 rows are transactions covering 12 unique customers (per the CLV Key Metrics section), so the BG/NBD and Gamma-Gamma models are fitted on a small subset that may not reflect the broader customer population. The lack of documented filtering criteria makes it impossible to determine whether removed records were outliers, incomplete cases, or systematically different customer segments. This directly impacts confidence in the CLV predictions and P(Alive) estimates presented in the analysis.

Context

The validation data shows extremely high

Executive Summary

Key Findings & Recommendations

Key Metrics

unique_customers: 12
pct_alive: 85.4%
avg_clv: $100,098.53
total_predicted_clv: $1,201,182
top20_revenue_share: 25.1%
repeat_rate: 83.3%

Key Findings

Finding                              Value
Active customers (P(Alive) ≥ 0.5)    85.4%
Avg predicted CLV (next 365d)        $100,098.53
Total predicted future revenue       $1,201,182
Top 20% revenue concentration        25.1%
Repeat purchase rate                 83.3%
Model fitted                         BG/NBD + Gamma-Gamma

Summary

Bottom Line: The BG/NBD model analysed 12 customers and predicts $1,201,182 in total customer lifetime value over the next 365 days.

Key Findings:
• 85.4% of customers are estimated to still be active
• Average CLV is $100,098.53 (median $100,098.54)
• Top 20% of customers account for 25.1% of predicted revenue
• 83.3% of customers have made at least one repeat purchase

Recommendations:
• Invest heavily in retaining Champions segment — highest predicted value customers
• Launch win-back campaigns for At Risk customers with P(Alive) 0.3-0.5
• Use CLV rankings to prioritize customer service and loyalty program resources
• Focus acquisition on customer profiles similar to Champions segment
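
The headline figures in this summary can be cross-checked with simple arithmetic. The sketch below (function names are illustrative, not part of the report's tooling) multiplies the average CLV by the customer count, then lists which whole-customer counts out of 12 would round to a reported percentage.

```python
def implied_total(avg_clv: float, n_customers: int) -> float:
    """Total CLV implied by an average and a customer count."""
    return avg_clv * n_customers


def matching_counts(reported_pct: float, n_customers: int, tolerance: float = 0.05) -> list:
    """Whole-customer counts whose share rounds to the reported percentage."""
    return [
        k
        for k in range(n_customers + 1)
        if abs(100.0 * k / n_customers - reported_pct) <= tolerance
    ]


total = implied_total(100_098.53, 12)
repeat_counts = matching_counts(83.3, 12)  # repeat_rate
alive_counts = matching_counts(85.4, 12)   # pct_alive
print(round(total, 2), repeat_counts, alive_counts)
```

The total and the repeat rate (10 of 12 customers) are consistent with a 12-customer base. No whole number of customers out of 12 rounds to 85.4%, which suggests pct_alive is computed on a different basis (for example, a mean probability) rather than as a simple customer count.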

Interpretation

Purpose

This executive summary synthesizes a customer lifetime value (CLV) analysis using the BG/NBD probabilistic model to assess the predictive revenue potential of a 12-customer cohort over the next 365 days. The analysis directly addresses the business objective of quantifying customer value and identifying retention priorities.

Key Findings

  • Total Predicted CLV: $1,201,182 — represents the aggregate revenue expected from the analyzed customer base
  • Customer Viability: 85.4% of customers estimated alive — indicates strong overall cohort health and low churn risk
  • Repeat Purchase Behavior: 83.3% repeat rate — demonstrates established customer loyalty and transaction consistency
  • Revenue Concentration: Top 20% of customers generate 25.1% of predicted revenue, only slightly above an even split, so there is no dependency on any single customer
  • Average CLV: $100,098.53 per customer — reflects consistent, high-value customer profiles with minimal variance (sd=$281.95)

Interpretation

The model demonstrates a healthy, stable customer base with strong predictive signals. The high repeat rate and elevated P(Alive) probability suggest the cohort exhibits low churn risk. However, the validation data reveals substantial prediction errors (4,605–49,096% error rates), indicating the model's transaction-level forecasts are unreliable despite producing reasonable aggregate

Section 4

CLV Key Metrics

Customer lifetime value summary statistics

Key customer lifetime value metrics from the BG/NBD model

Analysis covers 12 unique customers with 41 transactions over 205 days. 85.4% of customers are estimated to be currently active. Total predicted customer lifetime value over the next 365 days is $1,201,182, with an average CLV of $100,098.53 per customer.

The top 20% of customers account for 25.1% of total predicted revenue, which is close to an even split and far from a typical 80/20 value concentration. Repeat purchase rate is 83.3%: the share of customers with at least one return visit.
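
To put the 25.1% figure in context, the sketch below computes a top-share metric for two made-up customer bases. How the report rounds "20% of 12 customers" is not documented; the code assumes rounding up to 3 customers, and both value lists are hypothetical.

```python
import math


def top_share(values, top_fraction=0.2):
    """Share of the total held by the top `top_fraction` of values (count rounded up)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values, reverse=True)
    k = max(1, math.ceil(top_fraction * len(ordered)))
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical base where every customer has the same CLV.
equal_base = [100_000.0] * 12
# Hypothetical base with strongly concentrated value.
skewed_base = [1_000_000.0] * 2 + [10_000.0] * 10

equal_share = top_share(equal_base)
skewed_share = top_share(skewed_base)
print(f"equal: {equal_share:.1%}, skewed: {skewed_share:.1%}")
```

With 12 identical customers the top three hold exactly 25% of the value, so a reported 25.1% is what a nearly uniform CLV distribution produces.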

Interpretation

Purpose

This section quantifies the predicted financial value of the customer base over the next 365 days using the BG/NBD probabilistic model. It serves as the core output of the CLV analysis, enabling assessment of total revenue potential, customer health, and value concentration across the portfolio.

Key Findings

  • Total Predicted Revenue: $1,201,182 — aggregate CLV across all 12 customers over the next year
  • Average CLV: $100,098.53, in line with the median ($100,098.54), indicating a symmetric value distribution with no extreme outliers
  • Customer Vitality: 85.4% reported as alive. The segment table shows 2 of 12 customers (16.7%) with P(Alive) below 0.5, so the share above the 0.5 threshold is 83.3%; the 85.4% figure matches the mean P(Alive) of about 0.85 rather than a customer count
  • Repeat Purchase Rate: 83.3% (10 of 12 customers), so most customers returned at least once
  • Revenue Concentration: Top 20% of customers generate 25.1% of revenue, a nearly even spread rather than a Pareto-style skew

Interpretation

The customer base exhibits healthy fundamentals with high repeat engagement and strong predicted revenue. The alignment between mean and median CLV ($100,098.5) suggests a stable, homogeneous customer value profile without significant outliers. The 85.4% alive probability reflects recent transaction activity and engagement patterns captured in the BG/NBD model, while the 83.3% repeat rate validates the model's ability to

Figure 5

Expected Transactions Matrix

Predicted future transactions by frequency and recency

Expected future transactions by customer frequency and recency

Interpretation

Purpose

This frequency-recency matrix quantifies expected future transaction volume across customer segments defined by their purchase history and engagement recency. It serves as a predictive lens for identifying which customers are most likely to remain active, enabling data-driven segmentation for retention and engagement strategies within the broader CLV and customer lifecycle analysis.

Key Findings

  • Expected Transactions Range: 326.77–330.18 transactions over 365 days, with minimal variance (SD=1.01) across all segments
  • Recency Dominance: 75% of observations fall in the 0–30 day window, indicating most active customers purchased very recently
  • Frequency Distribution: Segments span 0–7 purchase frequencies, with frequency=1 appearing most frequently (25% of rows)
  • Minimal Differentiation: The narrow range and low standard deviation suggest predicted transaction volumes are remarkably consistent across frequency-recency combinations
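
To show how much a purchase-history model would normally differentiate customers, the sketch below uses a simplified no-dropout (Gamma-Poisson) approximation with the fitted r and alpha from the Model Parameters table. This is not the full BG/NBD conditional expectation, which also accounts for dropout. It further assumes that time is measured in days and that every customer was observed for the full 205 days; neither assumption is confirmed by the report.

```python
R = 1.009      # fitted r from the Model Parameters table
ALPHA = 2.759  # fitted alpha from the Model Parameters table


def expected_transactions_no_dropout(frequency, observed_days, horizon_days, r=R, alpha=ALPHA):
    """Posterior mean purchase rate (r + x) / (alpha + T), scaled to the horizon."""
    return horizon_days * (r + frequency) / (alpha + observed_days)


low = expected_transactions_no_dropout(frequency=0, observed_days=205, horizon_days=365)
high = expected_transactions_no_dropout(frequency=7, observed_days=205, horizon_days=365)
print(f"frequency 0: {low:.1f}, frequency 7: {high:.1f}")
```

Under these assumptions a customer with 7 repeat purchases is expected to buy several times more often than a one-time buyer, and both values are far below 328. That is the kind of spread missing from the matrix above.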

Interpretation

The near-uniform expected transaction predictions (mean=328.1) across diverse frequency-recency combinations is counterintuitive and suggests the BG/NBD model may be producing undifferentiated forecasts. Typically, high-frequency recent purchasers should show substantially higher expected transactions than dormant or infrequent buyers. The concentration of data in the 0–30 day recency band reflects a customer base dominated

Figure 6

P(Alive) Probability Matrix

Probability customers are still active

Probability each customer segment is still active

Interpretation

Purpose

This section quantifies the probability that each customer remains active and engaged in their purchase cycle using the BG/NBD probabilistic model. P(Alive) is essential for distinguishing genuinely churned customers from those temporarily inactive, enabling targeted retention strategies and accurate lifetime value predictions.

Key Findings

  • Overall P(Alive) Mean: 0.88 (median 1.0) — 85.4% of customers estimated active, indicating a healthy customer base with strong retention signals
  • Recency Effect: P(Alive) drops sharply with time since last purchase (0.23 at 120–150 days vs. 0.81–1.0 at 0–30 days), demonstrating recency as the dominant churn predictor
  • Frequency Effect: Within the 0–30 day window, customers with frequency 4–7 hold P(Alive) = 1.0, at the top of that window's 0.81–1.0 range; the figures quoted here do not show how frequency affects P(Alive) at longer gaps
  • Uncertainty Zone: Customers with P(Alive) between 0.3–0.7 represent re-engagement opportunities; the data shows minimal representation in this range, suggesting a bimodal distribution of active vs. churned segments
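
For reference, the BG/NBD model has a closed-form P(Alive). The sketch below implements the standard formula with the fitted parameters from the Model Parameters table. The input values are hypothetical, and the time unit (days) is an assumption, so the outputs are illustrative rather than a reproduction of the heatmap.

```python
R, ALPHA = 1.009, 2.759  # purchase-rate Gamma parameters (from the report)
A, B = 0.0025, 0.0121    # dropout Beta parameters (from the report)


def p_alive(frequency, last_purchase_age, customer_age, r=R, alpha=ALPHA, a=A, b=B):
    """BG/NBD probability that a customer is still active.

    frequency: number of repeat purchases (x)
    last_purchase_age: time from first to last purchase (t_x)
    customer_age: time from first purchase to the end of observation (T)
    """
    if frequency == 0:
        # With no repeat purchase the model has no dropout opportunity to observe.
        return 1.0
    ratio = (alpha + customer_age) / (alpha + last_purchase_age)
    return 1.0 / (1.0 + (a / (b + frequency - 1)) * ratio ** (r + frequency))


# Hypothetical customers, each observed for 150 days with one repeat purchase.
recent = p_alive(frequency=1, last_purchase_age=140, customer_age=150)
stale = p_alive(frequency=1, last_purchase_age=5, customer_age=150)
print(f"recent buyer: {recent:.2f}, long-silent buyer: {stale:.2f}")
```

The longer the gap between the last purchase and the end of observation, the lower P(Alive), which matches the recency effect described above.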

Interpretation

The heatmap reveals that rec

Figure 7

CLV Distribution

Distribution of predicted customer lifetime values

Interpretation

Purpose

This section quantifies how predicted customer lifetime value distributes across the customer base, revealing concentration patterns in revenue potential. Understanding CLV distribution is essential for resource allocation, as it identifies whether value is concentrated among a few high-value customers or spread evenly across the base.

Key Findings

  • Mean vs. Median CLV: Both approximately $100,098, indicating a remarkably symmetric distribution rather than the typical right-skew observed in e-commerce. This suggests relatively homogeneous customer value.
  • Customer Concentration: The second bin (CLV $99,926–$100,134) contains the largest customer segment with 5 customers, representing 41.7% of the base.
  • Top-Tier Representation: Only 1 customer occupies the highest CLV bracket ($100,551–$100,759); the other 11 customers (91.7%) fall in the lower bins, showing limited extreme value variation.
  • Distribution Range: CLV spans only $1,041 across all customers (99,718–100,759), a narrow band relative to absolute values.
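
The bin boundaries quoted above are consistent with five equal-width bins over the observed range. The bin count is an inference from the quoted edges, not something the report states. A short sketch:

```python
def equal_width_edges(low, high, bins):
    """Edges of `bins` equal-width histogram bins spanning [low, high]."""
    width = (high - low) / bins
    return [low + i * width for i in range(bins + 1)]


edges = equal_width_edges(99_718, 100_759, bins=5)
# Range expressed as a percentage of the reported mean CLV.
spread_pct = 100.0 * (100_759 - 99_718) / 100_098.53

print([round(edge) for edge in edges])
print(f"range is {spread_pct:.2f}% of the mean CLV")
```

The whole CLV range is about 1% of the mean, which is why the histogram looks almost flat in relative terms.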

Interpretation

The near-identical mean and median CLV contradicts typical e-commerce patterns where high-value customers drive disproportionate revenue. This dataset exhibits unusual homogeneity in predicted lifetime value, suggesting either a mature, stable customer base with consistent purchasing behavior

Figure 8

Top Customers by CLV

Highest-value customers ranked by predicted lifetime value

Highest-value customers ranked by predicted CLV

Interpretation

Purpose

This section ranks all 12 customers by predicted lifetime value (CLV) over the next 365 days to guide resource allocation for retention and engagement strategies. The visualization combines CLV predictions with P(Alive), the model's probability that a customer is still active, to highlight which customers warrant priority investment in loyalty programs, early access campaigns, and dedicated support.

Key Findings

  • Predicted CLV Range: $99,718–$100,759 (mean: $100,099) — remarkably tight clustering indicates homogeneous customer value despite behavioral differences
  • P(Alive) Distribution: Mean 0.85, median 0.99, with sharp negative skew (−1.4) — most customers show high retention probability, but 2 customers (Customer-2, Customer-10) fall below 0.4 threshold
  • Frequency-Recency Mismatch: High-frequency customers (7 transactions) have zero recency; low-frequency customers (1 transaction) show 145-day gaps, indicating distinct engagement patterns
  • Spend Variability: Average spend ranges $225–$442 (sd: $65), suggesting spending behavior is less predictable than transaction frequency
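
One way to see what drives the tight CLV clustering is to divide the average CLV by the average expected transaction count reported in the Expected Transactions Matrix section. The sketch below does this while ignoring discounting, which is a simplification; how the report applies discount_rate is not documented.

```python
def implied_spend_per_transaction(clv, expected_transactions):
    """Average spend per transaction implied by CLV, ignoring discounting."""
    if expected_transactions <= 0:
        raise ValueError("expected_transactions must be positive")
    return clv / expected_transactions


# Mean CLV and mean expected transactions as quoted in the report.
implied = implied_spend_per_transaction(100_098.53, 328.1)
print(f"implied spend per transaction: ${implied:.2f}")
```

The implied figure falls inside the reported $225–$442 average-spend range, so the roughly $100K CLV per customer appears to come from about 328 predicted transactions at an ordinary ticket size rather than from unusually high spend.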

Interpretation

The BG/NBD model predicts similar CLV across all top customers despite divergent behavioral profiles

Table 9

Customer Segments

Segmentation by CLV and activity status

Customer segmentation by CLV tier and activity status

segment_name   customer_count   avg_clv     total_clv   avg_palive   pct_customers
At Risk        1                1.003e+05   1.003e+05   0.372        8.3
Champions      2                1.005e+05   2.01e+05    0.999        16.7
Lost           1                9.985e+04   9.985e+04   0.089        8.3
Loyal          6                1.001e+05   6.005e+05   0.966        50
Potential      2                9.973e+04   1.995e+05   0.996        16.7

Interpretation

Purpose

This section segments the customer base into five behavioral groups based on predicted Customer Lifetime Value (CLV) and probability of being alive (P(Alive)), enabling targeted retention and growth strategies. Understanding segment composition reveals where value is concentrated and which customers face churn risk, directly supporting the BG/NBD and Gamma-Gamma modeling objectives.

Key Findings

  • Loyal Segment Dominance: 50% of customers (6 of 12) classified as Loyal, generating $600,518.50 in total CLV with 0.97 average P(Alive)—the largest value pool with strong retention signals
  • Champions Concentration: 2 customers (16.7%) represent top-tier value at $100,506.18 average CLV with near-certain P(Alive) ≈ 1.0 (0.999)
  • At Risk & Lost: Combined 2 customers (16.7%) show churn signals (P(Alive) < 0.5: 0.372 and 0.089), with the one At Risk customer still holding $100,338.08 CLV
  • Potential Growth: 2 customers (16.7%) with lower CLV ($99,732.67 avg) but high P(Alive) ≈ 1.0 (0.996) represent expansion opportunities
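
The segment table can be used to cross-check the activity figures quoted elsewhere in the report. The sketch below takes customer_count and avg_palive from the table and computes the customer-weighted mean P(Alive) and the share of customers in segments averaging at least 0.5. Using segment averages as a stand-in for individual values is an approximation, though it is exact for the two single-customer segments.

```python
# (customer_count, avg_palive) per segment, copied from the segment table.
segments = {
    "At Risk": (1, 0.372),
    "Champions": (2, 0.999),
    "Lost": (1, 0.089),
    "Loyal": (6, 0.966),
    "Potential": (2, 0.996),
}

total_customers = sum(count for count, _ in segments.values())
mean_p_alive = sum(count * p for count, p in segments.values()) / total_customers
share_above_half = (
    sum(count for count, p in segments.values() if p >= 0.5) / total_customers
)

print(f"customers: {total_customers}")
print(f"weighted mean P(Alive): {mean_p_alive:.3f}")
print(f"share in segments with avg P(Alive) >= 0.5: {share_above_half:.1%}")
```

The weighted mean comes out near 0.854, while the share of customers above the 0.5 threshold is 10 of 12.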

Interpretation

The segmentation reveals a healthy portfolio skewed toward active

Table 10

Model Validation

Calibration vs holdout comparison

Model calibration vs holdout validation

frequency_group   actual_transactions   predicted_transactions   n_customers   error_pct
0 (one-time)      1                     328                      2             3.27e+04
1 repeat          0.667                 328                      3             4.91e+04
2 repeats         3                     328.2                    2             1.084e+04
3-4 repeats       4.333                 327.1                    3             7448
5+ repeats        7                     329.4                    2             4605

Interpretation

Purpose

This section validates the BG/NBD model's predictive accuracy by comparing forecasted transactions against actual holdout-period observations across customer frequency groups. Close agreement between predicted and actual values would indicate that the model captures customer purchase behavior, which is essential for accurate CLV estimation and segmentation.

Key Findings

  • Prediction Consistency: Predicted transactions cluster tightly around 327–329 across all frequency groups, so the model output does not vary with observed purchase frequency
  • Error Pattern by Frequency: Errors are extreme in every group. One-time and single-repeat customers show the largest rates (32,702% and 49,096%), and even the best group (5+ repeats) is overpredicted by 4,605%, roughly 47 times the actual count (329.4 predicted vs. 7 actual)
  • Sample Size Constraint: Only 2–3 customers per frequency group limits statistical reliability of error estimates
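
The error_pct column can be reproduced from the table, assuming it is defined as (predicted - actual) / actual × 100. The report does not state the formula; this definition matches the published values to within rounding. The sketch also reports the absolute gap per customer, which the table omits.

```python
# (frequency_group, actual_transactions, predicted_transactions), from the validation table.
rows = [
    ("0 (one-time)", 1.0, 328.0),
    ("1 repeat", 0.667, 328.0),
    ("2 repeats", 3.0, 328.2),
    ("3-4 repeats", 4.333, 327.1),
    ("5+ repeats", 7.0, 329.4),
]


def percent_error(actual, predicted):
    """Signed percentage error relative to the actual value."""
    return 100.0 * (predicted - actual) / actual


results = [
    (group, predicted - actual, percent_error(actual, predicted))
    for group, actual, predicted in rows
]

for group, abs_gap, pct in results:
    print(f"{group:<13} gap={abs_gap:7.1f} transactions  error={pct:9.0f}%")
```

The absolute gap exceeds 320 transactions per customer in every group, so the differences in percentage error come from the size of the actuals rather than from any group being predicted well.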

Interpretation

The model does not show usable predictive accuracy for any frequency group. Actual holdout transactions range from about 0.7 to 7 per customer, while predictions sit near 328 for everyone, an absolute gap of more than 320 transactions per customer in every row. The percentage errors are larger for low-frequency groups only because their actuals are smaller; the gap itself is nearly constant. One-time buyers do offer little signal for individual-level prediction, but the overprediction for customers with 5+ repeats shows the problem is not confined to sparse-data segments.

Context

Validation results align with CLV distribution and segment summary findings, where Loyal and Champions segments (higher frequency) show stable CLV estimates. The small sample sizes per group suggest

Table 11

Model Parameters

Fitted BG/NBD and Gamma-Gamma parameters

BG/NBD + Gamma-Gamma model parameter estimates

model         parameter_name                estimate   interpretation
BG/NBD        r (purchase rate shape)       1.009      Shape of purchase rate Gamma distribution
BG/NBD        alpha (purchase rate scale)   2.759      Scale of purchase rate Gamma distribution
BG/NBD        a (dropout shape)             0.0025     First shape parameter of dropout Beta distribution
BG/NBD        b (dropout shape)             0.0121     Second shape parameter of dropout Beta distribution
Gamma-Gamma   p (spend shape)               4.733      Individual spend variability shape
Gamma-Gamma   q (pop spend shape)           1203       Population spend heterogeneity shape
Gamma-Gamma   v (spend scale)                          Scale of spending distribution

Interpretation

Purpose

This section presents the estimated parameters of a BG/NBD + Gamma-Gamma probabilistic model, which quantifies customer purchase behavior and spending patterns across the entire customer base. These parameters form the foundation for predicting customer lifetime value (CLV), probability of being alive (P(alive)), and expected transaction counts—all critical metrics visible in the heatmaps and customer segments throughout the analysis.

Key Findings

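  • BG/NBD Purchase Rate (r=1.01, alpha=2.76): In the standard BG/NBD formulation, r/alpha ≈ 0.37 is the mean purchase rate per time unit across the population, not a measure of heterogeneity. Heterogeneity is governed by r: with r ≈ 1, the coefficient of variation of purchase rates is about 1, so customers vary substantially in their baseline purchase propensity.
  • BG/NBD Dropout Parameters (a=0.0025, b=0.0121): The implied mean dropout probability after each purchase is a/(a+b) ≈ 0.17. Because both shape parameters are far below 1, the Beta distribution is strongly U-shaped: most customers are modeled as having a dropout probability near 0, with a minority near 1.
  • Gamma-Gamma Spend Shape (p=4.73, q=1202.75): The extremely high q indicates very little population-level spend heterogeneity, so each customer's expected spend is pulled almost entirely toward the population average. This is consistent with the narrow CLV range ($99.7K–$100.8K) despite differing behavioral profiles.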
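
The population-level quantities implied by the BG/NBD estimates can be computed directly. The sketch below assumes the standard BG/NBD parameterization, with purchase rates distributed Gamma(r, alpha) where alpha acts as a rate parameter, and dropout probabilities distributed Beta(a, b). The time unit is whatever the model was fitted in, which the report does not state.

```python
import math

r, alpha = 1.009, 2.759  # purchase-rate Gamma parameters (from the table)
a, b = 0.0025, 0.0121    # dropout Beta parameters (from the table)

mean_purchase_rate = r / alpha        # average purchases per time unit while active
rate_cv = 1.0 / math.sqrt(r)          # coefficient of variation of purchase rates
mean_dropout_prob = a / (a + b)       # average dropout probability after a purchase

print(f"mean purchase rate: {mean_purchase_rate:.3f} per time unit")
print(f"purchase-rate coefficient of variation: {rate_cv:.3f}")
print(f"mean dropout probability: {mean_dropout_prob:.3f}")
```

A coefficient of variation close to 1 indicates wide variation in purchase rates across customers, while the mean dropout probability of roughly 0.17 averages over a highly polarized Beta distribution.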

Interpretation

The model captures two distinct customer dimensions: purchase frequency (governed by BG/NBD) and monetary value (governed by Gamma-Gamma).
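
On the monetary side, the Gamma-Gamma model estimates each customer's expected spend as a weighted blend of that customer's own average and the population average. The sketch below computes the weight placed on the individual average, p·x / (p·x + q - 1), using the fitted p and q. The comparison value q=4 is hypothetical and included only for contrast.

```python
P_FIT = 4.733    # fitted p from the table
Q_FIT = 1202.75  # fitted q as quoted in the key findings


def individual_weight(n_transactions, p=P_FIT, q=Q_FIT):
    """Weight on a customer's own average spend in the Gamma-Gamma conditional mean."""
    px = p * n_transactions
    return px / (px + q - 1.0)


weight_fitted = individual_weight(7)
weight_low_q = individual_weight(7, q=4.0)  # hypothetical q, for contrast only
print(f"fitted q: {weight_fitted:.3f}, hypothetical q=4: {weight_low_q:.3f}")
```

With the fitted q, even a customer with 7 observed transactions gets under 3% weight on their own spending history, so expected spend is nearly identical across customers.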
