Analysis overview and configuration

Configuration

Analysis Type: BG/NBD

Company: Test Company

Objective: Predict customer lifetime value using BG/NBD and Gamma-Gamma models

Analysis Date: 2026-03-14

Processing ID: analytics__customer__ltv__bg_nbd_test_20260314_220027

Total Observations: 41

Module Parameters

Parameter	Value
prediction_horizon_days	365
discount_rate	0.15
holdout_days	90
min_transactions	1
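
For reference, the four module parameters can be held in a small validated configuration object. This is a minimal sketch: the class name `ModuleParams` and the `calibration_days` helper are hypothetical, and the helper assumes the holdout window is taken from the end of the observation period, which the report does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleParams:
    """Hypothetical container for the module parameters listed above."""
    prediction_horizon_days: int = 365
    discount_rate: float = 0.15
    holdout_days: int = 90
    min_transactions: int = 1

    def validate(self) -> None:
        if self.prediction_horizon_days <= 0:
            raise ValueError("prediction_horizon_days must be positive")
        if not 0.0 <= self.discount_rate < 1.0:
            raise ValueError("discount_rate must be in [0, 1)")
        if self.holdout_days < 0:
            raise ValueError("holdout_days must be non-negative")
        if self.min_transactions < 1:
            raise ValueError("min_transactions must be at least 1")

    def calibration_days(self, observed_days: int) -> int:
        # Assumption: the holdout is carved from the end of the observed window.
        if self.holdout_days >= observed_days:
            raise ValueError("holdout_days must be shorter than the observed window")
        return observed_days - self.holdout_days


params = ModuleParams()
params.validate()
# The report covers 205 observed days, so 90 holdout days would leave 115 for calibration.
print(params.calibration_days(observed_days=205))
```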

BG/NBD analysis for Test Company

Interpretation

Purpose

This analysis applies the BG/NBD (Beta-Geometric/Negative Binomial Distribution) and Gamma-Gamma models to predict customer lifetime value (CLV) for Test Company. The framework combines purchase frequency/recency patterns with spending behavior to segment customers and forecast future value, enabling data-driven retention and resource allocation strategies.

Key Findings

Model Architecture: BG/NBD captures purchase dynamics (r=1.01, alpha=2.76) while Gamma-Gamma models spending variability (p=4.73, q=1202.75)
Customer Distribution: 50% classified as "Loyal" (6 customers, avg_palive=0.97), with 12 total customers spanning CLV range of ~$99,718–$100,759
Recency Dominance: 75% of observations fall in 0-30 day recency window; expected transactions remain stable (~328) across frequency bins, suggesting recency is primary engagement driver
Segment Concentration: Champions and Potential segments each represent 16.7% of customers; "Lost" segment shows critically low palive (0.09)
Validation Gap: Prediction errors range 4,605–49,096%, with one-time buyers showing 32,702% error, indicating model struggles with

Data preprocessing and column mapping

Data Quality

Metric	Value
Initial Rows	80
Final Rows	41
Rows Removed	39
Retention Rate	51.2%

Processed 80 observations, retained 41 (51.2%) after cleaning
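
The retention figures follow directly from the row counts. A minimal sketch of the arithmetic, using only the counts reported above (variable names are illustrative):

```python
initial_rows = 80
final_rows = 41

rows_removed = initial_rows - final_rows
retention_pct = 100 * final_rows / initial_rows

print(f"removed={rows_removed}, retention={retention_pct:.2f}%")
# removed=39, retention=51.25%
```

The report shows this value to one decimal place as 51.2%.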

Interpretation

Purpose

This section documents the data cleaning and filtering process applied before modeling. The 51.2% retention rate indicates substantial data reduction, which is critical to understand when evaluating model reliability and the representativeness of downstream CLV, P(Alive), and segmentation analyses.

Key Findings

Retention Rate: 51.2% (41 of 80 rows retained) - Nearly half the initial observations were removed during preprocessing, suggesting either aggressive filtering criteria or significant data quality issues in the raw dataset
Rows Removed: 39 observations eliminated without documented justification or filter specifications
Train/Test Split: Not specified - No explicit train/test allocation is documented, limiting transparency on model validation methodology
Transformation Details: Filters applied are not enumerated, preventing assessment of whether removals were systematic or arbitrary

Interpretation

The substantial data loss raises concerns about sample bias and model generalizability. With only 41 rows retained from 80 initial observations (the retained rows cover 12 unique customers), the BG/NBD and Gamma-Gamma models trained on this subset may not reflect the broader customer population. The lack of documented filtering criteria makes it impossible to determine whether removed records were outliers, incomplete cases, or systematically different customer segments. This directly impacts confidence in the CLV predictions and P(Alive) estimates presented in the analysis.

Context

The validation data shows extremely high

Key Metrics

unique_customers: 12
pct_alive: 85.4%
avg_clv: $100,098.53
total_predicted_clv: $1,201,182
top20_revenue_share: 25.1%
repeat_rate: 83.3%

Key Findings

Finding	Value
Active customers (P(Alive) ≥ 0.5)	85.4%
Avg predicted CLV (next 365d)	$100,098.53
Total predicted future revenue	$1,201,182
Top 20% revenue concentration	25.1%
Repeat purchase rate	83.3%
Model fitted	BG/NBD + Gamma-Gamma

Summary

Bottom Line: The BG/NBD model analysed 12 customers and predicts $1,201,182 in total customer lifetime value over the next 365 days.

Key Findings:
• 85.4% of customers are estimated to still be active
• Average CLV is $100,098.53 (median $100,098.54)
• Top 20% of customers account for 25.1% of predicted revenue
• 83.3% of customers have made at least one repeat purchase

Recommendations:
• Invest heavily in retaining Champions segment — highest predicted value customers
• Launch win-back campaigns for At Risk customers with P(Alive) 0.3-0.5
• Use CLV rankings to prioritize customer service and loyalty program resources
• Focus acquisition on customer profiles similar to Champions segment

Interpretation

Purpose

This executive summary synthesizes a customer lifetime value (CLV) analysis using the BG/NBD probabilistic model to assess the predictive revenue potential of a 12-customer cohort over the next 365 days. The analysis directly addresses the business objective of quantifying customer value and identifying retention priorities.

Key Findings

Total Predicted CLV: $1,201,182 — represents the aggregate revenue expected from the analyzed customer base
Customer Viability: 85.4% of customers estimated alive — indicates strong overall cohort health and low churn risk
Repeat Purchase Behavior: 83.3% repeat rate — demonstrates established customer loyalty and transaction consistency
Revenue Concentration: Top 20% of customers generate 25.1% of predicted revenue, close to the share an even split would give, so predicted value is spread almost uniformly with no dependency on single customers
Average CLV: $100,098.53 per customer — reflects consistent, high-value customer profiles with minimal variance (sd=$281.95)

Interpretation

The model demonstrates a healthy, stable customer base with strong predictive signals. The high repeat rate and elevated P(Alive) probability suggest the cohort exhibits low churn risk. However, the validation data reveals substantial prediction errors (4,605–49,096% error rates), indicating the model's transaction-level forecasts are unreliable despite producing reasonable aggregate

Key customer lifetime value metrics from the BG/NBD model

Analysis covers 12 unique customers with 41 transactions over 205 days. 85.4% of customers are estimated to be currently active. Total predicted customer lifetime value over the next 365 days is $1,201,182, with an average CLV of $100,098.53 per customer.

The top 20% of customers account for 25.1% of total predicted revenue. This is far from the typical 80/20 value concentration: predicted value is spread almost evenly across customers. Repeat purchase rate is 83.3%, the share of customers with at least one return visit.
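
The concentration figure can be checked against an even split. A minimal sketch using only figures quoted in this report (12 customers, total predicted CLV of $1,201,182, maximum CLV of $100,759); the function name `share_bounds` is illustrative, and how the report rounds "top 20%" of 12 customers is not stated, so both 2 and 3 customers are tried:

```python
n_customers = 12
total_clv = 1_201_182
max_clv = 100_759  # upper end of the reported CLV range


def share_bounds(k: int) -> tuple[float, float]:
    """Revenue share (%) of the top k customers: even-split value and upper bound."""
    even_share = 100 * k / n_customers          # if every customer had the same CLV
    upper_bound = 100 * k * max_clv / total_clv  # if all k sat at the maximum CLV
    return even_share, upper_bound


for k in (2, 3):  # 20% of 12 customers is 2.4
    even_share, upper_bound = share_bounds(k)
    print(f"top {k}: even split {even_share:.2f}%, upper bound {upper_bound:.2f}%")
```

Under these figures the reported 25.1% is only reachable with three customers, and it sits within a fraction of a percentage point of the 25.0% that a perfectly even split would give.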

Interpretation

Purpose

This section quantifies the predicted financial value of the customer base over the next 365 days using the BG/NBD probabilistic model. It serves as the core output of the CLV analysis, enabling assessment of total revenue potential, customer health, and value concentration across the portfolio.

Key Findings

Total Predicted Revenue: $1,201,182 — aggregate CLV across all 12 customers over the next year
Average CLV: $100,098.5 — consistent with median, indicating symmetric value distribution with no extreme outliers
Customer Vitality: 85.4% of customers estimated alive — strong baseline health, though 14.6% churn risk exists
Repeat Purchase Rate: 83.3% — demonstrates robust customer retention and engagement patterns
Revenue Concentration: Top 20% of customers generate 25.1% of revenue, which is a near-even spread rather than a Pareto-style skew

Interpretation

The customer base exhibits healthy fundamentals with high repeat engagement and strong predicted revenue. The alignment between mean and median CLV ($100,098.5) suggests a stable, homogeneous customer value profile without significant outliers. The 85.4% alive probability reflects recent transaction activity and engagement patterns captured in the BG/NBD model, while the 83.3% repeat rate validates the model's ability to

Expected future transactions by customer frequency and recency

Interpretation

Purpose

This frequency-recency matrix quantifies expected future transaction volume across customer segments defined by their purchase history and engagement recency. It serves as a predictive lens for identifying which customers are most likely to remain active, enabling data-driven segmentation for retention and engagement strategies within the broader CLV and customer lifecycle analysis.

Key Findings

Expected Transactions Range: 326.77–330.18 transactions over 365 days, with minimal variance (SD=1.01) across all segments
Recency Dominance: 75% of observations fall in the 0–30 day window, indicating most active customers purchased very recently
Frequency Distribution: Segments span 0–7 purchase frequencies, with frequency=1 appearing most frequently (25% of rows)
Minimal Differentiation: The narrow range and low standard deviation suggest predicted transaction volumes are remarkably consistent across frequency-recency combinations
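
A naive baseline helps put the ~328 expected transactions in context. The sketch below extrapolates the observed per-customer transaction rate to the 365-day horizon using figures from this report (41 transactions, 12 customers, 205 days). It is a rough yardstick, not a BG/NBD calculation, and it counts first purchases as transactions.

```python
n_transactions = 41
n_customers = 12
observed_days = 205
horizon_days = 365
mean_predicted = 328.1  # mean expected transactions reported by the model

# Observed transactions per customer per day, scaled to the prediction horizon.
naive_per_customer = n_transactions / n_customers / observed_days * horizon_days
ratio = mean_predicted / naive_per_customer

print(f"naive baseline: {naive_per_customer:.2f} transactions per customer")
print(f"model mean is about {ratio:.0f}x the naive baseline")
```

The baseline comes to about 6 transactions per customer over 365 days, so the model's mean prediction is roughly 54 times larger.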

Interpretation

The near-uniform expected transaction predictions (mean=328.1) across diverse frequency-recency combinations are counterintuitive and suggest the BG/NBD model may be producing undifferentiated forecasts. Typically, high-frequency recent purchasers should show substantially higher expected transactions than dormant or infrequent buyers. The concentration of data in the 0–30 day recency band reflects a customer base dominated

Probability each customer segment is still active

Interpretation

Purpose

This section quantifies the probability that each customer remains active and engaged in their purchase cycle using the BG/NBD probabilistic model. P(Alive) is essential for distinguishing genuinely churned customers from those temporarily inactive, enabling targeted retention strategies and accurate lifetime value predictions.

Key Findings

Overall P(Alive) Mean: 0.88 (median 1.0) — 85.4% of customers estimated active, indicating a healthy customer base with strong retention signals
Recency Effect: P(Alive) drops sharply with time since last purchase (0.23 at 120–150 days vs. 0.81–1.0 at 0–30 days), demonstrating recency as the dominant churn predictor
Frequency Amplification: Higher purchase frequency coincides with high P(Alive): frequency 4–7 customers hold P(Alive) = 1.0 within the 0–30 day window. Because this evidence comes from the most recent window only, it does not by itself show that repeat buyers stay resilient at extended recency
Uncertainty Zone: Customers with P(Alive) between 0.3–0.7 represent re-engagement opportunities; the data shows minimal representation in this range, suggesting a bimodal distribution of active vs. churned segments
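
The recency and frequency effects described above follow from the standard BG/NBD expression for P(Alive). The sketch below uses the fitted parameters from this report; the function name and the example inputs are hypothetical, and the time unit of `t_x` and `T` is assumed to match the unit used when fitting `alpha`, which the report does not state. Here `t_x` is the time of the last purchase measured from the first purchase, and `T` is the customer's age.

```python
# Fitted BG/NBD parameters as listed in the parameter table of this report.
R, ALPHA, A, B = 1.009, 2.759, 0.0025, 0.0121


def p_alive(x: int, t_x: float, T: float) -> float:
    """Standard BG/NBD probability that a customer is still active.

    x   : number of repeat purchases
    t_x : time of the last purchase (0 if x == 0)
    T   : time since the first purchase
    """
    if x == 0:
        # In the BG/NBD model a customer with no repeat purchase is alive with probability 1.
        return 1.0
    ratio = (ALPHA + T) / (ALPHA + t_x)
    return 1.0 / (1.0 + (A / (B + x - 1)) * ratio ** (R + x))


# Hypothetical customers observed for 205 time units with one repeat purchase each.
recent = p_alive(x=1, t_x=200.0, T=205.0)
stale = p_alive(x=1, t_x=60.0, T=205.0)
print(f"recent last purchase: {recent:.2f}, stale last purchase: {stale:.2f}")
```

The longer the gap between the last purchase and the end of the observation window, the larger the ratio term becomes and the lower P(Alive) falls.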

Interpretation

The heatmap reveals that rec

Distribution of predicted customer lifetime values

Interpretation

Purpose

This section quantifies how predicted customer lifetime value distributes across the customer base, revealing concentration patterns in revenue potential. Understanding CLV distribution is essential for resource allocation, as it identifies whether value is concentrated among a few high-value customers or spread evenly across the base.

Key Findings

Mean vs. Median CLV: Both approximately $100,098, indicating a remarkably symmetric distribution rather than the typical right-skew observed in e-commerce. This suggests relatively homogeneous customer value.
Customer Concentration: The second bin (CLV $99,926–$100,134) contains the largest customer segment with 5 customers, representing 41.7% of the base.
Top-Tier Representation: Only 1 customer occupies the highest CLV bracket ($100,551–$100,759), yet 91.7% of customers fall within the middle three bins, showing limited extreme value variation.
Distribution Range: CLV spans only $1,041 across all customers ($99,718–$100,759), a narrow band relative to absolute values.
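
The narrowness of the distribution can be expressed relative to the mean. A minimal sketch using the mean, standard deviation, and range reported in this analysis (variable names are illustrative):

```python
mean_clv = 100_098.53
sd_clv = 281.95
clv_min, clv_max = 99_718, 100_759

clv_range = clv_max - clv_min
cv_pct = 100 * sd_clv / mean_clv        # coefficient of variation, in percent
range_pct = 100 * clv_range / mean_clv  # full range as a share of the mean

print(f"range=${clv_range:,}, CV={cv_pct:.2f}%, range/mean={range_pct:.2f}%")
```

Both ratios come out at roughly 1% or less, which is what "narrow band relative to absolute values" means in practice.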

Interpretation

The near-identical mean and median CLV contradicts typical e-commerce patterns where high-value customers drive disproportionate revenue. This dataset exhibits unusual homogeneity in predicted lifetime value, suggesting either a mature, stable customer base with consistent purchasing behavior

Highest-value customers ranked by predicted CLV

Interpretation

Purpose

This section identifies the 12 highest-value customers by predicted lifetime value (CLV) over the next 365 days, ranked to guide resource allocation for retention and engagement strategies. The visualization combines CLV predictions with P(Alive) probability—a measure of churn risk—to highlight which customers warrant priority investment in loyalty programs, early access campaigns, and dedicated support.

Key Findings

Predicted CLV Range: $99,718–$100,759 (mean: $100,099) — remarkably tight clustering indicates homogeneous customer value despite behavioral differences
P(Alive) Distribution: Mean 0.85, median 0.99, with sharp negative skew (−1.4) — most customers show high retention probability, but 2 customers (Customer-2, Customer-10) fall below 0.4 threshold
Frequency-Recency Mismatch: High-frequency customers (7 transactions) have zero recency; low-frequency customers (1 transaction) show 145-day gaps, indicating distinct engagement patterns
Spend Variability: Average spend ranges $225–$442 (sd: $65), suggesting spending behavior is less predictable than transaction frequency

Interpretation

The BG/NBD model predicts similar CLV across all top customers despite divergent behavioral profiles

Customer segmentation by CLV tier and activity status

segment_name	customer_count	avg_clv ($)	total_clv ($)	avg_palive	pct_customers (%)
At Risk	1	100,300	100,300	0.372	8.3
Champions	2	100,500	201,000	0.999	16.7
Lost	1	99,850	99,850	0.089	8.3
Loyal	6	100,100	600,500	0.966	50.0
Potential	2	99,730	199,500	0.996	16.7
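
The segment table can be cross-checked against the headline figures. The sketch below recomputes each segment's customer share and the customer-weighted average P(Alive) from the counts and averages in the table; the variable names are illustrative.

```python
# (segment, customer_count, avg_palive) as listed in the segment table.
segments = [
    ("At Risk", 1, 0.372),
    ("Champions", 2, 0.999),
    ("Lost", 1, 0.089),
    ("Loyal", 6, 0.966),
    ("Potential", 2, 0.996),
]

total_customers = sum(count for _, count, _ in segments)

for name, count, _ in segments:
    print(f"{name}: {100 * count / total_customers:.1f}% of customers")

# Weight each segment's average P(Alive) by its customer count.
weighted_palive = sum(count * palive for _, count, palive in segments) / total_customers
print(f"customer-weighted mean P(Alive): {weighted_palive:.2f}")
```

The weighted mean comes to about 0.85, in line with the mean P(Alive) of 0.85 reported for the ranked customer list.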

Interpretation

Purpose

This section segments the customer base into five behavioral groups based on predicted Customer Lifetime Value (CLV) and probability of being alive (P(Alive)), enabling targeted retention and growth strategies. Understanding segment composition reveals where value is concentrated and which customers face churn risk, directly supporting the BG/NBD and Gamma-Gamma modeling objectives.

Key Findings

Loyal Segment Dominance: 50% of customers (6 of 12) classified as Loyal, generating $600,518.50 in total CLV with 0.97 average P(Alive)—the largest value pool with strong retention signals
Champions Concentration: 2 customers (16.7%) represent top-tier value at $100,506.18 average CLV with near-perfect average P(Alive) of 0.999
At Risk & Lost: Combined 2 customers (16.7%) show churn signals (P(Alive) < 0.5), with one At Risk customer still holding $100,338.08 CLV
Potential Growth: 2 customers (16.7%) with lower CLV ($99,732.67 avg) but high average P(Alive) of 0.996 represent expansion opportunities

Interpretation

The segmentation reveals a healthy portfolio skewed toward active

Model calibration vs holdout validation

frequency_group	actual_transactions	predicted_transactions	n_customers	error_pct
0 (one-time)	1	328	2	32,702
1 repeat	0.667	328	3	49,096
2 repeats	3	328.2	2	10,840
3-4 repeats	4.333	327.1	3	7,448
5+ repeats	7	329.4	2	4,605
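
The error column appears to be the absolute percentage error of predicted against actual transactions. A minimal sketch that recomputes it from the rounded table values, under that assumed definition; small differences from the reported figures are expected because the table inputs are rounded:

```python
# (frequency_group, actual, predicted, reported_error_pct) from the validation table.
rows = [
    ("0 (one-time)", 1.0, 328.0, 32_702),
    ("1 repeat", 0.667, 328.0, 49_096),
    ("2 repeats", 3.0, 328.2, 10_840),
    ("3-4 repeats", 4.333, 327.1, 7_448),
    ("5+ repeats", 7.0, 329.4, 4_605),
]


def error_pct(actual: float, predicted: float) -> float:
    """Absolute percentage error relative to the actual value (assumed definition)."""
    return 100 * abs(predicted - actual) / actual


for group, actual, predicted, reported in rows:
    gap = predicted - actual
    print(f"{group}: gap={gap:.1f} transactions, "
          f"recomputed={error_pct(actual, predicted):,.0f}%, reported={reported:,}%")
```

The absolute gap is between roughly 320 and 327 transactions in every group. The percentage error shrinks for higher-frequency groups only because the actual count in the denominator is larger.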

Interpretation

Purpose

This section tests the BG/NBD model's predictive accuracy by comparing forecasted transactions against actual holdout-period observations across customer frequency groups. Close agreement between predicted and actual values would indicate that the model captures customer purchase behavior, which is essential for accurate CLV estimation and segmentation.

Key Findings

Prediction Consistency: Predicted transactions cluster tightly around 327–329 across all frequency groups, so the output is stable but does not differentiate between one-time buyers and customers with 5+ repeats
Error Pattern by Frequency: One-time and low-frequency customers exhibit the largest error rates (32,702% and 49,096%); high-frequency customers (5+ repeats) show the smallest error, which at 4,605% is still extreme
Sample Size Constraint: Only 2–3 customers per frequency group limits statistical reliability of error estimates

Interpretation

The validation does not show usable predictive power for any frequency group. Predicted transactions are about 328 for every group while actual transactions range from 0.667 to 7, so the absolute gap is roughly 320 to 327 transactions per customer. The percentage error falls as frequency rises only because the actual count in the denominator grows; the predictions themselves barely change. One-time buyers do offer minimal signal, which makes individual-level predictions for them inherently unreliable, but errors of this size cannot be attributed to low baseline actuals alone. The fit and the prediction horizon should be checked before the CLV figures are relied on.

Context

Validation results align with CLV distribution and segment summary findings, where Loyal and Champions segments (higher frequency) show stable CLV estimates. The small sample sizes per group suggest

BG/NBD + Gamma-Gamma model parameter estimates

model	parameter_name	estimate	interpretation
BG/NBD	r (purchase rate shape)	1.009	Shape of purchase rate Gamma distribution
BG/NBD	alpha (purchase rate scale)	2.759	Scale of purchase rate Gamma distribution
BG/NBD	a (dropout shape)	0.0025	First shape parameter of dropout Beta distribution
BG/NBD	b (dropout shape)	0.0121	Second shape parameter of dropout Beta distribution
Gamma-Gamma	p (spend shape)	4.733	Individual spend variability shape
Gamma-Gamma	q (pop spend shape)	1203	Population spend heterogeneity shape
Gamma-Gamma	v (spend scale)		Scale of spending distribution

Interpretation

Purpose

This section presents the estimated parameters of a BG/NBD + Gamma-Gamma probabilistic model, which quantifies customer purchase behavior and spending patterns across the entire customer base. These parameters form the foundation for predicting customer lifetime value (CLV), probability of being alive (P(alive)), and expected transaction counts—all critical metrics visible in the heatmaps and customer segments throughout the analysis.

Key Findings

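BG/NBD Purchase Rate (r=1.01, alpha=2.76): The r/alpha ratio of ~0.37 is the mean of the Gamma purchase-rate distribution, i.e. the expected number of transactions per customer per time unit. Heterogeneity is governed by r: with r≈1 the coefficient of variation is 1/sqrt(r)≈1, so customers vary substantially in their baseline purchase propensity.
BG/NBD Dropout Parameters (a≈0.0025, b≈0.012): The mean dropout probability is a/(a+b)≈0.17. With both shape parameters near zero the Beta distribution is strongly U-shaped, so the model treats customers as either almost never dropping out or dropping out almost immediately after a purchase.
Gamma-Gamma Spend Shape (p=4.73, q=1202.75): The extremely high q indicates very little population-level heterogeneity in underlying spend (coefficient of variation 1/sqrt(q)≈0.03). Together with the near-identical expected transaction counts, this is consistent with the narrow CLV distribution ($99.7K–$100.8K).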
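
The quantities implied by these parameters can be computed directly. This is a minimal sketch using the estimates from the parameter table and standard properties of the Gamma and Beta distributions; the variable and function names are illustrative. The last function is the weight that the Gamma-Gamma conditional mean places on a customer's own average spend, with the remainder going to the population mean.

```python
import math

# Estimates from the parameter table (q taken at the precision quoted in the findings).
r, alpha = 1.009, 2.759
a, b = 0.0025, 0.0121
p, q = 4.733, 1202.75

mean_purchase_rate = r / alpha     # mean of the Gamma(r, alpha) purchase-rate distribution
cv_purchase_rate = 1 / math.sqrt(r)  # coefficient of variation of the purchase rate
mean_dropout_prob = a / (a + b)    # mean of the Beta(a, b) dropout distribution
cv_spend_rate = 1 / math.sqrt(q)   # coefficient of variation of the latent spend parameter


def own_history_weight(x: int) -> float:
    """Weight on a customer's own average spend after x observed transactions."""
    return p * x / (p * x + q - 1)


print(f"mean purchase rate per time unit: {mean_purchase_rate:.3f}")
print(f"CV of purchase rate: {cv_purchase_rate:.2f}")
print(f"mean dropout probability: {mean_dropout_prob:.3f}")
print(f"CV of latent spend parameter: {cv_spend_rate:.3f}")
print(f"weight on own spend after 7 transactions: {own_history_weight(7):.3f}")
```

With q this large, even a customer with 7 transactions has less than 3% of the expected-spend estimate driven by their own history, so predicted spend is pulled almost entirely toward the population mean.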

Interpretation

The model captures two distinct customer dimensions: purchase frequency (governed by BG/NBD) and monetary value (governed by Gamma-Gamma).
