Analysis Overview and Data Quality
Cohort Retention Configuration
Analysis overview and configuration
Analysis Overview
This analysis examines customer retention patterns across 13 cohorts spanning 2009-2010 for an online retail store. The objective is to understand how customer cohorts behave over their lifecycle, identify churn trends, and quantify the relationship between retention and revenue generation. This foundational view establishes the scale and severity of customer attrition challenges.
The data reveals a classic “leaky bucket” pattern: massive initial attrition followed by stabilization among survivors. The 77% first-period churn dominates the retention profile, yet customers who survive this critical phase generate meaningful revenue.
Data Quality & Completeness
Data preprocessing and column mapping
Data Preprocessing
This section documents the data preprocessing pipeline for the retention analysis covering 4,314 customers across 13 cohorts from December 2009 to December 2010. Row retention through cleaning was 100%: no rows were removed, which suggests either exceptionally clean source data or minimal validation. Understanding preprocessing decisions is critical because they directly affect the reliability of the retention metrics, survival curves, and revenue calculations used to assess customer lifetime value.
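One way to tell clean source data apart from minimal validation is to count the rows that common rules would flag, without dropping anything. The sketch below is illustrative only: the field names (`customer_id`, `quantity`, `unit_price`) and the three rules are assumptions, not the pipeline's actual column mapping or checks.

```python
def audit_rows(rows):
    """Count rows that common validation rules would flag, without dropping any.

    The field names and rules here are hypothetical, not taken from the report.
    """
    flags = {
        "missing_customer_id": 0,
        "non_positive_quantity": 0,
        "non_positive_price": 0,
    }
    for row in rows:
        if not row.get("customer_id"):
            flags["missing_customer_id"] += 1
        if row.get("quantity", 0) <= 0:
            flags["non_positive_quantity"] += 1
        if row.get("unit_price", 0.0) <= 0:
            flags["non_positive_price"] += 1
    return flags


# Hypothetical toy rows, for illustration only.
rows = [
    {"customer_id": "13085", "quantity": 12, "unit_price": 6.95},
    {"customer_id": None, "quantity": 1, "unit_price": 2.10},
    {"customer_id": "13078", "quantity": -3, "unit_price": 4.25},
]
flags = audit_rows(rows)
print(flags)
```

If every count is zero on the real data, the 100% row retention reflects clean input; non-zero counts would point to validation that was skipped.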
The perfect retention rate contrasts with observable data gaps in the cohort summary, where later cohorts show increasing missing values at t6 and t12 retention periods. This pattern reflects the natural limitation of the observation window (December 2009–December 2010) rather than data quality issues—later cohorts simply lack sufficient follow-
Key Findings and Recommendations
| Finding | Value |
|---|---|
| Total Customers Analyzed | 4,314 |
| Number of Cohorts | 13 |
| Overall Retention (Period 1) | 23.0% |
| First-Period Churn Rate | 77.0% |
| Median Customer Lifetime | 4.0 periods |
Bottom Line: Across 4,314 customers in 13 cohorts, first-period churn is 77.0%, leaving 23.0% retained at period 1, and the median customer lifetime is 4.0 periods.
Key Findings:
• First-period churn is 77.0% (CRITICAL): more than three-quarters of customers churn immediately. Prioritize onboarding improvements.
• Period-1 retention is 23.0%
• Median customer lifetime is 4.0 periods; use this for LTV forecasting and retention economics.
• The retention heatmap reveals cohort quality trends and lifecycle churn patterns
• Survival analysis provides statistical lifetime estimates with proper censoring handling
Recommendations:
1. Deploy retention interventions targeting the first period (highest churn window)
2. Investigate low-retention cohorts for acquisition quality or seasonality issues
3. A/B test onboarding improvements to reduce first-period churn
4. Calculate cohort ROI by combining these retention patterns with acquisition costs
5. Track retention trends monthly to detect early signs of retention degradation
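Recommendation 4 can be sketched as a simple ROI ratio. The average revenue per customer ($2,047.29) comes from this report's KPI table; the acquisition cost is a hypothetical placeholder, and the function name `cohort_roi` is introduced here for illustration. Revenue is not margin, so this is an upper bound on the true return.

```python
def cohort_roi(revenue_per_customer: float, acquisition_cost: float) -> float:
    """Return ROI as (revenue - cost) / cost for one acquired customer."""
    if acquisition_cost <= 0:
        raise ValueError("acquisition_cost must be positive")
    return (revenue_per_customer - acquisition_cost) / acquisition_cost


# Average revenue per customer, from the KPI table in this report.
AVG_REVENUE_PER_CUSTOMER = 2047.29
# Hypothetical acquisition cost, for illustration only (not in the source data).
ASSUMED_CAC = 500.00

roi = cohort_roi(AVG_REVENUE_PER_CUSTOMER, ASSUMED_CAC)
print(f"ROI at an assumed CAC of ${ASSUMED_CAC:,.2f}: {roi:.1%}")
```

Running the same calculation per cohort, with each cohort's own revenue per customer and acquisition cost, would show which acquisition periods paid back.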
Executive Summary
This analysis examines customer retention patterns across 4,314 customers organized into 13 cohorts spanning December 2009 through December 2010. The objective is to understand customer lifecycle behavior, identify retention risk periods, and quantify the business impact of churn on customer lifetime value and revenue sustainability.
The data reveals a “leaky bucket” acquisition model where the business acquires customers at scale but loses the majority immediately. The 77% first-period churn is catastrophic—it suggests either mis
Acquisition Volume and Cohort Sizes
Acquisition Volume by Cohort
Overview of cohort sizes and key retention metrics
Cohort Summary
This section establishes the foundational cohort structure for the retention analysis by segmenting 4,314 customers into 13 acquisition cohorts spanning December 2009 through December 2010. Understanding cohort composition is essential because retention patterns and revenue performance are meaningfully analyzed only when customers are grouped by acquisition timing, allowing fair comparison of behavior across different customer generations.
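A minimal sketch of cohort assignment, assuming each customer's cohort is the calendar month of their first order (the report's cohort labels such as 2009-12-01 suggest this, but the exact rule is not stated). The order data and function names are hypothetical.

```python
from collections import Counter
from datetime import date


def assign_cohorts(orders):
    """Map each customer to the first day of the month of their first order."""
    first_order = {}
    for customer_id, order_date in orders:
        if customer_id not in first_order or order_date < first_order[customer_id]:
            first_order[customer_id] = order_date
    return {cid: d.replace(day=1) for cid, d in first_order.items()}


def cohort_shares(cohort_of):
    """Return {cohort: (size, share of all customers)}, ordered by cohort date."""
    sizes = Counter(cohort_of.values())
    total = sum(sizes.values())
    return {c: (n, n / total) for c, n in sorted(sizes.items())}


# Hypothetical toy orders: (customer_id, order_date).
orders = [
    ("A", date(2009, 12, 3)), ("A", date(2010, 1, 15)),
    ("B", date(2009, 12, 20)),
    ("C", date(2010, 1, 5)), ("C", date(2010, 3, 9)),
    ("D", date(2010, 2, 11)),
]
cohort_of = assign_cohorts(orders)
for cohort, (size, share) in cohort_shares(cohort_of).items():
    print(cohort, size, f"{share:.0%}")
```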
The cohort distribution reveals uneven acquisition patterns, with the earliest cohort (Dec 2009) capturing 22% of all customers. This imbalance means retention metrics for later cohorts will have wider confidence intervals and less statistical reliability. The declining cohort sizes toward year-end suggest either seasonal acquisition patterns or
Primary Retention Analysis
Retention Rate by Cohort × Lifetime Period
Retention rate heatmap showing percentage of each cohort still active at each lifecycle period
Cohort Retention Heatmap
This heatmap visualizes customer retention patterns across 13 cohorts over 12 lifecycle periods, revealing both universal churn triggers and cohort-specific quality differences. It serves as the diagnostic foundation for understanding whether retention problems stem from product/service issues (vertical patterns) or acquisition quality (horizontal patterns).
The dominant vertical pattern—the catastrophic drop from 100% to 23% between periods 0 and 1—suggests a universal structural problem rather than cohort-specific acquisition issues.
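The matrix behind a heatmap like this can be sketched as follows. It assumes monthly periods and defines retention as the share of a cohort with at least one order in a given period; both are assumptions about the report's method. The toy orders are hypothetical.

```python
from collections import defaultdict
from datetime import date


def month_index(d):
    """Number months consecutively so that subtraction gives a period offset."""
    return d.year * 12 + d.month


def retention_matrix(orders):
    """Return {cohort_month_index: {period: share of cohort active in that period}}.

    Periods with no active customers are simply absent from the inner dict.
    """
    first = {}
    for cid, d in orders:
        m = month_index(d)
        first[cid] = min(first.get(cid, m), m)

    active = defaultdict(set)  # (cohort, period) -> customers with an order
    for cid, d in orders:
        active[(first[cid], month_index(d) - first[cid])].add(cid)

    sizes = defaultdict(int)
    for cohort in first.values():
        sizes[cohort] += 1

    matrix = defaultdict(dict)
    for (cohort, period), customers in active.items():
        matrix[cohort][period] = len(customers) / sizes[cohort]
    return {c: dict(sorted(p.items())) for c, p in sorted(matrix.items())}


# Hypothetical toy orders: (customer_id, order_date).
orders = [
    ("A", date(2009, 12, 3)), ("A", date(2010, 1, 15)), ("A", date(2010, 3, 2)),
    ("B", date(2009, 12, 20)),
    ("C", date(2009, 12, 9)), ("C", date(2010, 3, 30)),
    ("D", date(2010, 1, 5)), ("D", date(2010, 2, 11)),
]
matrix = retention_matrix(orders)
for cohort, row in matrix.items():
    print(cohort, row)
```

Under this activity-based definition a customer can be inactive in one period and return later, so a cohort's retention can rise between periods instead of only decaying.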
Cohort Retention Decay Patterns
Retention Decay by Cohort
Retention decay curves by cohort showing how each cohort's retention declines over time
Retention Curves
Retention curves visualize how each of the 13 cohorts loses customers over its lifetime, starting at 100% and decaying through 12 periods. This section identifies which cohorts retain customers better than others, revealing differences in acquisition quality, timing, or product conditions that may explain overall retention performance (23.0–28.5% across the t=1, t=3, t=6, and t=12 milestones).
The divergence in retention
Customer Lifetime Estimation
Kaplan-Meier Survival Curve
Kaplan-Meier survival curve showing probability of remaining active over time
Survival Analysis
This section applies Kaplan-Meier survival analysis to estimate customer lifetime while accounting for censoring (recent customers still active). Unlike raw cohort retention rates, the Kaplan-Meier estimator uses the partial follow-up of censored customers instead of treating them as churned or discarding them, which avoids that source of bias provided censoring is unrelated to churn risk. The median customer lifetime of 4 periods serves as a critical anchor for forecasting customer lifetime value and retention economics across the 4,314-customer base.
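For reference, the product-limit (Kaplan-Meier) estimator can be sketched in a few lines of plain Python. This is a minimal illustration on hypothetical toy durations, not the report's implementation; how the report defines a churn event and a censoring time is not stated, so the inputs here are assumptions.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct time where at least one churn occurred.

    durations: periods until churn or censoring, one per customer
    observed:  True if churn was observed, False if the customer was censored
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            # Product-limit step: multiply by the share of at-risk customers surviving t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Customers censored at t still counted as at risk at t; drop them afterwards.
        at_risk -= events + censored
    return curve


def median_lifetime(curve):
    """First time at which S(t) falls to 0.5 or below, or None if it never does."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical toy data, not taken from the report.
durations = [1, 1, 2, 3, 3, 4, 5, 5]
observed = [True, True, False, True, False, True, True, False]
curve = kaplan_meier(durations, observed)
median = median_lifetime(curve)
print(curve, median)
```

Censored customers reduce the at-risk count without lowering the survival estimate, which is what separates this curve from a raw retention rate.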
The survival curve demonstrates that customer retention follows a classic “leaky bucket” pattern: aggressive early-stage churn (1,547 events at t=0) followed by sustained but slower attrition. The 4-period median lifetime aligns with the overall 26.7% retention at t=6
Identifying Critical Churn Windows
Churn Rate by Lifecycle Period
Churn rate by lifecycle period showing when in the customer lifecycle the biggest drop-offs occur
Churn Analysis
This section identifies the critical lifecycle window where customer drop-off is most severe. Understanding churn timing reveals whether attrition is concentrated at onboarding (early friction) or distributed across the lifecycle, which fundamentally shapes retention strategy and resource allocation.
The data reveals a “leaky bucket” problem concentrated at entry. The 77% first-period churn largely determines the overall 24.8% retention at t=12, meaning most customer loss occurs before meaningful engagement. Periods 2–11 show near-zero churn, indicating that customers who survive the critical first transition become highly stable. This pattern is more consistent with onboarding friction or misaligned expectations than with ongoing product quality issues, although retention data alone cannot confirm the cause.
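Period-level churn can be derived from a retention series as sketched below. The definition used, churn as one minus the ratio of consecutive retention values, is an assumption; the report may use an absolute difference instead, though both give the same first-period figure because retention starts at 100%. The t=0, t=1, and t=3 values are from this report; the t=2 value is a hypothetical placeholder.

```python
def period_churn(retention):
    """Churn between consecutive periods, computed as 1 - r[t] / r[t-1]."""
    churn = []
    for prev, curr in zip(retention, retention[1:]):
        churn.append(1 - curr / prev if prev > 0 else float("nan"))
    return churn


# t=0, t=1 and t=3 come from the KPI table; t=2 (0.26) is a made-up placeholder.
retention = [1.00, 0.23, 0.26, 0.285]
churn = period_churn(retention)
for period, rate in enumerate(churn, start=1):
    print(f"period {period}: churn {rate:+.1%}")
```

When retention is measured as activity within a period, returning customers raise a later period's retention above the previous one, and the computed churn for that period becomes negative.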
Negative churn rates in periods 2–
Revenue-Weighted Retention and LTV
Cumulative Revenue Contribution
Revenue contribution by cohort over time showing lifetime value patterns
Revenue by Cohort
This section measures economic retention by tracking cumulative revenue contribution across cohorts over their lifetime. While headcount retention shows customer survival rates, revenue-weighted retention reveals whether high-value customers retain better or worse than the average, directly impacting lifetime value (LTV) and acquisition ROI.
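Cumulative revenue by cohort can be sketched as below, assuming revenue rows are already tagged with a cohort label and a lifecycle period. The row layout, function names, and toy figures are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate


def cumulative_revenue(rows):
    """rows: (cohort, period, revenue). Return {cohort: [cumulative revenue by period]}."""
    by_cohort = defaultdict(lambda: defaultdict(float))
    for cohort, period, revenue in rows:
        by_cohort[cohort][period] += revenue

    result = {}
    for cohort, periods in by_cohort.items():
        last = max(periods)
        # Fill periods with no revenue with 0.0 so list positions match period numbers.
        per_period = [periods.get(p, 0.0) for p in range(last + 1)]
        result[cohort] = list(accumulate(per_period))
    return result


def revenue_per_customer(cumulative, cohort_sizes):
    """Divide each cohort's final cumulative revenue by its customer count."""
    return {c: series[-1] / cohort_sizes[c] for c, series in cumulative.items()}


# Hypothetical toy rows and cohort sizes, for illustration only.
rows = [
    ("2009-12", 0, 100.0), ("2009-12", 1, 50.0), ("2009-12", 3, 25.0),
    ("2010-01", 0, 80.0),
]
sizes = {"2009-12": 2, "2010-01": 1}
cumulative = cumulative_revenue(rows)
per_customer = revenue_per_customer(cumulative, sizes)
print(cumulative)
print(per_customer)
```

Keeping cohort totals and per-customer values as separate outputs makes it harder to confuse the two when cohorts differ in size.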
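The data reveals a highly concentrated revenue model where early cohorts drive disproportionate lifetime value. The 2009-12-01 cohort’s $5.2M figure is labeled as revenue per customer, but it cannot be a per-customer value: 4,314 customers at a $2,047 average implies only about $8.8M in total revenue, so $5.2M is presumably that cohort’s cumulative revenue. Either way, the comparison indicates acquisition timing or cohort quality dramatically affects economic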
Platform-Wide Retention KPIs
Platform-wide retention KPIs and summary statistics
| Metric | Value |
|---|---|
| Total Customers | 4,314 |
| Number of Cohorts | 13 |
| Overall Retention (t=1) | 23.0% |
| Overall Retention (t=3) | 28.5% |
| Overall Retention (t=6) | 26.7% |
| Overall Retention (t=12) | 24.8% |
| First Period Churn Rate | 77.0% |
| Median Customer Lifetime | 4.0 periods |
| Average Revenue per Customer | $2,047.29 |
Overall Metrics
This section aggregates retention performance across all 13 cohorts into platform-wide KPIs, providing a single-number summary of customer retention health. These metrics serve as executive-level indicators for tracking retention trends over time and detecting early signals of improvement or decline in customer loyalty.
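One plausible way to aggregate cohort curves into a single figure is a size-weighted average over the cohorts that have been observed at the given period. The report does not state its aggregation rule, so the weighting and the handling of unobserved periods below are assumptions, and the toy cohorts are hypothetical.

```python
def overall_retention(cohorts, period):
    """Size-weighted retention at `period`, using only cohorts observed at that period.

    cohorts: list of (cohort_size, {period: retention rate, or None if unobserved})
    Returns None when no cohort has been observed at `period`.
    """
    retained = 0.0
    total = 0.0
    for size, curve in cohorts:
        rate = curve.get(period)
        if rate is None:  # cohort too young to have reached this period
            continue
        retained += size * rate
        total += size
    return retained / total if total else None


# Hypothetical toy cohorts, for illustration only.
cohorts = [
    (1000, {1: 0.30, 6: 0.25}),
    (500, {1: 0.20, 6: None}),
    (500, {1: 0.10}),
]
print(overall_retention(cohorts, 1))
print(overall_retention(cohorts, 6))
print(overall_retention(cohorts, 12))
```

Because later periods are computed from fewer and older cohorts, figures at different milestones are not based on the same customers and should be compared with that in mind.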
The data reveals a classic “leaky bucket” pattern: aggressive early-stage churn (77%) filters out low-commitment customers, but those who survive the first period show modest stabilization. The gap between t=1 (23%) and t=3 (28.5%) retention suggests a small cohort of engaged