Analysis overview and configuration
| Parameter | Value |
|---|---|
| analysis_date | 2024-01-01 |
| n_bins | 5 |
| top_n_customers | 20 |
This RFM (Recency, Frequency, Monetary) segmentation analysis evaluates 300 customers from a demo e-commerce store to identify purchase behavior patterns and segment them for targeted marketing investment. The analysis processes 4,045 transaction records with perfect data retention, enabling the business to prioritize marketing spend toward high-value customer groups.
The analysis reveals a highly skewed customer value distribution typical of e-commerce: one-quarter of customers drive nearly 70% of revenue. The 10-segment taxonomy (Champions, Loyal Customers, Lost, Potential Loyalists, Need Attention, At Risk, New Customers, Promising, About to Sleep, Hibernating) provides granular targeting capability. The balanced R
Data preprocessing and column mapping
| Metric | Value |
|---|---|
| Initial Rows | 4,045 |
| Final Rows | 4,045 |
| Rows Removed | 0 |
| Retention Rate | 100% |
This section documents the data cleaning and preparation phase for the RFM segmentation analysis. Perfect retention (100%) indicates that all 4,045 transaction records passed validation checks and were successfully processed into the final 300-customer dataset used for segmentation.
The 100% retention rate supports confidence in the segmentation results: no data quality issue forced the exclusion of any customer record. No train/test validation was performed, so the results describe past behavior and carry no independently verified predictive accuracy. The 4,045 raw transactions were aggregated into 300 unique customers with RFM metrics, which enabled the quintile-based scoring behind the 10 customer segments.
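The aggregation step described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used for this report: the sample transactions, the tuple layout, and the `aggregate_rfm` helper are all hypothetical. It also assumes one row per purchase; if rows are order line items, frequency would need to count distinct orders instead.

```python
from collections import defaultdict
from datetime import date

ANALYSIS_DATE = date(2024, 1, 1)  # analysis_date from the configuration table

# Hypothetical transaction records: (customer_id, order_date, amount).
transactions = [
    ("CUST_A", date(2023, 12, 20), 120.0),
    ("CUST_A", date(2023, 11, 2), 80.0),
    ("CUST_B", date(2023, 6, 15), 45.5),
]

def aggregate_rfm(rows, analysis_date):
    """Collapse transaction rows into one RFM record per customer."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, order_date, amount in rows:
        # Track the most recent purchase date seen for each customer.
        if customer_id not in last_purchase or order_date > last_purchase[customer_id]:
            last_purchase[customer_id] = order_date
        frequency[customer_id] += 1   # assumption: one row per purchase
        monetary[customer_id] += amount
    return {
        cid: {
            "recency_days": (analysis_date - last_purchase[cid]).days,
            "frequency": frequency[cid],
            "monetary_value": round(monetary[cid], 2),
        }
        for cid in last_purchase
    }

rfm = aggregate_rfm(transactions, ANALYSIS_DATE)
print(rfm["CUST_A"])
```

Recency is measured in days back from the analysis date, so a smaller value means a more recent purchase.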
RFM analysis is inherently descriptive rather than predictive, so absence of train/test splits is standard practice. The assumption that all retained data accurately represents customer behavior depends
| finding | value |
|---|---|
| Total Customers Analyzed | 300 |
| Champion Customers | 77 (25.7%) |
| At-Risk Customers | 46 (15.3%) |
| Lost / Hibernating | 56 (18.7%) |
| Champion Revenue Share | 69.1% of $642,448 |
| Average Customer Spend | $2,141 |
| Average Purchase Frequency | 6.8 |
| Segments Identified | 10 |
This section synthesizes the RFM segmentation results to assess whether the analysis achieved its objective of segmenting customers by purchase behavior to prioritize marketing spend. The findings reveal a highly concentrated revenue distribution that directly informs budget allocation strategy.
The analysis successfully achieved its segmentation objective, revealing extreme customer value concentration. Champions demonstrate recent, frequent, and high-value purchasing patterns (avg. 14.79 purchases, $5
Recency x Frequency score heatmap colored by average monetary value — the canonical RFM visualization showing where high-value customers cluster
This heatmap shows where high-value customers concentrate across the recency and frequency dimensions. It identifies your Champions (top-right: recent and frequent buyers) and your dormant, low-engagement customers (bottom-left: infrequent buyers whose last purchase was long ago), enabling targeted retention and reactivation strategies aligned with your segmentation objective.
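As a rough illustration of how such a grid could be computed, the Python sketch below averages monetary value within each recency-score and frequency-score cell. The sample customers and the `rf_grid` helper are hypothetical and are not taken from the analysis itself.

```python
from collections import defaultdict

# Hypothetical scored customers: (recency_score, frequency_score, monetary_value).
scored = [
    (5, 5, 9000.0),
    (5, 5, 7000.0),
    (1, 1, 300.0),
    (1, 1, 260.0),
    (3, 2, 1100.0),
]

def rf_grid(customers, n_bins=5):
    """Average monetary value per (recency_score, frequency_score) cell."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r, f, m in customers:
        totals[(r, f)] += m
        counts[(r, f)] += 1
    # Rows are recency scores 1..n_bins, columns are frequency scores 1..n_bins.
    # Empty cells are None so a plotting layer can leave them blank.
    return [
        [
            totals[(r, f)] / counts[(r, f)] if counts[(r, f)] else None
            for f in range(1, n_bins + 1)
        ]
        for r in range(1, n_bins + 1)
    ]

grid = rf_grid(scored)
top_right = grid[4][4]    # R=5, F=5
bottom_left = grid[0][0]  # R=1, F=1
```

Comparing the top-right and bottom-left cells of the resulting grid gives the kind of high-versus-low contrast that the heatmap is meant to surface.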
The heatmap confirms that recency and frequency act synergistically: customers scoring high on both dimensions have about 29 times the average monetary value of low scorers ($8,095 vs. $282). The absence of high-frequency, low-recency customers suggests your business successfully retains engaged buyers. However, the large R
Horizontal bar chart showing customer count per named segment, ordered by average RFM score
This section reveals how the 300-customer base is distributed across 10 distinct behavioral segments, with emphasis on identifying high-value and at-risk populations. Understanding segment distribution is critical for prioritizing marketing spend and resource allocation—the core objective of this RFM analysis.
The distribution reveals a classic Pareto pattern: one-quarter of customers generate nearly 70% of revenue. The 15.3% at-risk segment is particularly significant because these customers historically demonstrated strong purchase behavior but have become dormant—they represent recoverable revenue. Conversely, the 18.7% lost/hibernating segment reflects customers who have already churned and require different engagement strategies than those still showing
Scatter plot of individual customers by Recency Score vs Frequency Score, with bubble size proportional to total spend
This scatter plot visualizes the distribution of 300 individual customers across the RFM space, revealing where each customer sits relative to recency and purchase frequency. It enables identification of customer concentration patterns and highlights which segments occupy high-value positions (top-right quadrant) versus at-risk zones (bottom-left), directly supporting the objective to prioritize marketing spend by customer behavior.
The bubble chart reveals a classic Pareto distribution: a concentrated elite of high-frequency, recent purchasers (Champions) generates 69.1% of revenue despite representing only 25.7% of customers. The median monetary value ($998
Treemap showing revenue contribution of each customer segment - larger rectangles represent segments generating more revenue
This treemap visualizes revenue concentration across customer segments, answering which segments drive the most business value. It reveals the critical insight that customer value is highly skewed—a small proportion of customers generates the majority of revenue. Understanding this distribution is essential for prioritizing marketing spend and retention efforts toward maximum financial impact.
The data show a classic Pareto-style concentration in customer value. Champions are about 2.7 times as valuable per capita as the average customer (69.1% of revenue from 25.7% of customers), while segments such as Hibernating and New Customers contribute little despite non-trivial customer counts. This concentration is why retention strategies focused on Champions are likely to yield a higher ROI than acquisition or reactivation efforts aimed at lower-value segments. The At Risk segment is the second-priority opportunity: protecting existing mid-tier revenue before it deteriorates.
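The per-capita multiple can be checked from the headline figures reported earlier (300 customers, 77 Champions, 69.1% of $642,448 in revenue). The short Python calculation below is only a worked check of that arithmetic; the variable names are illustrative.

```python
total_revenue = 642_448        # total revenue from the findings table
total_customers = 300
champion_count = 77
champion_revenue_share = 0.691

champion_revenue = champion_revenue_share * total_revenue  # about 443,932
spend_per_champion = champion_revenue / champion_count     # about 5,765
spend_per_customer = total_revenue / total_customers       # about 2,141

# Equivalent to revenue share divided by customer share (0.691 / 0.257).
ratio = spend_per_champion / spend_per_customer
print(f"{ratio:.2f}")  # 2.69
```

The result rounds to the 2.7 multiple, and the per-customer average matches the reported average spend of $2,141.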
This
Top customers ranked by combined RFM score, showing individual recency, frequency, and monetary metrics
| customer_id | recency_days | frequency | monetary_value | recency_score | frequency_score | monetary_score | rfm_score | segment |
|---|---|---|---|---|---|---|---|---|
| CUST_0187 | 41 | 30 | 15470 | 5 | 5 | 5 | 15 | Champions |
| CUST_0134 | 12 | 26 | 14570 | 5 | 5 | 5 | 15 | Champions |
| CUST_0016 | 71 | 28 | 13990 | 5 | 5 | 5 | 15 | Champions |
| CUST_0224 | 61 | 27 | 13700 | 5 | 5 | 5 | 15 | Champions |
| CUST_0181 | 13 | 29 | 13050 | 5 | 5 | 5 | 15 | Champions |
| CUST_0024 | 22 | 27 | 12390 | 5 | 5 | 5 | 15 | Champions |
| CUST_0128 | 27 | 27 | 12270 | 5 | 5 | 5 | 15 | Champions |
| CUST_0023 | 48 | 30 | 12040 | 5 | 5 | 5 | 15 | Champions |
| CUST_0049 | 14 | 26 | 11890 | 5 | 5 | 5 | 15 | Champions |
| CUST_0001 | 29 | 27 | 11800 | 5 | 5 | 5 | 15 | Champions |
| CUST_0002 | 10 | 23 | 11660 | 5 | 5 | 5 | 15 | Champions |
| CUST_0013 | 54 | 24 | 11660 | 5 | 5 | 5 | 15 | Champions |
| CUST_0207 | 57 | 28 | 11530 | 5 | 5 | 5 | 15 | Champions |
| CUST_0148 | 47 | 28 | 11160 | 5 | 5 | 5 | 15 | Champions |
| CUST_0017 | 27 | 23 | 11050 | 5 | 5 | 5 | 15 | Champions |
| CUST_0044 | 63 | 23 | 11040 | 5 | 5 | 5 | 15 | Champions |
| CUST_0105 | 26 | 22 | 10990 | 5 | 5 | 5 | 15 | Champions |
| CUST_0269 | 60 | 26 | 10410 | 5 | 5 | 5 | 15 | Champions |
| CUST_0039 | 32 | 26 | 9635 | 5 | 5 | 5 | 15 | Champions |
| CUST_0106 | 47 | 22 | 9613 | 5 | 5 | 5 | 15 | Champions |
This section identifies the 20 individual customers with the highest combined RFM scores (perfect score of 15), representing the most valuable segment within your customer base. These customers drive disproportionate revenue and engagement, making them critical for understanding where your business value concentrates and informing targeted retention strategies.
The concentration of perfect RFM scores among these 20 customers demonstrates that your highest-value segment exhibits uniform excellence across all behavioral dimensions. This uniformity—rather than variation—suggests these customers represent a stable, predictable revenue base with minimal churn risk. Their recent activity and high transaction frequency indicate strong ongoing engagement, validating the RF
Distribution of Recency, Frequency, and Monetary scores across all customers
| score | count_r | count_f | count_m | pct_r | pct_f | pct_m |
|---|---|---|---|---|---|---|
| 1 | 60 | 60 | 60 | 20 | 20 | 20 |
| 2 | 60 | 60 | 60 | 20 | 20 | 20 |
| 3 | 60 | 60 | 60 | 20 | 20 | 20 |
| 4 | 60 | 60 | 60 | 20 | 20 | 20 |
| 5 | 60 | 60 | 60 | 20 | 20 | 20 |
This section shows how Recency, Frequency, and Monetary scores are distributed across your 300-customer base. With quintile-based scoring, a correct run places 20% of customers (60 of 300) in each score tier (1-5) on every dimension, so the table is best read as a check on the scoring step.
The perfectly balanced distribution confirms the quintile-based RFM methodology is functioning as designed. Rather than revealing natural clustering in customer behavior, this equal split reflects the algorithmic approach of ranking customers into fixed percentile groups. The segment diversity (Champions through Hibernating) therefore emerges from combinations of R, F, and M scores
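To illustrate why rank-based quintile scoring produces an equal split by construction, here is a minimal Python sketch. The `quintile_scores` helper, its tie-breaking rule, and the sample values are assumptions made for illustration; the scoring code behind this report is not shown in the source and may differ.

```python
from collections import Counter

def quintile_scores(values, n_bins=5, higher_is_better=True):
    """Score values 1..n_bins by rank so every bin holds an equal share."""
    # Worst value first, so rank 0 receives the lowest score.
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, idx in enumerate(order):
        # Ties are split by input order (an assumption; other tie rules exist).
        scores[idx] = rank * n_bins // len(values) + 1
    return scores

# Hypothetical spend figures for 300 customers (not the report's data).
spend = [(i * 37) % 1000 for i in range(300)]
monetary_scores = quintile_scores(spend)
print(Counter(monetary_scores))  # each score 1..5 appears 60 times

# Recency: fewer days since the last purchase is better.
recency_scores = quintile_scores([10, 200, 50, 400, 5], higher_is_better=False)
# -> [4, 2, 3, 1, 5]
```

Because scores depend only on rank, any 300 values split into five bins of 60, whatever the underlying shape of the data.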