Analysis Overview
Analysis overview and configuration
| Parameter | Value |
|---|---|
| features | TikTok, Facebook, Google Ads, Sales |
| eps | 0.7646 (selected via k-distance elbow) |
| min_pts | 5 |
| scale_features | TRUE |
Purpose
This DBSCAN clustering analysis identifies natural groupings within marketing spend data across three channels (TikTok, Facebook, and Google Ads) together with Sales outcomes. The objective is to discover distinct marketing spend patterns that can inform segmentation and strategy decisions for Demo Company.
Key Findings
- Clusters Identified: 7 distinct clusters plus 9 noise points (4.5% noise rate) — indicating well-separated, density-based groups with minimal outliers
- Silhouette Score: 0.632 — reflects reasonable cluster cohesion and separation, suggesting meaningful but not perfect cluster structure
- Cluster Distribution: Highly imbalanced, with Cluster 3 dominating at 31.5% (63 observations) and Cluster 7 minimal at 2.5% (5 observations)
- Feature Patterns: Clusters show distinct channel mixes. Some spend on a single channel (e.g., Cluster 3: Google Ads only; Cluster 2: Facebook only), Cluster 5 records no spend on any of the three channels, and others blend multiple channels (e.g., Cluster 4: all channels active)
Interpretation
The analysis assigned 191 of the 200 marketing records to seven segments based on spending density patterns and flagged the remaining 9 as noise. The moderate silhouette score indicates clusters are reasonably well-defined but with some overlap, typical of real-world marketing data.
Data preprocessing and column mapping
Purpose
This section documents the data cleaning and preparation phase for the DBSCAN clustering analysis of marketing spend. Perfect data retention indicates no rows were excluded during preprocessing, ensuring the full dataset of 200 observations remains available for discovering natural clusters in marketing channel spending patterns.
Key Findings
- Retention Rate: 100% (200/200 rows preserved) - No observations were removed during cleaning, providing a complete dataset for clustering analysis
- Rows Removed: 0 - No data quality issues triggered exclusion criteria, suggesting the raw data was already in acceptable condition
- Initial vs. Final: Identical row counts (200) confirm no filtering or transformation-based row elimination occurred
- Train/Test Split: Not applicable - Unsupervised clustering does not require traditional train/test partitioning
Interpretation
The perfect retention rate reflects a clean input dataset with minimal data quality issues. For DBSCAN, this is advantageous because all 200 marketing spend observations contribute to density-based cluster formation. The absence of missing value removals (noted in the overall context) means the full population of marketing campaigns is represented, supporting robust cluster discovery across the four features: TikTok, Facebook, Google Ads, and Sales.
Context
While 100% retention is positive, the overall analysis notes that missing values existed in the original data but were handled upstream.
Executive Summary
Executive summary of DBSCAN clustering results.
| Finding | Value |
|---|---|
| Clusters Found | 7 |
| Noise Points | 9 |
| Noise Rate | 4.5% |
| Silhouette Score | 0.632 |
| Eps Used | 0.7646 |
| MinPts | 5 |
| Features Used | 4 |
| Total Points Analyzed | 200 |
Key Findings:
• 7 distinct density clusters identified automatically
• Cluster quality (silhouette): 0.632 — reasonable separation
• 4.5% of points are outliers/noise
• Eps=0.7646 selected via k-distance elbow method
Next Steps:
• Review cluster profiles to label each segment meaningfully
• Investigate noise points — they may represent anomalies, fraud, or special cases
• Results look healthy; proceed with cluster labeling and business action
Purpose
This executive summary evaluates whether DBSCAN successfully discovered natural clusters in your marketing spend data (200 records across TikTok, Facebook, Google Ads, and Sales). The analysis objective was to identify distinct spending patterns without predefined labels, enabling targeted marketing strategy refinement.
Key Findings
- Clusters Discovered: 7 distinct density-based clusters identified automatically from the data
- Noise Rate: 4.5% (9 outlier points) — well below the 50% quality threshold, indicating clean segmentation
- Silhouette Score: 0.632 — indicates reasonable cluster separation and cohesion; clusters are meaningfully distinct
- Eps Parameter: 0.765 selected via k-distance elbow method; a smaller eps would fragment clusters and flag more points as noise, while a larger eps would merge neighboring clusters
- Data Retention: 100% of records retained; no preprocessing losses
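The silhouette score quoted above can be illustrated with a minimal sketch. It assumes Euclidean distance and the standard per-point definition (b - a) / max(a, b); whether the report's 0.632 includes or excludes the noise points is not stated, and the toy points below are hypothetical.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two tight, well-separated groups.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
toy_labels = [1, 1, 2, 2]
score = silhouette_score(toy_points, toy_labels)
print(round(score, 3))
```

Values near 1 indicate tight, well-separated clusters, values near 0 indicate overlap, and negative values suggest points sit closer to another cluster than to their own.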
Interpretation
The clustering partitioned your marketing spend data into seven interpretable segments with a low noise rate. The moderate silhouette score (0.632) reflects natural density variations in spending patterns: some clusters are tighter than others, which is expected in real marketing data. The algorithm identified both major segments (Cluster 3: 31.5% of data) and niche groups (Cluster 7: 2.5%), suggesting heterogeneous spending behavior across channels.
Cluster Visualization
2D PCA scatter plot showing cluster assignments. PC1 explains 48% and PC2 explains 27.4% of variance. Noise points are shown separately.
Purpose
This section visualizes the natural groupings discovered in marketing spend data through DBSCAN clustering. The 2D PCA projection allows interpretation of seven distinct clusters and identifies nine anomalous observations that don't fit any cluster pattern. Together, these findings reveal the underlying structure of marketing spend behavior across the 200 observations.
Key Findings
- Clusters Found: 7 distinct density-based groups identified in marketing spend patterns
- Noise Points: 9 observations (4.5%) flagged as outliers—below the 50% quality threshold, indicating clean clustering
- Variance Explained: 75.4% of total variance captured in PC1 (48%) and PC2 (27.4%), preserving sufficient information for interpretation
- Dominant Cluster: Cluster 3 contains 31.5% of data (63 points), suggesting a major marketing spend profile; Cluster 1 (22%) is secondary
Interpretation
The clustering reveals that marketing spend behavior is not uniformly distributed but concentrates around seven distinct profiles. The low noise rate (4.5%) indicates DBSCAN identified genuine density-based structures rather than arbitrary partitions. The PCA projection captures about three-quarters of the variance, so the 2D visualization is a reasonable but lossy view of the underlying four-dimensional feature space; clusters that appear to overlap in the plot may still be separated in the remaining dimensions. The concentration in Clusters 1 and
K-Distance Plot
K-distance plot (sorted 5th nearest neighbor distances). The 'elbow' or 'knee' of the curve indicates the recommended eps value. Points above the elbow are typically noise.
Purpose
The k-distance plot visualizes the distance from each point to its 5th nearest neighbor, sorted in ascending order. This diagnostic tool identifies the natural elbow or knee in the curve, which indicates the optimal threshold (eps) for separating dense marketing spend clusters from isolated outliers. This parameter selection is critical for DBSCAN's ability to discover meaningful patterns in the data.
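The curve described above can be computed with a short sketch. It assumes Euclidean distance and that a point is not counted as its own neighbor; the report does not state which convention its tooling uses, and the toy points and the `eps` value below are hypothetical.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest other point, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Hypothetical toy data: four evenly spaced points and one isolated point.
toy_points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
kd = k_distances(toy_points, k=2)
print(kd)

# Share of points whose k-distance exceeds a candidate eps (hypothetical value).
eps = 3.0
share_above = sum(d > eps for d in kd) / len(kd)
print(share_above)
```

Plotting `kd` against its index gives the k-distance curve; the isolated point produces the sharp rise at the right-hand end, which is where an elbow would be read off.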
Key Findings
- eps_used (0.765): Auto-detected from the elbow, representing the neighborhood radius for density-based clustering
- min_pts (5): Minimum points required to form a dense region; combined with eps, defines cluster membership
- kdist range (0.09–1.89): Wide spread indicates variable point densities; most points cluster tightly (median 0.38) with a right-skewed tail of sparse points
- Elbow detection: The suggested eps of 0.76 aligns with the curve's inflection point, balancing cluster discovery against noise tolerance
Interpretation
The k-distance curve shows that approximately 95% of points have their 5th nearest neighbor within 0.76 units, while ~5% lie beyond it, consistent with the observed 4.5% noise rate. This alignment supports the parameter choice: the algorithm identified a natural density threshold in the marketing spend data, allowing it to isolate
Cluster Size Distribution
Distribution of observations across clusters. Shows how many data points belong to each cluster and what percentage are classified as noise.
Purpose
This section reveals how the 200 marketing spend observations are distributed across the 7 discovered clusters and identifies outliers. Cluster sizes show whether each segment is large enough to act on, while excessive noise suggests the algorithm's parameters (eps, minPts) may need adjustment. Understanding cluster sizes is essential for assessing whether the segmentation is actionable for marketing strategy.
Key Findings
- Cluster 3 dominance: 63 observations (31.5%) form the largest cluster, indicating a substantial segment with similar marketing spend patterns
- Cluster 7 scarcity: Only 5 observations (2.5%) comprise the smallest cluster, representing a niche segment
- Noise rate (4.5%): Only 9 points classified as outliers—a healthy proportion indicating eps was well-calibrated and most data fit meaningful patterns
- Distribution skewness: Mean group size is 25 (200 observations over the seven clusters plus the noise group) with a sample standard deviation of 19.25; sizes range from 5 to 63, so the distribution is clearly uneven and dominated by Cluster 3
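The size statistics in this list can be reproduced from the counts reported in the cluster profiles table. The sketch below treats the noise points as an eighth group, which is the reading under which the reported mean of 25 and standard deviation of 19.25 hold; that reading is an inference from the numbers, not something the report states.

```python
import statistics

# Counts per group as reported in the cluster profiles table.
sizes = {
    "Noise": 9, "Cluster 1": 44, "Cluster 2": 18, "Cluster 3": 63,
    "Cluster 4": 21, "Cluster 5": 17, "Cluster 6": 23, "Cluster 7": 5,
}

counts = list(sizes.values())
total = sum(counts)                      # all observations
mean_size = statistics.mean(counts)      # mean over 8 groups (noise included)
sd_size = statistics.stdev(counts)       # sample standard deviation (n - 1)
noise_rate = 100 * sizes["Noise"] / total

# Mean over the seven real clusters only, for comparison.
cluster_only = [n for name, n in sizes.items() if name != "Noise"]
mean_clusters_only = statistics.mean(cluster_only)

print(total, mean_size, round(sd_size, 2), noise_rate, round(mean_clusters_only, 2))
```

Excluding the noise group raises the mean cluster size to roughly 27.3, so which figure to quote depends on whether noise is treated as a group.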
Interpretation
The clustering successfully partitioned marketing spend data into seven distinct groups with minimal noise contamination. Cluster 3's prominence suggests a dominant marketing spend profile exists in the dataset, while smaller clusters (Clusters 5, 7) represent specialized or niche spending behaviors. The low noise rate validates that the density-based approach appropriately captured natural groupings without
Cluster Profiles
Mean standardized feature values per cluster. Positive values indicate above-average feature values; negative values indicate below-average.
Purpose
This section reveals the marketing spend characteristics that define each cluster by showing standardized feature values. Since the objective is to discover natural clusters in marketing spend data, cluster profiles identify which channels (TikTok, Facebook, Google Ads) and sales outcomes distinguish one segment from another—the core drivers of cluster separation.
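As a rough illustration of how such profiles are typically produced, the sketch below z-scores each feature and then averages the z-scores within each cluster. It assumes standardization with the sample standard deviation (n - 1); the report does not say which variant it uses, and the rows, labels, and two-feature layout are hypothetical toy values.

```python
import statistics

def standardize(column):
    """Z-score a column using the sample standard deviation (an assumption)."""
    mu = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mu) / sd for x in column]

def cluster_profile(rows, labels, feature_names):
    """Mean standardized value of each feature within each cluster label."""
    z_cols = [standardize(col) for col in zip(*rows)]
    profile = {}
    for lab in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == lab]
        profile[lab] = {
            name: sum(z[i] for i in idx) / len(idx)
            for name, z in zip(feature_names, z_cols)
        }
    return profile

# Hypothetical toy data: cluster 2 spends on TikTok, cluster 1 does not.
rows = [(0, 10), (0, 12), (10, 10), (10, 12)]
labels = [1, 1, 2, 2]
profile = cluster_profile(rows, labels, ["TikTok", "Sales"])
for lab, means in profile.items():
    print(lab, {k: round(v, 3) for k, v in means.items()})
```

A cluster with zero raw spend on a channel shows up as a negative standardized mean, because its spend sits below the dataset-wide average for that channel.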
Key Findings
- Silhouette Score: 0.632 — Indicates reasonable cluster cohesion and separation; clusters are well-defined but not exceptionally tight
- Feature Range: -1.75 to +1.66 — Large standardized differences across clusters signal that marketing spend patterns vary substantially between segments
- Noise Cluster Profile — Shows extreme TikTok spending (+1.66) with minimal Google Ads (-1.48), representing outlier marketing strategies
- Cluster 7 Pattern — High TikTok (+1.42) and Facebook (+1.02) but zero Google Ads (-1.75), indicating a distinct channel preference profile
Interpretation
The seven clusters represent distinct marketing spend combinations. Negative standardized values (e.g., Cluster 1 on TikTok: -0.62) mark channels with below-average spend in that segment, while strongly positive or negative values highlight signature strategies. The moderate silhouette score suggests clusters are meaningful but overlap exists; some marketing profiles share
Cluster Profiles Table
Cluster profiles showing original-scale mean feature values per cluster.
| cluster_label | count | pct_of_total | TikTok | Facebook | Google Ads | Sales |
|---|---|---|---|---|---|---|
| Noise | 9 | 4.5 | 1.084e+04 | 2087 | 229.1 | 1.199e+04 |
| Cluster 1 | 44 | 22 | 0 | 4685 | 1990 | 1.145e+04 |
| Cluster 2 | 18 | 9 | 0 | 4952 | 0 | 9296 |
| Cluster 3 | 63 | 31.5 | 0 | 0 | 2000 | 8956 |
| Cluster 4 | 21 | 10.5 | 1.026e+04 | 4990 | 2035 | 1.511e+04 |
| Cluster 5 | 17 | 8.5 | 0 | 0 | 0 | 6666 |
| Cluster 6 | 23 | 11.5 | 9906 | 0 | 1990 | 1.302e+04 |
| Cluster 7 | 5 | 2.5 | 9689 | 4772 | 0 | 1.204e+04 |
Purpose
This section reveals the average marketing spend and sales performance across the seven clusters, plus the nine noise points. By showing original-scale means, it enables direct interpretation of cluster characteristics without statistical transformation, making it straightforward to identify which segments are high-activity, low-activity, or specialized in their channel mix.
Key Findings
- Cluster 3 (31.5% of data): Dominant baseline segment with zero TikTok and Facebook spend, moderate Google Ads spend (~$2,000), and modest sales (~$8,956)
- Cluster 4 (10.5% of data): Highest-activity cluster with the highest mean spend on every channel among the clusters ($10,258 TikTok, $4,990 Facebook, $2,035 Google Ads) and the strongest sales ($15,113)
- Cluster 5 (8.5% of data): Minimal-spend segment with zero spend on TikTok, Facebook, and Google Ads, and the lowest sales ($6,666)
- Noise points (4.5%): Extreme TikTok spenders ($10,839) with mixed channel allocation, suggesting outlier behavior
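The channel mix of each group can be read off the table mechanically. The sketch below uses the mean spend values as printed in the table and assumes the three spend columns are TikTok, Facebook, and Google Ads in that order (the order of the feature list); the helper name `active_channels` is hypothetical.

```python
# Mean spend per channel, copied from the cluster profiles table as printed.
# Column order is assumed to follow the feature list: TikTok, Facebook, Google Ads.
channel_means = {
    "Noise":     {"TikTok": 10840, "Facebook": 2087, "Google Ads": 229.1},
    "Cluster 1": {"TikTok": 0,     "Facebook": 4685, "Google Ads": 1990},
    "Cluster 2": {"TikTok": 0,     "Facebook": 4952, "Google Ads": 0},
    "Cluster 3": {"TikTok": 0,     "Facebook": 0,    "Google Ads": 2000},
    "Cluster 4": {"TikTok": 10260, "Facebook": 4990, "Google Ads": 2035},
    "Cluster 5": {"TikTok": 0,     "Facebook": 0,    "Google Ads": 0},
    "Cluster 6": {"TikTok": 9906,  "Facebook": 0,    "Google Ads": 1990},
    "Cluster 7": {"TikTok": 9689,  "Facebook": 4772, "Google Ads": 0},
}

def active_channels(means, threshold=0.0):
    """Channels whose mean spend exceeds the threshold."""
    return [channel for channel, value in means.items() if value > threshold]

mix = {name: active_channels(means) for name, means in channel_means.items()}
for name, channels in mix.items():
    print(f"{name}: {', '.join(channels) if channels else 'no channel spend'}")
```

Under that column assumption, the seven clusters cover seven of the eight possible on/off combinations of the three channels, which is consistent with the density-based grouping separating records mainly by which channels are in use.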
Interpretation
The clustering reveals distinct marketing efficiency patterns. Cluster 4 shows that high investment across all three channels coincides with the highest mean sales, while Cluster 3's
Parameter Summary
DBSCAN algorithm parameters and quality metrics.
| parameter | value |
|---|---|
| eps (neighborhood radius) | 0.7646 |
| minPts | 5 |
| Features used | TikTok, Facebook, Google Ads, Sales |
| Clusters found | 7 |
| Noise points | 9 |
| Noise rate (%) | 4.5 |
| Silhouette score | 0.632 |
| Total points | 200 |
Purpose
This section documents the DBSCAN algorithm configuration and validates clustering quality for the marketing spend analysis. The parameters control how density-based clusters are identified, while quality metrics confirm whether the discovered clusters are meaningful and well-separated. Understanding these settings is essential for interpreting the 7 clusters and assessing confidence in the segmentation results.
Key Findings
- eps (0.765): Neighborhood radius automatically selected via k-distance elbow method; two points count as neighbors when they lie within this distance of each other
- minPts (5): Minimum number of points within eps of a point for it to qualify as a core point; ensures clusters contain sufficient density
- Clusters Found (7): Seven distinct marketing spend segments identified across the 200 observations
- Noise Points (9, 4.5%): Low noise rate indicates most observations fit well into clusters; outliers represent unusual spending patterns
- Silhouette Score (0.632): Moderate-to-good cluster cohesion and separation; indicates reasonably distinct marketing segments
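To make the roles of eps and minPts concrete, here is a minimal DBSCAN sketch. It is not the implementation used for this report; it assumes Euclidean distance and that a point counts toward its own neighborhood, and the toy points and parameter values are hypothetical.

```python
import math

NOISE = 0  # label used for points that belong to no cluster

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point (0 = noise, clusters from 1)."""
    n = len(points)
    # eps-neighborhood of every point; includes the point itself (assumption).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Start a new cluster from an unvisited core point and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] is None:
                labels[j] = cluster_id
                if is_core[j]:
                    # Only core points extend the cluster further.
                    queue.extend(neighbors[j])
    # Points never reached from any core point remain noise.
    return [NOISE if lab is None else lab for lab in labels]

# Hypothetical toy data: two dense squares and one isolated point.
toy_points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 5),
]
toy_labels = dbscan(toy_points, eps=1.5, min_pts=4)
print(toy_labels)
```

Shrinking eps or raising min_pts in this sketch turns more points into noise, while a larger eps eventually merges the two squares into one cluster, which mirrors the trade-off the elbow method is meant to balance.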
Interpretation
The algorithm successfully identified seven natural groupings in marketing spend behavior with minimal noise contamination. The silhouette score of 0.632 suggests clusters are reasonably well-defined, though not perfectly separated—typical for real-world marketing data where spending patterns may overlap. The auto-selected eps value balances sensitivity to local density variations while maintaining stable cluster