Analytics · Statistical · Clustering · Dbscan

Overview

Analysis Overview

Analysis overview and configuration

Analysis Type: DBSCAN

Company: Demo Company

Objective: Discover natural clusters in marketing spend data

Analysis Date: 2026-03-15

Processing Id: test_1773618141

Total Observations: 200

Parameter	Value
features	TikTok, Facebook, Google Ads, Sales
eps	(not specified; auto-selected)
min_pts	5
scale_features	TRUE

Interpretation

Purpose

This DBSCAN clustering analysis identifies natural groupings within marketing spend data across four features: spend on three channels (TikTok, Facebook, Google Ads) and Sales outcomes. The objective is to discover distinct marketing spend patterns that can inform segmentation and strategy decisions for Demo Company.

Key Findings

Clusters Identified: 7 distinct clusters plus 9 noise points (4.5% noise rate), indicating density-based groups with few outliers
Silhouette Score: 0.632, reflecting reasonable cluster cohesion and separation: a meaningful but not perfect cluster structure
Cluster Distribution: Uneven, with Cluster 3 the largest at 31.5% (63 observations) and Cluster 7 the smallest at 2.5% (5 observations)
Feature Patterns: Clusters show distinct channel mixes. Some spend on a single channel (e.g., Cluster 2: Facebook only; Cluster 3: Google Ads only), Cluster 5 has no spend on any channel, and others blend multiple channels (e.g., Cluster 4: all three channels active)

Interpretation

The analysis partitioned 200 marketing records into seven segments based on spending density patterns. The moderate silhouette score indicates clusters are reasonably well-defined but with some overlap, typical of real-world marketing data.

Data preprocessing and column mapping

Initial Rows: 200

Final Rows: 200

Rows Removed: 0

Retention Rate: 100%

Interpretation

Purpose

This section documents the data cleaning and preparation phase for the DBSCAN clustering analysis of marketing spend. Perfect data retention indicates no rows were excluded during preprocessing, ensuring the full dataset of 200 observations remains available for discovering natural clusters in marketing channel spending patterns.

Key Findings

Retention Rate: 100% (200/200 rows preserved) - No observations were removed during cleaning, providing a complete dataset for clustering analysis
Rows Removed: 0 - No data quality issues triggered exclusion criteria, suggesting the raw data was already in acceptable condition
Initial vs. Final: Identical row counts (200) confirm no filtering or transformation-based row elimination occurred
Train/Test Split: Not applicable - Unsupervised clustering does not require traditional train/test partitioning

Interpretation

The perfect retention rate reflects a clean input dataset with minimal data quality issues. For DBSCAN, this is advantageous because all 200 marketing spend observations contribute to density-based cluster formation. The absence of missing value removals (noted in the overall context) means the full population of marketing campaigns is represented, supporting robust cluster discovery across the four features: TikTok, Facebook, Google Ads, and Sales.

Context

While 100% retention is positive, the overall analysis notes that missing values existed in the original data but were handled upstream.

Executive Summary

Executive summary of DBSCAN clustering results.

Finding	Value
Clusters Found	7
Noise Points	9
Noise Rate	4.5%
Silhouette Score	0.632
Eps Used	0.7646
MinPts	5
Features Used	4
Total Points Analyzed	200

Bottom Line: DBSCAN discovered 7 natural clusters in your data, with 9 outlier points (4.5% noise rate).

Key Findings:
• 7 distinct density clusters identified automatically
• Cluster quality (silhouette): 0.632 — reasonable separation
• 4.5% of points are outliers/noise
• Eps=0.7646 selected via k-distance elbow method

Next Steps:
• Review cluster profiles to label each segment meaningfully
• Investigate noise points — they may represent anomalies, fraud, or special cases
• Results look healthy; proceed with cluster labeling and business action

Interpretation

Purpose

This executive summary evaluates whether DBSCAN successfully discovered natural clusters in your marketing spend data (200 records across TikTok, Facebook, Google Ads, and Sales). The analysis objective was to identify distinct customer segments or spending patterns without predefined labels, enabling targeted marketing strategy refinement.

Key Findings

Clusters Discovered: 7 distinct density-based clusters identified automatically from the data
Noise Rate: 4.5% (9 outlier points) — well below the 50% quality threshold, indicating clean segmentation
Silhouette Score: 0.632 — indicates reasonable cluster separation and cohesion; clusters are meaningfully distinct
Eps Parameter: 0.765 selected via k-distance elbow method, balancing neighborhood density sensitivity
Data Retention: 100% of records retained; no preprocessing losses
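
As a rough illustration of how a silhouette score such as the 0.632 above is formed, the sketch below averages (b - a) / max(a, b) over all points, where a is a point's mean distance to its own cluster and b is its smallest mean distance to any other cluster. The function name and toy data are hypothetical, and the report does not state whether noise points were included in its calculation, so this is not a reproduction of the reported figure.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points; needs at least two distinct labels."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the point's own cluster.
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(d) / len(d)
            for d in (
                [math.dist(p, q) for j, q in enumerate(points) if labels[j] == other]
                for other in clusters - {labels[i]}
            )
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated groups on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [1, 1, 2, 2]
score = silhouette_score(points, labels)  # about 0.90
```

Values near 1 indicate tight, well-separated clusters; values near 0 indicate overlapping ones.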

Interpretation

The clustering successfully partitioned your marketing spend data into seven interpretable segments with minimal noise contamination. The moderate silhouette score (0.632) reflects natural density variations in spending patterns—some clusters are tighter than others, which is expected in real marketing data. The algorithm identified both major segments (Cluster 3: 31.5% of data) and niche groups (Cluster 7: 2.5%), suggesting heterogeneous customer behavior across channels.

Visualization

Cluster Visualization

2D PCA scatter plot showing cluster assignments. PC1 explains 48% and PC2 explains 27.4% of variance. Noise points are shown separately.

Interpretation

Purpose

This section visualizes the natural groupings discovered in marketing spend data through DBSCAN clustering. The 2D PCA projection allows interpretation of seven distinct clusters and identifies nine anomalous observations that don't fit any cluster pattern. Together, these findings reveal the underlying structure of marketing spend behavior across the 200 observations.

Key Findings

Clusters Found: 7 distinct density-based groups identified in marketing spend patterns
Noise Points: 9 observations (4.5%) flagged as outliers—below the 50% quality threshold, indicating clean clustering
Variance Explained: 75.4% of total variance captured in PC1 (48%) and PC2 (27.4%), preserving sufficient information for interpretation
Dominant Cluster: Cluster 3 contains 31.5% of data (63 points), suggesting a major marketing spend profile; Cluster 1 (22%) is secondary

Interpretation

The clustering reveals that marketing spend behavior is not uniformly distributed but concentrates around seven distinct profiles. The low noise rate (4.5%) suggests DBSCAN found genuine density-based structure rather than arbitrary partitions. The PCA projection captures about three-quarters of the variance (75.4%), so the 2D visualization is a reasonable, though not complete, view of the underlying four-dimensional feature space.

Visualization

K-Distance Plot

K-distance plot (sorted 5th nearest neighbor distances). The 'elbow' or 'knee' of the curve indicates the recommended eps value. Points above the elbow are typically noise.

Interpretation

Purpose

The k-distance plot visualizes the distance from each point to its 5th nearest neighbor, sorted in ascending order. This diagnostic tool identifies the natural elbow or knee in the curve, which indicates the optimal threshold (eps) for separating dense marketing spend clusters from isolated outliers. This parameter selection is critical for DBSCAN's ability to discover meaningful patterns in the data.
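
The k-distance values described above can be sketched in a few lines. This is a minimal standard-library illustration, not the report's implementation: the function name and toy points are hypothetical, and it assumes Euclidean distance with the point itself excluded from its own neighbor list (implementations differ on that convention).

```python
import math

def k_distances(points, k=5):
    """Return the sorted distance from each point to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point (the point itself is excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Hypothetical toy data: a tight group of six points plus one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (0.05, 0.05), (0.2, 0.1), (3.0, 3.0)]
kd = k_distances(points, k=5)
```

Plotting `kd` against its index gives the k-distance curve; in this toy case the last value jumps sharply because the isolated point has no close neighbors, which is the kind of elbow the plot is meant to expose.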

Key Findings

eps_used (0.765): Auto-detected from the elbow, representing the neighborhood radius for density-based clustering
min_pts (5): Minimum points required to form a dense region; combined with eps, defines cluster membership
kdist range (0.09–1.89): Wide spread indicates variable point densities; most points cluster tightly (median 0.38) with a right-skewed tail of sparse points
Elbow detection: The suggested eps of 0.76 aligns with the curve's inflection point, balancing cluster discovery against noise tolerance

Interpretation

The k-distance curve shows that approximately 95% of points have a 5th nearest neighbor within 0.76 units, while about 5% lie beyond it, consistent with the observed 4.5% noise rate. This alignment supports the parameter choice: the elbow marks a natural density threshold in the marketing spend data.

Visualization

Cluster Size Distribution

Distribution of observations across clusters. Shows how many data points belong to each cluster and what percentage are classified as noise.

Interpretation

Purpose

This section reveals how the 200 marketing spend observations are distributed across the 7 discovered clusters and identifies outliers. A balanced distribution indicates well-separated, meaningful groups, while excessive noise suggests the algorithm's parameters (eps, minPts) may need adjustment. Understanding cluster sizes is essential for assessing whether the segmentation is actionable for marketing strategy.

Key Findings

Cluster 3 dominance: 63 observations (31.5%) form the largest cluster, indicating a substantial segment with similar marketing spend patterns
Cluster 7 scarcity: Only 5 observations (2.5%) comprise the smallest cluster, representing a niche segment
Noise rate (4.5%): Only 9 points classified as outliers—a healthy proportion indicating eps was well-calibrated and most data fit meaningful patterns
Distribution spread: Mean group size is 25 with a standard deviation of 19.25, computed over eight groups (the seven clusters plus the noise group); sizes range from 5 to 63, so cluster sizes vary considerably
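
The size statistics above can be checked directly from the group counts in the cluster profiles table. The short sketch below assumes the sample standard deviation (n - 1 denominator) and treats the noise group as an eighth group, which is the reading that reproduces the reported mean of 25 and standard deviation of 19.25.

```python
import statistics

# Group counts as reported in the cluster profiles table.
sizes = {
    "Noise": 9, "Cluster 1": 44, "Cluster 2": 18, "Cluster 3": 63,
    "Cluster 4": 21, "Cluster 5": 17, "Cluster 6": 23, "Cluster 7": 5,
}

total = sum(sizes.values())                       # 200 observations
pct = {name: 100 * n / total for name, n in sizes.items()}

counts = list(sizes.values())
mean_size = statistics.mean(counts)               # mean over all eight groups
sd_size = statistics.stdev(counts)                # sample standard deviation

# Mean over the seven clusters only, excluding the noise group.
cluster_only_mean = statistics.mean(n for name, n in sizes.items() if name != "Noise")
```

Excluding the noise group raises the mean cluster size to about 27.3, so it matters which convention a reader assumes when comparing cluster sizes.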

Interpretation

The clustering partitioned marketing spend data into seven distinct groups with little noise. Cluster 3's prominence suggests a dominant marketing spend profile exists in the dataset, while smaller clusters (Clusters 5, 7) represent specialized or niche spending behaviors. The low noise rate suggests that the density-based approach captured natural groupings.

Visualization

Cluster Profiles

Mean standardized feature values per cluster. Positive values indicate above-average feature values; negative values indicate below-average.
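
To make the standardized profile values concrete, here is a minimal sketch of z-scoring one feature and averaging the result per cluster. The function names and the toy spend values are hypothetical, and the sample standard deviation is an assumption; the report does not state which variant its scaling step uses.

```python
import statistics

def standardize(values):
    """Z-score a feature: subtract the mean, divide by the sample standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def cluster_profile(z_values, labels):
    """Mean standardized value per cluster label."""
    groups = {}
    for z, label in zip(z_values, labels):
        groups.setdefault(label, []).append(z)
    return {label: statistics.mean(zs) for label, zs in groups.items()}

# Hypothetical toy feature: three records with zero spend, two with high spend.
spend = [0.0, 0.0, 0.0, 10000.0, 10000.0]
labels = ["A", "A", "A", "B", "B"]

z = standardize(spend)
profile = cluster_profile(z, labels)
```

In this toy case the zero-spend group gets a negative profile value rather than zero, because zero spend is below the overall average. The same effect explains why clusters with no spend on a channel show negative standardized values.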

Interpretation

Purpose

This section reveals the marketing spend characteristics that define each cluster by showing standardized feature values. Since the objective is to discover natural clusters in marketing spend data, cluster profiles identify which channels (TikTok, Facebook, Google Ads) and sales outcomes distinguish one segment from another—the core drivers of cluster separation.

Key Findings

Silhouette Score: 0.632 — Indicates reasonable cluster cohesion and separation; clusters are well-defined but not exceptionally tight
Feature Range: -1.75 to +1.66 — Large standardized differences across clusters signal that marketing spend patterns vary substantially between segments
Noise Cluster Profile — Shows extreme TikTok spending (+1.66) with minimal Google Ads (-1.48), representing outlier marketing strategies
Cluster 7 Pattern — High TikTok (+1.42) and Facebook (+1.02) but zero Google Ads (-1.75), indicating a distinct channel preference profile

Interpretation

The seven clusters represent distinct marketing spend combinations. Because the values are standardized, a negative value marks below-average spend on that channel (e.g., Cluster 1 on TikTok: -0.62, which corresponds to zero TikTok spend in the original-scale table), while large positive values highlight the channels a segment emphasizes. The moderate silhouette score suggests clusters are meaningful but some overlap exists.

Data Table

Cluster Profiles Table

Cluster profiles showing original-scale mean feature values per cluster.

cluster_label	count	pct_of_total	TikTok	Facebook	Google Ads	Sales
Noise	9	4.5	10840	2087	229.1	11990
Cluster 1	44	22	0	4685	1990	11450
Cluster 2	18	9	0	4952	0	9296
Cluster 3	63	31.5	0	0	2000	8956
Cluster 4	21	10.5	10260	4990	2035	15110
Cluster 5	17	8.5	0	0	0	6666
Cluster 6	23	11.5	9906	0	1990	13020
Cluster 7	5	2.5	9689	4772	0	12040

Interpretation

Purpose

This section reveals the average marketing spend and sales performance across seven distinct customer segments, plus nine anomalous cases. By showing original-scale means, it enables direct interpretation of cluster characteristics without statistical transformation, making it straightforward to identify which segments are high-activity, low-activity, or specialized in their channel mix.

Key Findings

Cluster 3 (31.5% of data): Dominant baseline segment with zero TikTok and Facebook spend, moderate Google Ads spend (~$2,000), and modest sales (~$8,956)
Cluster 4 (10.5% of data): Highest-activity cluster, with the highest mean spend of any cluster on every channel ($10,258 TikTok, $4,990 Facebook, $2,035 Google Ads) and the strongest sales ($15,113)
Cluster 5 (8.5% of data): Minimal-spend segment with zero spend on all three channels and the lowest sales ($6,666)
Noise points (4.5%): Heavy TikTok spenders ($10,839 on average) with mixed allocation across the other channels, suggesting outlier behavior

Interpretation

The clustering reveals distinct marketing efficiency patterns. Cluster 4 shows that balanced, high-investment strategies coincide with peak sales.

Data Table

Parameter Summary

DBSCAN algorithm parameters and quality metrics.

parameter	value
eps (neighborhood radius)	0.7646
minPts	5
Features used	TikTok, Facebook, Google Ads, Sales
Clusters found	7
Noise points	9
Noise rate (%)	4.5
Silhouette score	0.632
Total points	200

Interpretation

Purpose

This section documents the DBSCAN algorithm configuration and validates clustering quality for the marketing spend analysis. The parameters control how density-based clusters are identified, while quality metrics confirm whether the discovered clusters are meaningful and well-separated. Understanding these settings is essential for interpreting the 7 clusters and assessing confidence in the segmentation results.

Key Findings

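eps (0.765): Neighborhood radius automatically selected via k-distance elbow method; two points count as neighbors when they lie within this distance of each other (points in the same cluster can be farther apart if a chain of dense neighbors connects them)
minPts (5): Minimum number of points that must fall within a point's eps-neighborhood for it to count as a core point; ensures clusters form only in sufficiently dense regions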
Clusters Found (7): Seven distinct marketing spend segments identified across the 200 observations
Noise Points (9, 4.5%): Low noise rate indicates most observations fit well into clusters; outliers represent unusual spending patterns
Silhouette Score (0.632): Moderate-to-good cluster cohesion and separation; indicates reasonably distinct marketing segments
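
To show how eps and minPts interact, the sketch below is a minimal standard-library DBSCAN. It is an illustration under stated assumptions, not the implementation behind this report: the function name and toy data are hypothetical, distance is Euclidean, a point's neighborhood includes the point itself, and label 0 marks noise.

```python
import math

NOISE = 0  # label used here for noise points (an assumed convention)

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (1, 2, ...) or NOISE."""
    n = len(points)
    # eps-neighborhoods; each neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels

# Hypothetical toy data: two dense groups of five points and one isolated point.
group_a = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (0.1, 0.1)]
group_b = [(x + 5.0, y + 5.0) for x, y in group_a]
points = group_a + group_b + [(10.0, 10.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
```

Raising eps or lowering min_pts makes it easier for points to qualify as core points, which tends to merge clusters and reduce noise; the reverse tends to split clusters and increase noise.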

Interpretation

The algorithm identified seven natural groupings in marketing spend behavior with minimal noise contamination. The silhouette score of 0.632 suggests clusters are reasonably well-defined, though not perfectly separated, which is typical for real-world marketing data where spending patterns may overlap. The auto-selected eps value balances sensitivity to local density variations against cluster stability.