Analysis overview and configuration
| Parameter | Value |
|---|---|
| features | TikTok, Facebook, Google Ads, Sales |
| eps | auto-selected (0.7646 used) |
| min_pts | 5 |
| scale_features | TRUE |
This DBSCAN clustering analysis identifies natural groupings in marketing data described by four features: spend on three channels (TikTok, Facebook, Google Ads) and the resulting Sales. The objective is to discover distinct marketing spend patterns that can inform segmentation and strategy decisions for Demo Company.
The analysis assigned 191 of the 200 marketing records to seven segments based on spending density and flagged the remaining 9 as noise. The silhouette score of 0.632 indicates clusters are reasonably well-defined but with some overlap, which is typical of real-world marketing data.
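The density-based procedure summarized above can be illustrated with a minimal, standard-library sketch. This is not the code that produced the report: the function names, the toy rows, and the use of the sample standard deviation for scaling are all assumptions made for illustration. Only eps = 0.7646 and min_pts = 5 come from the report.

```python
import math
import statistics

NOISE = 0  # label used here for points that belong to no cluster


def standardize(rows):
    """Z-score each column (assumption: sample standard deviation)."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def region_query(points, i, eps):
    """Indices of all points within eps of point i, including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 1..k for clusters, NOISE for noise."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical toy rows (two tight groups and one isolated row), not report data.
toy_rows = [
    [0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
    [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5],
    [5, 20],
]
toy_labels = dbscan(standardize(toy_rows), eps=0.7646, min_pts=5)
print(toy_labels)
```

On this toy input the two tight groups receive labels 1 and 2 and the isolated row is labelled 0 (noise). Scaling before clustering matters because eps is a single radius applied to every feature at once.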
Data preprocessing and column mapping
| Metric | Value |
|---|---|
| Initial Rows | 200 |
| Final Rows | 200 |
| Rows Removed | 0 |
| Retention Rate | 100% |
This section documents the data cleaning and preparation phase for the DBSCAN clustering analysis of marketing spend. Perfect data retention indicates no rows were excluded during preprocessing, ensuring the full dataset of 200 observations remains available for discovering natural clusters in marketing channel spending patterns.
The perfect retention rate reflects a clean input dataset with minimal data quality issues. For DBSCAN, this is advantageous because all 200 marketing spend observations contribute to density-based cluster formation. The absence of missing value removals (noted in the overall context) means the full population of marketing campaigns is represented, supporting robust cluster discovery across the four features: TikTok, Facebook, Google Ads, and Sales.
While 100% retention is positive, the overall analysis notes that missing values existed in the original data but were handled upstream.
| Finding | Value |
|---|---|
| Clusters Found | 7 |
| Noise Points | 9 |
| Noise Rate | 4.5% |
| Silhouette Score | 0.632 |
| Eps Used | 0.7646 |
| MinPts | 5 |
| Features Used | 4 |
| Total Points Analyzed | 200 |
This executive summary evaluates whether DBSCAN discovered natural clusters in your marketing spend data (200 records across TikTok, Facebook, Google Ads, and Sales). The objective was to identify distinct spending patterns without predefined labels, enabling targeted marketing strategy refinement.
The clustering assigned the records to seven interpretable segments, leaving 9 points (4.5%) as noise. The silhouette score (0.632) reflects natural density variations in spending patterns: some clusters are tighter than others, which is expected in real marketing data. The algorithm identified both major segments (Cluster 3: 31.5% of data) and niche groups (Cluster 7: 2.5%), suggesting heterogeneous spending behavior across channels.
2D PCA scatter plot showing cluster assignments. PC1 explains 48% and PC2 explains 27.4% of variance. Noise points are shown separately.
This section visualizes the natural groupings discovered in marketing spend data through DBSCAN clustering. The 2D PCA projection allows interpretation of seven distinct clusters and identifies nine anomalous observations that don't fit any cluster pattern. Together, these findings reveal the underlying structure of marketing spend behavior across the 200 observations.
The clustering reveals that marketing spend behavior is not uniformly distributed but concentrates around seven distinct profiles. The low noise rate (4.5%) means that most observations fall in dense regions at the chosen eps. The PCA projection captures 75.4% of the variance (48% + 27.4%), so the 2D visualization is a useful but incomplete view of the four-dimensional feature space; clusters that overlap in the plot may still be separated along the remaining components.
K-distance plot (sorted 5th nearest neighbor distances). The 'elbow' or 'knee' of the curve indicates the recommended eps value. Points above the elbow are typically noise.
The k-distance plot visualizes the distance from each point to its 5th nearest neighbor, sorted in ascending order. This diagnostic tool identifies the natural elbow or knee in the curve, which indicates the optimal threshold (eps) for separating dense marketing spend clusters from isolated outliers. This parameter selection is critical for DBSCAN's ability to discover meaningful patterns in the data.
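The quantity plotted on the k-distance curve can be computed as in the sketch below. It is an illustration only: the function names and toy points are hypothetical, and it assumes the point itself is not counted among its own neighbors, a convention that differs between implementations.

```python
import math


def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbor.

    Assumption: the point itself is excluded from its neighbor list.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def share_within(kdist, eps):
    """Fraction of points whose k-th neighbor distance is at most eps."""
    return sum(d <= eps for d in kdist) / len(kdist)


# Hypothetical toy points: four evenly spaced points and one far away.
toy_points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
toy_kdist = k_distances(toy_points, k=2)
print(toy_kdist)                     # [1.0, 1.0, 2.0, 2.0, 8.0]
print(share_within(toy_kdist, 2.0))  # 0.8
```

The jump from 2.0 to 8.0 in the sorted list is the toy counterpart of the elbow: a candidate eps just above 2.0 keeps the four close points dense and leaves the distant one isolated.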
The k-distance curve shows that approximately 95% of points have their 5th nearest neighbor within 0.76 units (in standardized feature space), while about 5% lie beyond it. This is consistent with the observed 4.5% noise rate and with the selected eps of 0.7646, which suggests the threshold sits at a natural density break in the marketing spend data.
Distribution of observations across clusters. Shows how many data points belong to each cluster and what percentage are classified as noise.
This section shows how the 200 marketing spend observations are distributed across the 7 discovered clusters and how many are classified as noise. Cluster sizes indicate whether each segment is large enough to act on; they do not by themselves show that groups are well separated. Excessive noise would suggest that the algorithm's parameters (eps, minPts) need adjustment.
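The percentages in this distribution follow directly from the cluster counts. The short sketch below recomputes them from the counts listed in the report's cluster profile table; the variable names are illustrative.

```python
# Counts per label as listed in the report's cluster profile table.
counts = {
    "Noise": 9,
    "Cluster 1": 44,
    "Cluster 2": 18,
    "Cluster 3": 63,
    "Cluster 4": 21,
    "Cluster 5": 17,
    "Cluster 6": 23,
    "Cluster 7": 5,
}

total = sum(counts.values())
shares = {label: 100 * n / total for label, n in counts.items()}

# Largest cluster, ignoring the noise bucket.
largest = max((label for label in counts if label != "Noise"), key=counts.get)

for label, pct in shares.items():
    print(f"{label}: {counts[label]} points, {pct:.1f}%")
print("Largest cluster:", largest)
```

The counts sum to 200, the noise share comes out at 4.5%, and Cluster 3 is the largest group at 31.5%, matching the figures reported elsewhere in this document.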
The clustering grouped the marketing spend data into seven distinct clusters, with 4.5% of points left as noise. Cluster 3's prominence (31.5%) suggests a dominant marketing spend profile exists in the dataset, while smaller clusters (Clusters 5 and 7) represent specialized or niche spending behaviors. The low noise rate indicates that the density-based approach captured most observations within natural groupings.
Mean standardized feature values per cluster. Positive values indicate above-average feature values; negative values indicate below-average.
This section reveals the marketing spend characteristics that define each cluster by showing standardized feature values. Since the objective is to discover natural clusters in marketing spend data, cluster profiles identify which channels (TikTok, Facebook, Google Ads) and sales outcomes distinguish one segment from another—the core drivers of cluster separation.
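A standardized cluster profile of this kind can be derived as sketched below: each feature is z-scored across all rows, then averaged within each cluster. The function name and toy data are hypothetical, and the use of the sample standard deviation is an assumption, since the report does not state which variant it used.

```python
import statistics
from collections import defaultdict


def cluster_profiles(rows, labels):
    """Mean z-score of every feature within each cluster label."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # assumption: sample SD
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append([(v - m) / s for v, m, s in zip(row, means, sds)])
    return {
        label: [statistics.mean(col) for col in zip(*zs)]
        for label, zs in grouped.items()
    }


# Hypothetical toy data: two features, two clusters of two rows each.
toy_rows = [[0, 10], [0, 12], [10, 20], [10, 22]]
toy_labels = [1, 1, 2, 2]
toy_profiles = cluster_profiles(toy_rows, toy_labels)
for label, profile in toy_profiles.items():
    print(label, [round(z, 2) for z in profile])
```

In the toy output, cluster 1 has a negative mean z-score on the first feature because both of its rows sit below the overall mean, which is the same reading that applies to a negative value in the report's profile chart.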
The seven clusters represent distinct marketing spend combinations. Negative values (e.g., Cluster 1 on TikTok: -0.62) indicate below-average spend on a channel, meaning it is not prioritized in that segment, while large positive values highlight a segment's signature channels. The silhouette score of 0.632 suggests clusters are meaningful but some overlap exists.
Cluster profiles showing original-scale mean feature values per cluster.
| cluster_label | count | pct_of_total | TikTok | Facebook | Google Ads | Sales |
|---|---|---|---|---|---|---|
| Noise | 9 | 4.5 | 10840 | 2087 | 229.1 | 11990 |
| Cluster 1 | 44 | 22 | 0 | 4685 | 1990 | 11450 |
| Cluster 2 | 18 | 9 | 0 | 4952 | 0 | 9296 |
| Cluster 3 | 63 | 31.5 | 0 | 0 | 2000 | 8956 |
| Cluster 4 | 21 | 10.5 | 10260 | 4990 | 2035 | 15110 |
| Cluster 5 | 17 | 8.5 | 0 | 0 | 0 | 6666 |
| Cluster 6 | 23 | 11.5 | 9906 | 0 | 1990 | 13020 |
| Cluster 7 | 5 | 2.5 | 9689 | 4772 | 0 | 12040 |
This section reports the average marketing spend and sales performance for the seven clusters, plus the nine noise points. By showing original-scale means, it enables direct interpretation of cluster characteristics without statistical transformation, making it straightforward to identify which segments are high-activity, low-activity, or specialized in their channel mix.
The clustering reveals distinct marketing efficiency patterns. Cluster 4, which spends on all three channels, has the highest mean Sales (about 15,110), indicating that balanced, high-investment strategies correlate with peak sales.
DBSCAN algorithm parameters and quality metrics.
| parameter | value |
|---|---|
| eps (neighborhood radius) | 0.7646 |
| minPts | 5 |
| Features used | TikTok, Facebook, Google Ads, Sales |
| Clusters found | 7 |
| Noise points | 9 |
| Noise rate (%) | 4.5% |
| Silhouette score | 0.632 |
| Total points | 200 |
This section documents the DBSCAN algorithm configuration and validates clustering quality for the marketing spend analysis. The parameters control how density-based clusters are identified, while quality metrics confirm whether the discovered clusters are meaningful and well-separated. Understanding these settings is essential for interpreting the 7 clusters and assessing confidence in the segmentation results.
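For readers unfamiliar with the silhouette score listed among the quality metrics, the sketch below shows one way to compute it. It is illustrative only: the function name and toy data are hypothetical, and excluding noise points from the calculation is an assumption, because the report does not say how noise was treated when the 0.632 figure was computed.

```python
import math
from collections import defaultdict


def silhouette_score(points, labels, noise_label=0):
    """Mean silhouette over clustered points (assumption: noise is excluded)."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        if lab != noise_label:
            clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for lab, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            # (the zero distance from p to itself adds nothing to the sum).
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster.
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items()
                if other_lab != lab
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated pairs and one noise point.
toy_points = [(0,), (1,), (10,), (11,), (5,)]
toy_labels = [1, 1, 2, 2, 0]
print(round(silhouette_score(toy_points, toy_labels), 3))
```

Scores near 1 mean points are much closer to their own cluster than to the nearest other cluster, scores near 0 indicate overlap, and negative scores suggest misassignment. The toy clusters are far apart, so their score is close to 1, whereas the report's 0.632 points to partial overlap.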
The algorithm identified seven natural groupings in marketing spend behavior, with 4.5% of points left as noise. The silhouette score of 0.632 suggests clusters are reasonably well-defined, though not perfectly separated, which is typical for real-world marketing data where spending patterns may overlap. The auto-selected eps value (0.7646) balances sensitivity to local density variations against cluster stability.