Overview

Analysis Overview

PCA Configuration

Analysis overview and configuration

Configuration

Analysis Type: PCA
Company: Marketing Analytics Co
Objective: Identify key dimensions of variation in marketing spend and sales data
Analysis Date: 2026-03-15
Processing ID: analytics__statistical__dimensionality_reduction__pca_test_20260315_113424
Total Observations: 200

Module Parameters

Parameter            Value
n_components         4
scale_data           TRUE
variance_threshold   0.8
PCA analysis for Marketing Analytics Co
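A minimal NumPy sketch of what these parameters imply, assuming the module runs PCA as an eigendecomposition of the correlation matrix (which is what standardizing with scale_data=TRUE amounts to). The function name `pca_standardized` and the random data are hypothetical; this is not the report's implementation or dataset.

```python
import numpy as np


def pca_standardized(X, n_components=4):
    """PCA on z-scored columns, mirroring scale_data=TRUE (assumed behaviour).

    Returns eigenvalues in descending order, the loading matrix
    (features x components) and the standardized data.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each feature to mean 0 and sample standard deviation 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The covariance matrix of z-scores is the correlation matrix.
    corr = np.cov(Z, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals[:n_components], eigvecs[:, :n_components], Z


# Hypothetical data: 200 rows x 4 features with some built-in correlation.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 3] += 0.8 * X[:, 0]
eigvals, loadings, Z = pca_standardized(X, n_components=4)
print(eigvals.round(3), eigvals.sum().round(3))
```

With standardized data the eigenvalues always sum to the number of features, which is why the eigenvalues in Table 8 (1.92 + 1.097 + 0.865 + 0.117) add up to roughly 4.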

Interpretation

Purpose

This PCA analysis reduces 4 marketing and sales features to 2 principal components to identify the primary dimensions of variation in the dataset. The two retained components keep 75.4% of total variance (24.6% is set aside), which makes marketing spend patterns easier to visualize and interpret at the cost of some detail.

Key Findings

  • Variance Explained (PC1 + PC2): 75.4% - Two components capture three-quarters of all variation, about 4.6 percentage points short of the configured 80% variance_threshold
  • PC1 Dominance: 48% variance - Driven primarily by feature_4 (−0.70 loading) and feature_1 (−0.56 loading), representing the strongest axis of differentiation
  • PC2 Contribution: 27.4% variance - feature_3 (−0.80 loading) and feature_2 (0.60 loading) define the secondary dimension
  • Data Quality: 100% retention across 200 observations - No rows were removed for missing values

Interpretation

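The analysis distills marketing spend and sales variation into two interpretable dimensions. PC1 appears to capture a scale or intensity factor (feature_1 and feature_4 both load negatively, so they rise and fall together), while PC2 represents a contrast between feature_3 and feature_2. The 75.4% of variance retained falls short of the configured 80% variance_threshold, so two components were retained on the basis of the Kaiser criterion (eigenvalue > 1) rather than the threshold.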

Data Preparation

Data Preprocessing

Data Quality & Completeness

Data preprocessing and column mapping

Data Quality

Metric           Value
Initial Rows     200
Final Rows       200
Rows Removed     0
Retention Rate   100%
Processed 200 observations, retained 200 (100.0%) after cleaning

Interpretation

Purpose

This section documents data quality and retention during preprocessing for the PCA analysis. Standard PCA needs a complete feature matrix, so rows with missing values would otherwise have to be dropped or imputed before the variance structure of the 200 marketing observations can be computed; here no rows were affected.
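The completeness requirement can be sketched as a listwise-deletion step that also reports the retention rate shown in the Data Quality table. This is an assumption about how the preprocessing works, not a description of the module's code; the function name `drop_incomplete_rows` is hypothetical.

```python
import numpy as np


def drop_incomplete_rows(X):
    """Keep only rows with no missing (NaN) values and report retention in %."""
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)  # True for rows with every feature present
    retention_pct = 100.0 * complete.sum() / X.shape[0]
    return X[complete], retention_pct


# Hypothetical 200 x 4 matrix with no gaps, as in this report.
X = np.ones((200, 4))
kept, retention = drop_incomplete_rows(X)
print(kept.shape, retention)  # (200, 4) 100.0

# A single missing value removes its whole row.
X_gap = X.copy()
X_gap[5, 2] = np.nan
kept_gap, retention_gap = drop_incomplete_rows(X_gap)
print(kept_gap.shape, retention_gap)  # (199, 4) 99.5
```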

Key Findings

  • Retention Rate: 100% (200/200 rows preserved) - All observations successfully passed quality checks with no exclusions
  • Rows Removed: 0 - No data loss occurred during cleaning or standardization procedures
  • Data Completeness: Full dataset available for PCA computation across all 4 marketing features
  • Train/Test Split: Not applicable - PCA is unsupervised and operates on the complete dataset without partitioning

Interpretation

The perfect retention rate indicates good data quality in the marketing spend and sales dataset. No missing values or anomalies triggered removal, so all 200 observations contribute to the correlation matrix from which the principal components are computed. The 75.4% cumulative variance explained by PC1 and PC2 therefore rests on the complete data rather than on imputed or filtered data.

Context

While 100% retention is favorable, note that the features were standardized as part of the PCA itself (scale_data=TRUE), so variables measured on different scales contribute equally. The lack of a train/test split reflects PCA's unsupervised nature; however, it also means the component structure has not been checked against independent data.

Executive Summary

Key Findings & Recommendations

Key Findings

Finding                  Value
Features Analyzed        4
Recommended Components   2
Variance Captured        75.4%
PC1 Variance             48%
Observations Used        200

Summary

Bottom Line: PCA of 4 features identified 2 principal components that together explain 75.4% of total variance, short of the 80% variance threshold.

Key Findings:
• PC1 alone captures 48% of variance, the largest single dimension in the data
• The first 2 components explain 75.4% combined
• 2 components were selected via the Kaiser criterion (eigenvalue > 1); meeting the 80% variance threshold would require 3 components (97.1%)
• 4 features reduced to 2 dimensions, a 50% dimensionality reduction

Recommendation: Use the top 2 components as input features for downstream models (clustering, classification, regression). Review the loadings heatmap to give each component a meaningful business name.
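Using the retained components as model inputs amounts to projecting the standardized data onto the first two loading vectors. A minimal sketch, assuming PCA on standardized features; `fit_pca` and `transform` are hypothetical helper names and the data is random. The design choice that matters is keeping the training means, standard deviations and loadings, so new rows are scored on the same axes instead of being re-fitted.

```python
import numpy as np


def fit_pca(X, k=2):
    """Learn standardization parameters and the top-k loading vectors."""
    X = np.asarray(X, dtype=float)
    mean, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / sd
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    top = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    return mean, sd, eigvecs[:, top]


def transform(X, mean, sd, loadings):
    """Score rows on the retained components using the training parameters."""
    Z = (np.asarray(X, dtype=float) - mean) / sd
    return Z @ loadings


# Hypothetical training data with two correlated features.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(200, 4))
X_train[:, 1] += 0.7 * X_train[:, 0]
mean, sd, loadings = fit_pca(X_train, k=2)
scores = transform(X_train, mean, sd, loadings)  # 200 x 2 feature matrix
print(scores.shape)  # (200, 2)
```

The two score columns are uncorrelated by construction, which is one reason PC scores are convenient inputs for regression.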

Interpretation

Purpose

This PCA analysis successfully reduced 4 marketing spend and sales features into 2 principal components, achieving the stated objective of identifying key dimensions of variation in the dataset. The analysis enables downstream modeling with simplified feature space while retaining meaningful variance structure.

Key Findings

  • Variance Captured: 75.4% across 2 components, just below the configured 80% threshold but acceptable for dimensionality reduction
  • PC1 Dominance: The first component alone explains 48% of total variance, the largest share captured by any single dimension
  • Dimensionality Reduction: 4 features compressed to 2 components (50% reduction), at the cost of 24.6% of total variance
  • Feature Contributions: PC1 heavily weighted by feature_4 (−0.70) and feature_1 (−0.56); PC2 driven by feature_3 (−0.80) and feature_2 (0.60)

Interpretation

The analysis shows that variation in the marketing spend and sales data is concentrated in two primary dimensions. PC1 behaves like a scale or magnitude dimension: feature_4 and feature_1 share the same (negative) sign, so they rise and fall together, and the overall sign of a component is arbitrary. PC2 captures a distinct, uncorrelated pattern. Together, these components preserve three-quarters of the original variance, making them suitable inputs for clustering or predictive modeling, although the 24.6% left in PC3 and PC4 is not negligible.

Context

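PCA assumes linear relationships among features and requires standardized inputs when features are measured on different scales; here scale_data=TRUE handles the standardization.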

Figure 4

Scree Plot

Variance Explained by Component

Variance explained by each principal component

Interpretation

Purpose

The scree plot visualizes the variance contribution of each principal component, helping identify which dimensions capture the most meaningful variation in marketing spend and sales data. This section is critical for determining dimensionality reduction effectiveness—showing how much information is retained when moving from 4 original features to fewer principal components.

Key Findings

  • PC1 Variance: 48% - The first component alone captures nearly half of all variation, indicating a dominant underlying dimension in the marketing data
  • Top 2 Components: 75.4% cumulative variance - Two components together retain three-quarters of total information, suggesting substantial dimensionality reduction is possible
  • Eigenvalue Decline: Eigenvalues fall from 1.92 (PC1) to 1.097 (PC2) and 0.865 (PC3), then drop sharply to 0.117 (PC4). PC3 and PC4 contribute 24.6% combined, but 21.6% of that comes from PC3, so the curve does not show a clean elbow after PC2
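The percentages in this section follow directly from the eigenvalues: each component's share is its eigenvalue divided by the sum of all eigenvalues. A short check using the Table 8 eigenvalues, which also lists the successive drops that any elbow reading depends on:

```python
import numpy as np

# Eigenvalues as reported in Table 8.
eigenvalues = np.array([1.92, 1.097, 0.865, 0.117])

# Share of total variance per component and the running total, in percent.
pct = 100 * eigenvalues / eigenvalues.sum()
cum_pct = np.cumsum(pct)
# Size of each drop: PC1->PC2, PC2->PC3, PC3->PC4.
drops = -np.diff(eigenvalues)

print([round(float(p), 1) for p in pct])      # [48.0, 27.4, 21.6, 2.9]
print([round(float(c), 1) for c in cum_pct])  # [48.0, 75.4, 97.1, 100.0]
print([round(float(d), 3) for d in drops])    # [0.823, 0.232, 0.748]
```

The smallest drop is between PC2 and PC3, which is why the curve does not flatten right after PC2.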

Interpretation

The scree plot shows variance concentrated in the first two components, which supports retaining PC1 and PC2 under the Kaiser criterion (eigenvalue > 1). This pattern suggests the four original marketing features share substantial correlation and can be compressed into two uncorrelated dimensions. However, PC3 still carries 21.6% of variance, with an eigenvalue (0.865) not far below PC2's (1.097); only PC4 (2.9%) is clearly marginal.

Context

PCA assumes linear relationships among the original features.

Figure 5

PC Score Plot

Observations in PC Space

Observations projected onto the first two principal components

Interpretation

Purpose

This score plot projects 200 marketing observations onto the first two principal components, revealing the underlying structure of variation in marketing spend and sales data. By visualizing observations in reduced dimensional space, the plot identifies natural groupings, outliers, and patterns that would be invisible when examining the original four features individually.

Key Findings

  • Axes Variance: 75.4% - The first two components capture three-quarters of total variance, indicating strong dimensionality reduction without substantial information loss
  • PC1 Range: -3.57 to 2.69 (SD=1.39) - Captures 48% of variance; shows left-skewed distribution with one notable negative outlier (row 197 at -3.0)
  • PC2 Range: -1.4 to 2.64 (SD=1.05) - Captures 27.4% of variance; more symmetric distribution suggesting balanced secondary variation
  • Observation Spread: Points distributed across all quadrants with no obvious clustering, indicating continuous variation rather than discrete market segments
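The reported score spreads are consistent with the eigenvalues: the variance of a component's scores equals its eigenvalue, so its SD is the square root (√1.92 ≈ 1.39 for PC1, √1.097 ≈ 1.05 for PC2). A sketch on hypothetical random data that checks this identity:

```python
import numpy as np

# Hypothetical data; the identity holds for any dataset.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X[:, 1] += 0.6 * X[:, 0]

# Standardize, then eigendecompose the correlation matrix (largest first).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Z @ eigvecs  # one column of scores per component
score_sd = scores.std(axis=0, ddof=1)
print(np.allclose(score_sd, np.sqrt(eigvals)))  # True

# The same relation applied to the reported eigenvalues of PC1 and PC2.
print(np.sqrt([1.92, 1.097]).round(2))  # [1.39 1.05]
```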

Interpretation

The relatively even scatter across PC space suggests marketing spend and sales metrics vary continuously across the 200 observations rather than forming distinct clusters. The left-skewed PC1 distribution indicates that one observation (row 197) sits at an extreme position on the primary dimension of variation; it is likely either an outlier or a genuinely atypical observation worth reviewing.

Figure 6

Variable Loadings

Contribution of Each Feature to PCs

Contribution of each original variable to each principal component

Interpretation

Purpose

The loadings heatmap reveals how each of the 4 original marketing features contributes to the principal components. By identifying which variables load strongly together on the same component, this section enables you to assign business meaning to the mathematical dimensions—transforming abstract PCs into interpretable marketing dimensions that explain variation in spend and sales patterns.

Key Findings

  • PC1 Dominance: Feature_4 (-0.70) and Feature_1 (-0.56) drive PC1 most strongly, suggesting these variables move together and represent the primary axis of variation (48% of total variance)
  • PC2 Contrast: Feature_3 (-0.80) and Feature_2 (0.60) load oppositely on PC2, indicating they represent a contrasting dimension capturing 27.4% of variance
  • Feature_1 Versatility: Loads meaningfully across PC1 (-0.56), PC3 (0.66), and PC4 (0.51), showing it contributes to multiple dimensions
  • Loading Range: Values span -0.80 to 0.66, indicating moderate to strong contributions across all components

Interpretation

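The shared negative sign of Feature_4 and Feature_1 on PC1 means these marketing metrics increase together; because the sign of a component is arbitrary, they could equally be read as both loading positively. PC2's opposing loadings reveal a contrast between Feature_3 and Feature_2: high PC2 scores go with relatively high Feature_2 and low Feature_3 values.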
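Because an eigenvector v and its negation −v describe the same component, the negative signs on PC1 carry no meaning on their own; what matters is which features share a sign. A minimal sketch of one common sign convention (flip each component so its largest-magnitude loading is positive). The report does not say which convention, if any, it applies, and `orient_loadings` is a hypothetical helper.

```python
import numpy as np


def orient_loadings(loadings):
    """Flip each column's sign so its largest-|loading| entry is positive."""
    L = np.array(loadings, dtype=float)
    cols = np.arange(L.shape[1])
    biggest = np.argmax(np.abs(L), axis=0)  # row of the largest |loading| per column
    signs = np.sign(L[biggest, cols])
    return L * signs


# Toy correlation matrix: a negated eigenvector still satisfies C v = lambda v.
C = np.array([[1.0, 0.5], [0.5, 1.0]])
eigvals, eigvecs = np.linalg.eigh(C)
v = eigvecs[:, -1]
print(np.allclose(C @ (-v), eigvals[-1] * (-v)))  # True

# Hypothetical loadings whose columns are dominated by negative entries.
L = np.array([[-0.70, 0.20], [-0.56, -0.80]])
print(orient_loadings(L))
```

Flipping changes only the direction in which scores are read, not the variance explained or which features move together.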

Figure 7

Cumulative Variance

Information Retained by Component Count

Cumulative variance explained as more components are added

Interpretation

Purpose

This section quantifies the trade-off between dimensionality reduction and information retention. It demonstrates how many principal components are needed to capture meaningful variance in the marketing spend and sales data, guiding decisions about which components to retain for downstream analysis or visualization.

Key Findings

  • PC1 Alone: Captures 48% of variance, insufficient for a comprehensive representation of the data structure
  • PC1 + PC2 Combined: Captures 75.4% of total variance, slightly short of the 80% threshold but the recommended balance point
  • Adding PC3: Raises cumulative variance to 97.1%; the 21.6% gain is substantial, but keeping three of four components would reduce dimensionality by only 25%
  • Threshold Gap: The 4.6 percentage point shortfall from the 80% target reflects a practical trade-off between parsimony and completeness
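The threshold arithmetic can be reproduced from the Table 8 eigenvalues: find the smallest number of components whose cumulative share reaches variance_threshold, and measure how far two components fall short. A short sketch:

```python
import numpy as np

eigenvalues = np.array([1.92, 1.097, 0.865, 0.117])  # Table 8
threshold = 0.8  # the variance_threshold module parameter

cum_share = np.cumsum(eigenvalues) / eigenvalues.sum()
# argmax returns the first True, i.e. the first component count meeting the
# threshold (this assumes the threshold is reachable, i.e. at most 1).
k_needed = int(np.argmax(cum_share >= threshold)) + 1
gap_pp = 100 * (threshold - cum_share[1])  # shortfall with two components, in points

print(k_needed, round(float(gap_pp), 1))  # 3 4.6
```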

Interpretation

The analysis reveals that two components effectively summarize three-quarters of the variation in the original four features. This suggests the underlying marketing and sales metrics share substantial covariance—likely reflecting common business drivers. While the 75.4% figure falls modestly below the 80% threshold, retaining only two dimensions reduces the feature space by 50% while preserving most meaningful variation, making it suitable for visualization and interpretation of marketing dynamics.


Table 8

Component Summary

Eigenvalues and Variance by Component

Summary statistics for each principal component

Component   Eigenvalue   Variance   Cumulative   Recommended
PC1         1.92         48%        48%          ✓ Retain
PC2         1.097        27.4%      75.4%        ✓ Retain
PC3         0.865        21.6%      97.1%
PC4         0.117        2.9%       100%

Interpretation

Purpose

This section identifies which principal components merit retention based on statistical criteria. It shows how much variance each component captures and whether it meets the Kaiser criterion (eigenvalue > 1 for scaled data). This directly supports the marketing analytics objective by determining how many dimensions are needed to represent the key variation in marketing spend and sales data.

Key Findings

  • Recommended Components: 2 components, retaining 75.4% of total variance; they meet the Kaiser criterion but fall short of the 80% variance threshold
  • PC1 Eigenvalue: 1.92 (largest) — captures 48% of variance independently, indicating a dominant dimension of variation
  • PC2 Eigenvalue: 1.1 — contributes an additional 27.4%, bringing cumulative variance to 75.4%
  • Variance Drop-off: PC3 (eigenvalue 0.86) and PC4 (eigenvalue 0.12) fall below the Kaiser threshold, indicating diminishing information value
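For standardized data the eigenvalues sum to the number of features, so their mean is 1; the Kaiser criterion therefore keeps components that explain more variance than a single original feature. A sketch applying it to the Table 8 eigenvalues:

```python
import numpy as np

eigenvalues = np.array([1.92, 1.097, 0.865, 0.117])  # Table 8

# With 4 standardized features the eigenvalues sum to about 4, so the mean is about 1.
mean_eigenvalue = eigenvalues.mean()
# Kaiser criterion: retain components whose eigenvalue exceeds 1.
retained = np.flatnonzero(eigenvalues > 1.0) + 1  # 1-based component numbers

print(round(float(mean_eigenvalue), 3))  # 1.0
print(retained.tolist())                 # [1, 2]
```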

Interpretation

The two-component solution summarizes the marketing dataset's structure efficiently. PC1 represents nearly half the total variation, and PC2 adds a further 27.4%. Together they capture three-quarters of the data's variance; the components set aside are PC3, which still carries 21.6%, and PC4, which carries only 2.9%. This 2D representation simplifies visualization and analysis of marketing spend-sales relationships, at the cost of the variance held in PC3.

Context

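The Kaiser criterion (retain components with eigenvalue > 1) is appropriate here because the data were standardized (scale_data=TRUE): the eigenvalues sum to the number of features (4), so a component with an eigenvalue above 1 explains more variance than any single standardized feature.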

Table 9

Top Variable Loadings

Strongest Feature Contributions per Component

Top variable loadings per component (top 3 by absolute value)

Component   Variable     Loading   Abs. Loading
PC1         Sales        -0.6984   0.6984
PC1         TikTok       -0.556    0.556
PC1         Facebook     -0.3783   0.3783
PC2         Google Ads   -0.7986   0.7986
PC2         Facebook     0.6       0.6
PC2         Sales        -0.0482   0.0482
PC3         TikTok       0.6595    0.6595
PC3         Facebook     -0.6028   0.6028
PC3         Google Ads   -0.447    0.447
PC4         Sales        -0.7129   0.7129
PC4         TikTok       0.5058    0.5058
PC4         Facebook     0.3653    0.3653

Interpretation

Purpose

This section identifies which original features most strongly define each principal component by ranking variables by absolute loading magnitude. High absolute loadings (near ±1) reveal the core drivers of variation in each dimension, enabling interpretation of what each PC represents in the marketing spend and sales context.

Key Findings

  • PC1 Dominance: feature_4 (Sales, −0.70) and feature_1 (TikTok, −0.56) are the primary drivers; both load negatively, so higher values of either push PC1 scores down
  • PC2 Structure: feature_3 (Google Ads, −0.80) has the strongest single loading across all components, with feature_2 (Facebook, 0.60) loading in the opposite direction
  • feature_2 Consistency: feature_2 (Facebook) appears in the top 3 for all four components (loading range: −0.60 to 0.60), suggesting it contributes to multiple dimensions of variation
  • Loading Strength: The mean absolute loading of the 12 listed entries is 0.53, indicating moderate-to-strong contributions overall, although Sales contributes almost nothing to PC2 (−0.05)
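A sketch of how a top-3-by-absolute-value ranking like Table 9 can be produced, plus a check that the stated 0.53 matches the mean of the 12 absolute loadings listed in the table (an assumption about how that figure was computed). The function `top_loadings` and the 4 x 2 loading matrix are hypothetical; the report's full loading matrix is not shown.

```python
import numpy as np


def top_loadings(loadings, names, k=3):
    """For each component, return the k (name, loading) pairs with the largest |loading|."""
    L = np.asarray(loadings, dtype=float)
    result = {}
    for j in range(L.shape[1]):
        order = np.argsort(-np.abs(L[:, j]))[:k]  # descending by absolute value
        result[f"PC{j + 1}"] = [(names[i], float(L[i, j])) for i in order]
    return result


# Hypothetical loading matrix (rows = features, columns = components).
names = ["feature_1", "feature_2", "feature_3", "feature_4"]
L = np.array([[-0.56, 0.10], [-0.38, 0.60], [0.05, -0.80], [-0.70, 0.05]])
print(top_loadings(L, names)["PC1"])
# [('feature_4', -0.7), ('feature_1', -0.56), ('feature_2', -0.38)]

# Mean of the 12 absolute loadings listed in Table 9.
table9_abs = [0.6984, 0.556, 0.3783, 0.7986, 0.6, 0.0482,
              0.6595, 0.6028, 0.447, 0.7129, 0.5058, 0.3653]
print(round(sum(table9_abs) / len(table9_abs), 2))  # 0.53
```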

Interpretation

The analysis shows that marketing spend and sales variation is primarily captured by feature_4 and feature_1 (PC1: 48% variance), with feature_3 and feature_2 providing an orthogonal contrast (PC2: 27.4% variance). The consistent appearance of feature_2 across components suggests it does not align with a single dimension but contributes to several.
