Analysis overview and configuration

Configuration

Analysis Type: PCA

Company: Marketing Analytics Co

Objective: Identify key dimensions of variation in marketing spend and sales data

Analysis Date: 2026-03-15

Processing ID: analytics__statistical__dimensionality_reduction__pca_test_20260315_113424

Total Observations: 200

Module Parameters

Parameter	Value
n_components	4
scale_data	TRUE
variance_threshold	0.8
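The sketch below shows what these parameters plausibly control. It assumes the module standardizes each feature with the sample standard deviation and eigendecomposes the covariance matrix of the scaled data. run_pca is a hypothetical helper, not the module's implementation, and the data are synthetic. variance_threshold is deliberately left out because it only affects how many components are recommended, not how they are computed.

```python
import numpy as np

def run_pca(X, n_components=4, scale_data=True):
    """Illustrative PCA via eigendecomposition of the covariance matrix.

    Hypothetical helper: parameter names echo the module parameters above,
    but this is not the module's actual implementation.
    """
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)                   # center each feature
    if scale_data:
        Z = Z / X.std(axis=0, ddof=1)        # unit variance: PCA on the correlation matrix
    cov = np.cov(Z, rowvar=False)            # p x p covariance of the (scaled) data
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort components by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    var_ratio = eigvals / eigvals.sum()      # share of total variance per component
    k = n_components
    scores = Z @ eigvecs[:, :k]              # project observations onto the first k components
    return eigvals[:k], var_ratio[:k], eigvecs[:, :k], scores

# Synthetic stand-in for a 200 x 4 marketing matrix (not the report's data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
eigvals, var_ratio, loadings, scores = run_pca(X, n_components=4, scale_data=True)
```

With scale_data=TRUE, the eigenvalues of standardized data sum to the number of features (4 here). This is why an eigenvalue above 1 means a component explains more than one feature's worth of variance.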

PCA analysis for Marketing Analytics Co

Interpretation

Purpose

This PCA analysis reduces 4 marketing and sales features to 2 principal components to identify the primary dimensions of variation in the dataset. Compressing the feature space while retaining 75.4% of total variance makes marketing spend patterns easier to visualize and interpret, at the cost of the remaining 24.6% of variance.

Key Findings

Variance Explained (PC1 + PC2): 75.4% - Two components capture three-quarters of all variation, supporting a two-dimensional summary, although this is 4.6 percentage points short of the configured 80% variance_threshold
PC1 Dominance: 48% variance - Driven primarily by feature_4 (Sales, −0.70 loading) and feature_1 (TikTok, −0.56 loading), representing the strongest axis of differentiation
PC2 Contribution: 27.4% variance - feature_3 (Google Ads, −0.80 loading) and feature_2 (Facebook, 0.60 loading) define the secondary dimension
Data Quality: 100% retention across 200 observations — No missing values compromised the analysis

Interpretation

The analysis distills marketing spend and sales variation into two interpretable dimensions. PC1 appears to capture a scale or intensity factor (features 1 and 4 load with the same negative sign, so they rise and fall together), while PC2 represents a contrast between feature_3 and feature_2. The 75.4% cumulative variance falls 4.6 percentage points short of the configured 80% variance_threshold.

Data preprocessing and column mapping

Data Quality

Metric	Value
Initial Rows	200
Final Rows	200
Rows Removed	0
Retention Rate	100%

Processed 200 observations, retained 200 (100.0%) after cleaning

Interpretation

Purpose

This section documents data quality and retention outcomes from preprocessing for the PCA analysis. Complete data matters because standard PCA requires a full feature matrix: any row with missing values must be dropped or imputed before the variance structure can be computed across the 200 marketing observations.
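A minimal sketch of complete-case filtering and the retention rate reported here follows. It assumes incomplete rows are dropped rather than imputed, which the report does not state. complete_cases is a hypothetical helper, and the data are synthetic.

```python
import numpy as np

def complete_cases(X):
    """Hypothetical helper: drop rows with any missing value and report retention (%)."""
    X = np.asarray(X, dtype=float)
    keep = ~np.isnan(X).any(axis=1)          # True for rows with no missing values
    retention_pct = 100.0 * keep.sum() / len(X)
    return X[keep], retention_pct

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))                # synthetic, fully observed like this dataset
clean, pct = complete_cases(X)

X_gap = X.copy()
X_gap[5, 2] = np.nan                         # one missing cell removes its whole row
clean_gap, pct_gap = complete_cases(X_gap)
```

A single missing cell costs an entire observation under this approach. This is why a 100% retention rate is the best case for PCA input.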

Key Findings

Retention Rate: 100% (200/200 rows preserved) - All observations successfully passed quality checks with no exclusions
Rows Removed: 0 - No data loss occurred during cleaning or standardization procedures
Data Completeness: Full dataset available for PCA computation across all 4 marketing features
Train/Test Split: Not applicable - PCA is unsupervised and operates on the complete dataset without partitioning

Interpretation

The perfect retention rate indicates robust data quality in the marketing spend and sales dataset. No missing values or anomalies triggered removal, so all 200 observations contribute to the principal component calculations, and the 75.4% cumulative variance explained by PC1 and PC2 is based on complete information rather than imputed or filtered data.

Context

While 100% retention is favorable, note that the features were standardized during PCA execution (scale_data=TRUE), so each feature contributes on a common scale regardless of its original units. The lack of a train/test split reflects PCA's unsupervised nature; however, it also means the component structure has not been independently validated on held-out data.
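One informal way to probe stability without a held-out set is a split-half check. Fit PCA separately on two random halves and compare the leading loading vectors. The sketch below assumes PCA on the correlation matrix. The helpers are hypothetical, and the data are synthetic, with one strong shared factor.

```python
import numpy as np

def leading_loadings(X, k=2):
    """Top-k eigenvectors of the correlation matrix (PCA on standardized data)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]

def split_half_congruence(X, k=2, seed=0):
    """Hypothetical check: fit PCA on two random halves and compare loadings."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half = len(X) // 2
    A = leading_loadings(X[idx[:half]], k)
    B = leading_loadings(X[idx[half:]], k)
    # Eigenvector signs are arbitrary, so compare absolute cosine similarity;
    # columns are unit length, so the column-wise dot product is the cosine.
    return np.abs(np.sum(A * B, axis=0))

# Synthetic data: the first three features share one strong factor
rng = np.random.default_rng(4)
f = rng.normal(size=200)
X = np.column_stack([f, f, f, np.zeros(200)]) + 0.3 * rng.normal(size=(200, 4))
congruence = split_half_congruence(X, k=2)
```

Values near 1 indicate that a component's loadings reproduce across halves. Components with similar eigenvalues tend to score lower, because small samples can rotate them into each other.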

Key Metrics

features_analyzed: 4
components_recommended: 2
variance_captured: 75.4%
pc1_variance: 48%

Key Findings

Finding	Value
Features Analyzed	4
Recommended Components	2
Variance Captured	75.4%
PC1 Variance	48%
Observations Used	200

Summary

Bottom Line: PCA of 4 features identified 2 principal components that together explain 75.4% of total variance, 4.6 percentage points short of the 80% threshold.

Key Findings:
• PC1 alone captures 48% of variance — the dominant dimension in the data
• The first 2 components explain 75.4% combined
• 2 components selected via the Kaiser criterion (eigenvalue > 1); meeting the 80% variance threshold would require 3 components
• 4 features reduced to 2 dimensions (50% dimensionality reduction)

Recommendation: Use the top 2 components as input features for downstream models (clustering, classification, regression). Review the loadings heatmap to give each component a meaningful business name.
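When component scores feed a downstream model, new rows must be scaled with the same mean and standard deviation that were used to fit the PCA. The sketch below assumes correlation-matrix PCA. fit_pca and transform are hypothetical helpers, and the data are synthetic.

```python
import numpy as np

def fit_pca(X, k=2):
    """Hypothetical helper: store what is needed to score new rows later."""
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    W = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # top-k loadings, descending variance
    return mean, sd, W

def transform(X_new, mean, sd, W):
    # Reuse the fitted mean and SD so new rows land in the same component space
    return ((X_new - mean) / sd) @ W

rng = np.random.default_rng(5)
X_train = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))   # synthetic
mean, sd, W = fit_pca(X_train, k=2)
train_features = transform(X_train, mean, sd, W)    # 200 x 2 inputs for a downstream model
new_features = transform(X_train[:5] + 0.1, mean, sd, W)
```

Refitting the scaler on new data would shift the component space. This is why the fitted statistics are kept and reused.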

Interpretation

Purpose

This PCA analysis successfully reduced 4 marketing spend and sales features into 2 principal components, achieving the stated objective of identifying key dimensions of variation in the dataset. The analysis enables downstream modeling with simplified feature space while retaining meaningful variance structure.

Key Findings

Variance Captured: 75.4% across 2 components - 4.6 percentage points below the configured 80% variance_threshold, but acceptable for a two-dimensional summary
PC1 Dominance: First component alone explains 48% of total variance, making it the largest single source of variation (though slightly less than half)
Dimensionality Reduction: 4 features compressed to 2 components (50% reduction) while setting aside 24.6% of total variance
Feature Contributions: PC1 heavily weighted by feature_4 (Sales, −0.70) and feature_1 (TikTok, −0.56); PC2 driven by feature_3 (Google Ads, −0.80) and feature_2 (Facebook, 0.60)

Interpretation

The analysis shows that marketing spend and sales data vary primarily along two dimensions. PC1 represents a scale or magnitude dimension: feature_4 and feature_1 load with the same (negative) sign, so they increase and decrease together, and the overall sign of a component is arbitrary. PC2 captures a distinct, orthogonal pattern. Together, these components preserve three-quarters of the original variance, making them reasonable inputs for clustering or predictive modeling, with the caveat that 24.6% of the variance is left out.

Context

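PCA assumes linear relationships among features and requires standardized inputs when features are measured on different scales, which this analysis handles through scale_data=TRUE.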

Variance explained by each principal component

Interpretation

Purpose

The scree plot visualizes the variance contribution of each principal component, helping identify which dimensions capture the most meaningful variation in marketing spend and sales data. This section is critical for determining dimensionality reduction effectiveness—showing how much information is retained when moving from 4 original features to fewer principal components.

Key Findings

PC1 Variance: 48% - The first component alone captures nearly half of all variation, indicating a dominant underlying dimension in the marketing data
Top 2 Components: 75.4% cumulative variance - Two components together retain three-quarters of total information, suggesting substantial dimensionality reduction is possible
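Eigenvalue Decline: Eigenvalues fall from 1.92 (PC1) to 1.097 (PC2), 0.865 (PC3), and 0.117 (PC4). PC3 still contributes 21.6% of variance on its own, and the largest drops come after PC1 and after PC3, so the scree curve shows no clear elbow after PC2; retaining two components rests mainly on the Kaiser criterion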
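The scree figures can be recomputed directly from the eigenvalues reported in the component summary table. The sketch below derives per-component and cumulative percentages, plus the drop between consecutive eigenvalues, which is what an elbow reading relies on.

```python
import numpy as np

# Eigenvalues as reported in the component summary table
eigvals = np.array([1.92, 1.097, 0.865, 0.117])

pct = 100 * eigvals / eigvals.sum()   # variance explained per component (%)
cum_pct = np.cumsum(pct)              # cumulative variance explained (%)
drops = -np.diff(eigvals)             # eigenvalue drop between consecutive components

for i, (p, c) in enumerate(zip(pct, cum_pct), start=1):
    print(f"PC{i}: {p:5.1f}%  cumulative {c:5.1f}%")
print("eigenvalue drops:", np.round(drops, 3))
```

The drop from PC2 to PC3 (0.232) is much smaller than the drops after PC1 (0.823) and after PC3 (0.748). This is why the curve does not bend sharply at PC2.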

Interpretation

The scree plot shows variance concentrated in the first two components, consistent with the recommendation to retain PC1 and PC2. This pattern suggests the four original marketing features are correlated enough to be summarized by two uncorrelated dimensions. The reduction is not free, however: PC3 still carries 21.6% of the variance, and only PC4 (2.9%) captures truly marginal variation.

Context

PCA assumes linear relationships among the input features.

Observations projected onto the first two principal components

Interpretation

Purpose

This score plot projects 200 marketing observations onto the first two principal components, revealing the underlying structure of variation in marketing spend and sales data. By visualizing observations in reduced dimensional space, the plot identifies natural groupings, outliers, and patterns that would be invisible when examining the original four features individually.

Key Findings

Axes Variance: 75.4% - The first two components capture three-quarters of total variance, indicating strong dimensionality reduction without substantial information loss
PC1 Range: -3.57 to 2.69 (SD=1.39) - Captures 48% of variance; shows left-skewed distribution with one notable negative outlier (row 197 at -3.0)
PC2 Range: -1.4 to 2.64 (SD=1.05) - Captures 27.4% of variance; since scores are centered at zero, the longer positive tail suggests mild right skew rather than a fully symmetric distribution
Observation Spread: Points distributed across all quadrants with no obvious clustering, indicating continuous variation rather than discrete market segments
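The reported score SDs follow from the eigenvalues. For PCA on centered data, the variance of each component's scores equals that component's eigenvalue, so the SD equals its square root. sqrt(1.92) ≈ 1.39 and sqrt(1.097) ≈ 1.05. The sketch below checks the identity on synthetic standardized data. It assumes the plotted scores are unscaled projections.

```python
import numpy as np

# Expected score SDs from the reported PC1 and PC2 eigenvalues
reported_sd = np.sqrt(np.array([1.92, 1.097]))

# The identity var(score_j) == eigenvalue_j, checked on synthetic standardized data
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
scores = Z @ eigvecs
score_sd = scores.std(axis=0, ddof=1)
```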

Interpretation

The relatively even scatter across PC space suggests marketing spend and sales metrics vary continuously across the 200 observations rather than forming distinct clusters. The left-skewed PC1 distribution indicates that at least one observation sits far out along the primary dimension of variation, likely representing either a data error or a genuinely unusual observation that merits review.

Contribution of each original variable to each principal component

Interpretation

Purpose

The loadings heatmap reveals how each of the 4 original marketing features contributes to the principal components. By identifying which variables load strongly together on the same component, this section enables you to assign business meaning to the mathematical dimensions—transforming abstract PCs into interpretable marketing dimensions that explain variation in spend and sales patterns.

Key Findings

PC1 Dominance: Feature_4 (-0.70) and Feature_1 (-0.56) drive PC1 most strongly, suggesting these variables move together and represent the primary axis of variation (48% of total variance)
PC2 Contrast: Feature_3 (-0.80) and Feature_2 (0.60) load oppositely on PC2, indicating they represent a contrasting dimension capturing 27.4% of variance
Feature_1 Versatility: Loads meaningfully across PC1 (-0.56), PC3 (0.66), and PC4 (0.51), showing it contributes to multiple dimensions
Loading Range: Values span -0.80 to 0.66, indicating moderate to strong contributions across all components
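Squaring a loading gives the share of a component's unit-length loading vector attributable to each variable. The values in the top-loadings table appear to be unit-length eigenvector coefficients, and that is an assumption here. The sketch below uses the PC2 entries from that table. By matching loadings, Google Ads corresponds to feature_3 and Facebook to feature_2.

```python
# PC2 entries from the top-loadings table (Google Ads = feature_3, Facebook = feature_2)
pc2 = {"Google Ads": -0.7986, "Facebook": 0.6, "Sales": -0.0482}

squared = {name: v ** 2 for name, v in pc2.items()}   # each variable's share of the unit-length vector
total = sum(squared.values())
for name, s in squared.items():
    print(f"{name:<10} {s:.3f}")
print(f"sum of listed squared loadings: {total:.4f}")
```

The listed squares already sum to about 1.000. Under the unit-length assumption, this leaves essentially no room for a TikTok loading on PC2.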

Interpretation

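The negative loadings on PC1 (Feature_4, Feature_1) share the same sign, suggesting these marketing metrics increase together along that axis. PC2's opposing loadings reveal a trade-off dynamic: along this axis, higher Feature_2 values pair with lower Feature_3 values, and vice versa.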
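Only the relative signs within a component carry meaning. Flipping the signs of a component's loadings and scores together yields an equally valid eigenvector and the same reconstruction of the data. The sketch below demonstrates this on synthetic standardized data.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))      # synthetic data
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
w = eigvecs[:, -1]              # leading component (eigh sorts ascending)
scores = Z @ w

w_flip, scores_flip = -w, -scores   # flip the sign of loadings and scores together

# Both versions are valid eigenvectors and give the same rank-1 reconstruction
same_eigvec = np.allclose(cov @ w_flip, eigvals[-1] * w_flip)
same_recon = np.allclose(np.outer(scores, w), np.outer(scores_flip, w_flip))
```

"Both PC1 loadings are negative" therefore carries the same information as "both are positive": Feature_4 and Feature_1 move in the same direction.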

Cumulative variance explained as more components are added

Interpretation

Purpose

This section quantifies the trade-off between dimensionality reduction and information retention. It demonstrates how many principal components are needed to capture meaningful variance in the marketing spend and sales data, guiding decisions about which components to retain for downstream analysis or visualization.

Key Findings

PC1 Alone: Captures 48% of variance—insufficient for comprehensive representation of data structure
PC1 + PC2 Combined: Captures 75.4% of total variance—falls slightly short of the 80% threshold but represents the recommended balance point
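PC3 Trade-off: Adding PC3 raises cumulative variance to 97.1% and would clear the 80% threshold, but its sizable marginal gain (21.6%) comes at the cost of keeping 3 of 4 dimensions (a 25% reduction instead of 50%); diminishing returns only set in with PC4 (2.9%)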
Threshold Gap: The 4.6 percentage point shortfall from the 80% target reflects a practical trade-off between parsimony and completeness

Interpretation

The analysis reveals that two components effectively summarize three-quarters of the variation in the original four features. This suggests the underlying marketing and sales metrics share substantial covariance—likely reflecting common business drivers. While the 75.4% figure falls modestly below the 80% threshold, retaining only two dimensions reduces the feature space by 50% while preserving most meaningful variation, making it suitable for visualization and interpretation of marketing dynamics.
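"Information retained" has a concrete meaning. If the standardized data are projected onto k components and mapped back, the fraction of squared values recovered equals the cumulative variance explained by those k components. The sketch below checks this on synthetic data. retained_fraction is a hypothetical helper.

```python
import numpy as np

def retained_fraction(Z, k):
    """Share of total variance kept when centered data Z is rebuilt from k components."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    W = eigvecs[:, ::-1][:, :k]            # top-k eigenvectors, largest variance first
    Z_hat = (Z @ W) @ W.T                  # project down to k dims, then map back
    err = np.sum((Z - Z_hat) ** 2)         # squared reconstruction error
    return 1.0 - err / np.sum(Z ** 2)

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))   # synthetic data
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
kept_2 = retained_fraction(Z, 2)

eigvals = np.sort(np.linalg.eigvalsh(np.cov(Z, rowvar=False)))[::-1]
ratio_2 = eigvals[:2].sum() / eigvals.sum()    # same quantity from eigenvalues
```

For this report, keeping two components means about 24.6% of the standardized variance is not reproduced.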

Summary statistics for each principal component

Component	Eigenvalue	Variance Explained	Cumulative Variance	Recommended
PC1	1.92	48%	48%	✓ Retain
PC2	1.097	27.4%	75.4%	✓ Retain
PC3	0.865	21.6%	97.1%
PC4	0.117	2.9%	100%

Interpretation

Purpose

This section identifies which principal components merit retention based on statistical criteria. It shows how much variance each component captures and whether it meets the Kaiser criterion (eigenvalue > 1 for scaled data). This directly supports the marketing analytics objective by determining how many dimensions are needed to represent the key variation in marketing spend and sales data.

Key Findings

Recommended Components: 2 components retain 75.4% of total variance; they meet the Kaiser criterion (eigenvalue > 1) but fall short of the 80% variance threshold, which would require 3 components
PC1 Eigenvalue: 1.92 (largest) — captures 48% of variance independently, indicating a dominant dimension of variation
PC2 Eigenvalue: 1.1 — contributes an additional 27.4%, bringing cumulative variance to 75.4%
Variance Drop-off: PC3 (eigenvalue 0.865) and PC4 (eigenvalue 0.117) fall below the Kaiser threshold of 1, although PC3 still explains 21.6% of variance; only PC4 (2.9%) is clearly negligible
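The two selection rules named in this report can be applied directly to the tabulated eigenvalues. They disagree. The sketch below counts components under the Kaiser criterion and finds the smallest number of components whose cumulative share reaches variance_threshold.

```python
import numpy as np

eigvals = np.array([1.92, 1.097, 0.865, 0.117])   # component summary table
variance_threshold = 0.8                           # module parameter

kaiser_k = int(np.sum(eigvals > 1.0))              # Kaiser: keep eigenvalues above 1
cum_share = np.cumsum(eigvals) / eigvals.sum()
threshold_k = int(np.argmax(cum_share >= variance_threshold)) + 1   # first k reaching 80%
print("Kaiser criterion keeps:", kaiser_k)
print("80% threshold needs:", threshold_k)
```

The Kaiser rule keeps 2 components, while the 80% threshold first holds at 3 (97.1% cumulative). The two-component recommendation therefore follows the Kaiser criterion.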

Interpretation

The two-component solution summarizes the marketing dataset's structure compactly. PC1 represents nearly half the total variation, and PC2 adds another 27.4%. Together they capture three-quarters of the data's variance while setting aside PC3 (21.6%) and PC4 (2.9%). This 2D representation enables simpler visualization and analysis of marketing spend-sales relationships, with the caveat that about a quarter of the variance is not represented.

Context

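The Kaiser criterion (eigenvalue > 1) suits this analysis because the data were standardized (scale_data=TRUE). Each original feature then has a variance of 1, so a retained component should explain more than one feature's worth of variance.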

Top variable loadings per component (top 3 by absolute value)

Component	Variable	Loading	Absolute Loading
PC1	Sales	-0.6984	0.6984
PC1	TikTok	-0.556	0.556
PC1	Facebook	-0.3783	0.3783
PC2	Google Ads	-0.7986	0.7986
PC2	Facebook	0.6	0.6
PC2	Sales	-0.0482	0.0482
PC3	TikTok	0.6595	0.6595
PC3	Facebook	-0.6028	0.6028
PC3	Google Ads	-0.447	0.447
PC4	Sales	-0.7129	0.7129
PC4	TikTok	0.5058	0.5058
PC4	Facebook	0.3653	0.3653

Interpretation

Purpose

This section identifies which original features most strongly define each principal component by ranking variables by absolute loading magnitude. High absolute loadings (near ±1) reveal the core drivers of variation in each dimension, enabling interpretation of what each PC represents in the marketing spend and sales context.

Key Findings

PC1 Dominance: feature_4 (−0.70) and feature_1 (−0.56) are the primary drivers, both with negative loadings, indicating they move inversely with PC1 scores
PC2 Structure: feature_3 (−0.80) shows the strongest single loading across all components, with feature_2 (0.60) providing contrasting positive direction
feature_2 Consistency: Appears in top 3 for all four components (loading range: −0.60 to 0.60), suggesting it contributes to multiple dimensions of variation
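Loading Strength: Mean absolute loading of 0.53 across the listed entries indicates moderate-to-strong contributions; every feature loads strongly on at least one component, although individual loadings can be negligible (e.g., Sales on PC2, −0.05)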
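The summary figures in this list can be recomputed from the 12 entries in the table above: the mean absolute loading, the variable that appears in every component's top 3, and the largest single loading. The sketch uses only the listed top-3 entries, not a full loading matrix. By matching loadings, Facebook corresponds to feature_2.

```python
# Top-3 loadings per component, copied from the table above
top_loadings = {
    "PC1": {"Sales": -0.6984, "TikTok": -0.556, "Facebook": -0.3783},
    "PC2": {"Google Ads": -0.7986, "Facebook": 0.6, "Sales": -0.0482},
    "PC3": {"TikTok": 0.6595, "Facebook": -0.6028, "Google Ads": -0.447},
    "PC4": {"Sales": -0.7129, "TikTok": 0.5058, "Facebook": 0.3653},
}

values = [v for comp in top_loadings.values() for v in comp.values()]
mean_abs = sum(abs(v) for v in values) / len(values)   # mean absolute loading over listed entries

# Variables that appear in the top 3 of every component
in_every = set.intersection(*(set(comp) for comp in top_loadings.values()))
strongest = max(values, key=abs)                       # largest loading by magnitude
print(f"mean |loading| = {mean_abs:.2f}, in every top 3: {in_every}, strongest = {strongest}")
```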

Interpretation

The analysis reveals that marketing spend and sales variation is primarily captured by feature_4 and feature_1 (PC1: 48% variance), with feature_3 providing an orthogonal contrast (PC2: 27.4% variance). The consistent appearance of feature_2 across components suggests it contributes to several dimensions of variation rather than defining a single one.
