Overview

Portfolio Overview

Analysis overview and configuration

Analysis Type: Portfolio Analysis
Company: MCP Analytics
Objective: Across all active experiments, which hypotheses are winning and what patterns predict experiment success?
Analysis Date: 2026-03-03
Processing ID: test_1772576412
Total Observations: 19

Parameters:

  • batch_assignments: auto
  • content_type_regex: /articles/|/tutorials/|/blogs/|/whitepapers/
  • min_impressions: 10
  • win_threshold: 0.0
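For reference, a minimal sketch of how a content_type_regex like this might be applied. One plausible reading, given that every experiment in the verdicts table below carries content_type "other", is that non-matching URLs fall through to an "other" bucket; the function name and URLs are illustrative, not from the pipeline:

    import re

    # Pattern from the run configuration above.
    CONTENT_TYPE_REGEX = re.compile(r"/articles/|/tutorials/|/blogs/|/whitepapers/")

    def classify_content_type(url: str) -> str:
        """Label a page by the first matching path segment, else 'other'."""
        match = CONTENT_TYPE_REGEX.search(url)
        if match:
            return match.group(0).strip("/")  # e.g. "/articles/" -> "articles"
        return "other"

    print(classify_content_type("https://example.com/articles/seo-titles"))  # articles
    print(classify_content_type("https://example.com/pricing"))              # other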
Interpretation

Purpose

This analysis evaluates a portfolio of 12 active experiments to identify winning hypotheses and success patterns. The objective is to determine which treatments outperform controls and what characteristics—impression volume, search ranking position, content type—correlate with experiment success. Understanding these patterns enables more efficient hypothesis testing and resource allocation.

Key Findings

  • Win Rate: 33.33% (1 winning experiment out of 3 testable) with 75% of experiments marked insufficient for statistical conclusion
  • Winning Experiment (exp-016): 376.65% adjusted lift with projected 8.89 monthly click uplift from ~1,277 impressions
  • Position Pattern: Pages ranking in positions 11-20 show a 16.67% win rate vs. 0% for positions 4-10, suggesting SERP placement matters
  • Impression Bucket Pattern: Mid-volume experiments (500-2,000 impressions) achieve 12.5% win rate; low and high volumes show 0%
  • Data Maturity Issue: 9 of 12 experiments need ~14 additional days to reach 80% statistical power

Interpretation

The portfolio shows early-stage results with limited statistical confidence. Only one experiment demonstrates clear success, while most remain underpowered. The position-based pattern (positions 11-20 outperforming 4-10) hints that lower-ranked pages may be more responsive to title changes, though current sample sizes are too small to treat this as more than a directional signal.

Data preprocessing and column mapping

Initial Rows: 19
Final Rows: 19
Rows Removed: 0
Retention Rate: 100%
Interpretation

Purpose

This section documents the data preprocessing pipeline for the experiment analysis, showing that all 19 observations (12 experiment verdicts, 1 batch summary, and 6 pattern analyses) were retained without removal. Perfect retention indicates either minimal data quality issues or that no aggressive filtering was applied, which is critical for maintaining statistical validity in A/B testing analysis, where sample size directly impacts power and significance detection.

Key Findings

  • Retention Rate: 100% (19/19 rows) - No observations were excluded during preprocessing, preserving the full experimental dataset
  • Rows Removed: 0 - No filtering, deduplication, or outlier removal occurred
  • Train/Test Split: Not applicable - This is descriptive analysis of completed experiments, not predictive modeling
  • Data Completeness: All 12 experiments retained despite 75% missing p-values, suggesting missing values were not treated as grounds for exclusion

Interpretation

The 100% retention rate reflects a conservative preprocessing approach appropriate for experiment analysis, where each trial represents a distinct business decision point. However, the absence of any data cleaning raises questions about how missing p-values (75% of cases marked "insufficient") and extreme values (raw lift ranging from -100% to +732%) were handled. The lack of train/test splitting is expected, since this is retrospective analysis of completed experiments rather than predictive modeling.
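A minimal sketch of the retention check described here, assuming hypothetical file and column names and assuming the configured min_impressions = 10 threshold applies to control impressions (which, given the 287-3,259 range in the verdicts table, would remove nothing):

    import pandas as pd

    # Hypothetical input file and schema; the real pipeline's names may differ.
    df = pd.read_csv("experiment_observations.csv")
    initial_rows = len(df)

    # Configured filter: min_impressions = 10. Rows below the threshold are dropped.
    df = df[df["control_impressions"] >= 10]

    final_rows = len(df)
    print(f"Initial rows: {initial_rows}, final rows: {final_rows}, "
          f"retention: {final_rows / initial_rows:.0%}")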

Executive Summary

Executive summary with key findings and recommendations

total_experiments: 12
overall_win_rate: 33.3333
avg_lift_winners: 376.6485
projected_monthly_clicks: 8.8906
program_verdict: Above benchmark

Findings:

  • Total Experiments: 12 (3 testable)
  • Program Win Rate: 33.3%
  • Industry Benchmark: 15% (SearchPilot)
  • Winners (Promote): 1 experiment
  • Projected Monthly Click Uplift: 9 clicks
  • Estimated Monthly Value: $44 (at $5 CPC)
SEO Experiment Portfolio Analysis - Executive Summary

Program Health:
• Analyzed 12 experiments (7 with treatment data, 5 control-only)
• Overall win rate: 33.3%, above the 15% industry benchmark ✓
• 1 winning, 0 losing, 2 neutral, 9 insufficient data

Effect Sizes:
• Winners average +376.6% position-adjusted CTR lift
• Median effect size: -50.0%
• Largest positive lift: +376.6%

ROI Projection:
• 9 additional monthly clicks if all winners promoted
• Estimated value: $44/month at $5 CPC

Recommendations:
• Promote: 1 winning experiment immediately
• Monitor: 2 neutral experiments (may need more time)
• End Early: 0 losing experiments
• Data Collection: 9 experiments need 14 more days on average
Interpretation


Purpose

This analysis evaluates a portfolio of 12 SEO experiments to determine whether the testing program is delivering measurable business value. The assessment synthesizes win rates, effect sizes, and ROI projections to inform deployment decisions and resource allocation for the broader optimization initiative.

Key Findings

  • Overall Win Rate: 33.3% — More than double the 15% industry benchmark, indicating the experiment portfolio is performing above the expected baseline
  • Winning Effect Size: +376.6% adjusted CTR lift in the single confirmed winner, demonstrating substantial impact when treatments succeed
  • Statistical Maturity: 75% of experiments (9 of 12) lack sufficient data; only 3 experiments have testable verdicts, limiting confidence in portfolio-level conclusions
  • Projected Monthly Value: 8.9 additional clicks from promotion of winning experiments; modest absolute impact but positive directional signal
  • Data Collection Timeline: Insufficient experiments require approximately 14 additional days to reach 80% statistical power

Interpretation

The program demonstrates promise with a win rate well above industry norms and one experiment showing exceptional lift. However, the portfolio remains largely underpowered—75% of experiments cannot yet support confident decisions. The median adjusted lift of -50% reflects the high proportion of inconclusive cases rather than true negative performance. This suggests the testing infrastructure is sound but needs longer run times, or higher-traffic pages, before portfolio-level conclusions can be drawn with confidence.
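The program verdict follows directly from the verdict counts above; a minimal sketch of the arithmetic (the 15% benchmark is the SearchPilot figure cited in the findings table):

    from collections import Counter

    verdicts = Counter(winning=1, losing=0, neutral=2, insufficient=9)  # from this report

    testable = verdicts["winning"] + verdicts["losing"] + verdicts["neutral"]  # 3
    win_rate = verdicts["winning"] / testable                                  # 1/3
    benchmark = 0.15                                                           # SearchPilot

    verdict = "Above benchmark" if win_rate > benchmark else "Below benchmark"
    print(f"win rate {win_rate:.1%} -> {verdict}")  # win rate 33.3% -> Above benchmark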

Data Table

Experiment Verdicts

Per-experiment win/loss/neutral status with CTR lift and significance

experiment_id | control_impressions | control_clicks | control_ctr | control_adj_ctr | treatment_impressions | treatment_clicks | treatment_ctr | treatment_adj_ctr | raw_lift_pct | adjusted_lift_pct | verdict | content_type | impression_bucket | position_bucket | batch_name
exp-007 | 1693 | 0 | 0 | 0 | 41 | 1 | 0.0244 | 0.7644 | 0 | 0 | insufficient | other | 500-2000 | 4-10 | all_experiments
exp-009 | 1639 | 0 | 0 | 0 | 262 | 0 | 0 | 0 | 0 | 0 | insufficient | other | 500-2000 | 4-10 | all_experiments
exp-010 | 1567 | 1 | 0.0006 | 0.0459 | 422 | 1 | 0.0024 | 0.1261 | 271.3 | 174.7 | neutral | other | 500-2000 | 4-10 | all_experiments
exp-012 | 1303 | 2 | 0.0015 | 0.3483 | 282 | 0 | 0 | 0 | -100 | -100 | insufficient | other | 500-2000 | 11-20 | all_experiments
exp-015 | 1331 | 3 | 0.0023 | 0.3679 | 176 | 0 | 0 | 0 | -100 | -100 | insufficient | other | 500-2000 | 11-20 | all_experiments
exp-016 | 1082 | 2 | 0.0018 | 0.4034 | 195 | 3 | 0.0154 | 1.923 | 732.3 | 376.6 | winning | other | 500-2000 | 11-20 | all_experiments
exp-017 | 919 | 1 | 0.0011 | 0.1589 | 291 | 1 | 0.0034 | 0.3765 | 215.8 | 136.9 | neutral | other | 500-2000 | 11-20 | all_experiments
exp-027 | 287 | 43 | 0.1498 | 6.055 | 0 | 0 | 0 | 0 | -100 | -100 | insufficient | other | 100-500 | 4-10 | all_experiments
exp-028 | 2080 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | insufficient | other | 2000+ | 11-20 | all_experiments
exp-029 | 3259 | 3 | 0.0009 | 0.3009 | 0 | 0 | 0 | 0 | -100 | -100 | insufficient | other | 2000+ | 11-20 | all_experiments
exp-030 | 1727 | 2 | 0.0012 | 0.0874 | 0 | 0 | 0 | 0 | -100 | -100 | insufficient | other | 500-2000 | 4-10 | all_experiments
exp-031 | 2782 | 5 | 0.0018 | 0.107 | 0 | 0 | 0 | 0 | -100 | -100 | insufficient | other | 2000+ | 4-10 | all_experiments
Interpretation

Purpose

This section identifies which experiments demonstrate statistically significant improvements (winning), declines (losing), or inconclusive results (neutral/insufficient). Of 12 experiments, only 3 have adequate statistical power to draw reliable conclusions, making this a critical filter for distinguishing real effects from noise in the testing portfolio.

Key Findings

  • Win Rate: 1 of 3 testable experiments (33.3%) shows positive, significant lift—indicating modest success in the overall testing program
  • Adjusted Lift %: The winning experiment (exp-016) demonstrates 376.65% position-adjusted CTR improvement, isolating the title treatment effect from ranking confounds
  • Insufficient Data: 75% of experiments lack statistical power, with a median p-value of 0.9, reflecting low click volumes relative to variance
  • No Losses: Zero experiments show statistically significant negative effects, suggesting treatments are not harmful

Interpretation

The data reveals a portfolio heavily constrained by sample size rather than treatment quality. The single winning experiment shows substantial effect magnitude, but the high proportion of insufficient verdicts (9/12) indicates most experiments cannot yet distinguish signal from noise. This pattern suggests the testing infrastructure may be underpowered for the baseline click rates observed (mean control CTR = 0.01), requiring either longer run times or higher-traffic segments to achieve reliable conclusions.
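The lift columns in the verdicts table are reproducible as simple ratios of the CTR columns. The report does not specify how the position-adjusted CTRs themselves are computed, so this sketch takes them as given:

    def lift_pct(treatment: float, control: float) -> float:
        """Percent lift of treatment over control; returns 0 when the control
        baseline is zero, matching how the verdicts table reports such rows
        (e.g. exp-007, exp-028)."""
        if control == 0:
            return 0.0
        return (treatment / control - 1) * 100

    # exp-016, from the verdicts table: control 2/1082 clicks, treatment 3/195.
    raw = lift_pct(3 / 195, 2 / 1082)    # ~732.3%, matching raw_lift_pct
    adjusted = lift_pct(1.9230, 0.4034)  # ~376.7%, matching adjusted_lift_pct up to rounding
    print(f"raw {raw:.1f}%  adjusted {adjusted:.1f}%")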

Visualization

Batch Comparison

Win rate and average effect size comparison across hypothesis batches

Interpretation

Purpose

This section evaluates hypothesis batch performance by comparing win rates and average effect sizes across experiment groups. It identifies which hypothesis types (e.g., title framing, intent matching) are generating the strongest positive results, enabling prioritization of high-performing hypotheses for scaling and resource allocation.

Key Findings

  • Win Rate: 33.3% — Significantly exceeds the 15% industry benchmark, indicating above-average hypothesis quality across the program
  • Average Adjusted Lift: 376.65% — Represents the mean CTR improvement for winning experiments, demonstrating substantial effect sizes when experiments succeed
  • Testable Experiments: 3 of 12 — Only 25% of experiments achieved statistical significance; 75% remain insufficient, limiting batch-level conclusions
  • Zero Losses: No experiments showed statistically significant negative lift, reducing downside risk

Interpretation

The single batch ("all_experiments") demonstrates strong performance relative to industry standards, with one clear winner and two neutral results among testable experiments. However, the high proportion of insufficient experiments (9 of 12) suggests most hypotheses lack adequate sample size or effect magnitude for confident conclusions. The exceptional 376.65% average lift reflects the winning experiment's substantial impact, though this represents only one successful case within a larger portfolio of underpowered tests.

Context

This analysis treats all 12 experiments as a single hypothesis batch ("all_experiments"), so batch-level comparisons collapse to the program-level metrics reported above; comparing hypothesis types (e.g., title framing vs. intent matching) would require assigning experiments to distinct batches.

Visualization

Success Patterns

Win rate segmented by content type, impression level, and position bucket

Interpretation

Purpose

This section identifies which page characteristics—traffic volume, ranking position, and content type—correlate with successful title experiments. By segmenting the 12 experiments across impression buckets and position ranges, the analysis reveals whether certain page types respond more favorably to title changes, enabling future experiments to be targeted at high-opportunity segments.

Key Findings

  • Position Bucket 11-20 Win Rate: 16.67% (1 win from 6 experiments) — the highest-performing segment, suggesting lower-ranked pages may be more responsive to title optimization
  • Impression Bucket 500-2000 Win Rate: 12.5% (1 win from 8 experiments) — moderate traffic pages show modest success, representing the largest tested segment
  • Position Bucket 4-10 Win Rate: 0% (0 wins from 6 experiments) — higher-ranked pages show no successful outcomes despite equal sample size
  • Overall Pattern: Win rates remain below 20% across all segments, indicating limited predictive power at current sample sizes

Interpretation

The data suggests a weak but directional pattern: pages ranking in positions 11-20 achieved the only position-based win, while mid-traffic pages (500-2000 impressions) showed marginal success. Conversely, higher-ranking pages (4-10) and very low- or high-traffic pages (100-500 and 2000+ impressions) produced no wins. With a single winning experiment overall, these segment differences are best read as targeting hypotheses rather than established patterns.
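The segment win rates quoted above can be rebuilt from the verdicts table; a sketch using pandas, with the frame re-keyed by hand from that table:

    import pandas as pd

    df = pd.DataFrame({
        "experiment_id": [f"exp-{n:03d}" for n in (7, 9, 10, 12, 15, 16, 17, 27, 28, 29, 30, 31)],
        "verdict": ["insufficient", "insufficient", "neutral", "insufficient", "insufficient",
                    "winning", "neutral", "insufficient", "insufficient", "insufficient",
                    "insufficient", "insufficient"],
        "position_bucket": ["4-10", "4-10", "4-10", "11-20", "11-20", "11-20",
                            "11-20", "4-10", "11-20", "11-20", "4-10", "4-10"],
    })

    # Share of winning experiments per position bucket.
    win_rate = (df.assign(win=df["verdict"].eq("winning"))
                  .groupby("position_bucket")["win"].mean())
    print(win_rate)  # 11-20: 0.1667, 4-10: 0.0 -- the 16.67% vs 0% split above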

Visualization

Effect Size Distribution

Distribution of position-adjusted CTR lifts across all experiments

Interpretation

Purpose

This section evaluates whether successful experiments deliver substantial impact or marginal gains. Understanding effect size distribution reveals the magnitude of wins relative to losses, helping assess whether the experimental portfolio is generating transformative improvements or incremental gains. This directly informs the value proposition of the testing program.

Key Findings

  • Median Effect Size: -50% — The typical experiment shows negative or zero lift, indicating most tests underperform control
  • Maximum Positive Lift: +376.6% — The single winning experiment (exp-016) demonstrates exceptionally large impact, far exceeding the "large win" threshold (>10%)
  • Effect Distribution: Highly skewed with extreme variance (SD=150.91%) — Results cluster at -100%, 0%, or +137% to +377%, showing no moderate wins
  • Winner vs. Loser Gap: The single winner averages +376.6% while no confirmed losers exist (the 0% loser average reflects an empty set), indicating a stark binary outcome pattern rather than a spectrum

Interpretation

The portfolio exhibits polarized results: one transformative win offset by predominantly neutral or negative experiments. The absence of moderate wins (5-10% range) suggests either hypothesis diversity with high variance or insufficient statistical power to detect smaller effects. The extreme positive outlier (exp-016) represents a genuine breakthrough, but 75% insufficient verdicts indicate most experiments lack conclusive evidence. This distribution reflects early-stage testing, where a few large effects stand out and most experiments have not yet accumulated enough data to resolve smaller lifts.
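The headline distribution statistics check out against the adjusted_lift_pct column of the verdicts table:

    import statistics

    # adjusted_lift_pct values from the verdicts table (percent).
    adjusted_lifts = [0, 0, 174.7, -100, -100, 376.6, 136.9, -100, 0, -100, -100, -100]

    print(statistics.median(adjusted_lifts))           # -50.0, as reported
    print(round(statistics.stdev(adjusted_lifts), 2))  # 150.89 (report: 150.91, from unrounded lifts)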

Data Table

ROI Projection

Projected click uplift if winning experiments are promoted to production

experiment_id | adjusted_lift_pct | monthly_impressions_estimate | projected_monthly_click_uplift
exp-016 | 376.6 | 1277 | 8.891
Interpretation

Purpose

This section quantifies the business impact of promoting winning experiments to production by projecting incremental click volume and associated revenue. It translates experimental lift percentages into actionable monthly and annual metrics, enabling stakeholders to understand the tangible value generated from the 1 winning experiment (exp-016) identified across the 3 testable experiments in this batch.

Key Findings

  • Projected Monthly Click Uplift: 8.89 clicks — derived from exp-016's 376.65% adjusted lift applied to 1,277 monthly impressions
  • Projected Annual Click Uplift: 106.69 clicks — annualized monthly projection showing sustained impact over 12 months
  • Estimated Monthly Revenue Value: $44.45 — calculated at $5 cost-per-click, representing direct traffic value from the winning variant
  • Win Rate Context: Only 1 of 3 testable experiments achieved statistical significance, limiting the overall uplift pool

Interpretation

The winning experiment demonstrates substantial lift (376.65%), but the modest absolute click gains (9 monthly) reflect the relatively small impression volume (1,277) and low baseline click rates observed across the batch. The $44 monthly value represents incremental revenue from a single high-performing variant. This projection assumes consistent traffic patterns and sustained treatment effect post-launch.
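The projection arithmetic is reproducible from the verdicts and ROI tables; a short sketch, where the $5 CPC is the report's own assumption:

    CPC = 5.00                        # assumed cost-per-click, per the executive summary

    control_ctr = 2 / 1082            # exp-016 baseline from the verdicts table
    adjusted_lift = 376.6485 / 100    # fractional position-adjusted lift
    monthly_impressions = 1277        # monthly_impressions_estimate from the ROI table

    monthly_clicks = monthly_impressions * control_ctr * adjusted_lift
    print(f"{monthly_clicks:.2f} clicks/mo, ${monthly_clicks * CPC:.2f}/mo, "
          f"{monthly_clicks * 12:.2f} clicks/yr")
    # -> 8.89 clicks/mo, $44.45/mo, 106.69 clicks/yr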


Data Table

Data Sufficiency

Power analysis and recommendations for experiments needing more data

experiment_id | current_impressions | power_estimate | impressions_needed_80pct | estimated_days_to_significance | recommended_action
exp-007 | 1734 | 0.79 | 3468 | 14 | Run 14 more days
exp-009 | 1901 | 0.79 | 3802 | 14 | Run 14 more days
exp-012 | 1585 | 0.79 | 3170 | 14 | Run 14 more days
exp-015 | 1507 | 0.79 | 3014 | 14 | Run 14 more days
exp-027 | 287 | 0.79 | 574 | 14 | Run 14 more days
exp-028 | 2080 | 0.79 | 4160 | 14 | Run 14 more days
exp-029 | 3259 | 0.79 | 6518 | 14 | Run 14 more days
exp-030 | 1727 | 0.79 | 3454 | 14 | Run 14 more days
exp-031 | 2782 | 0.79 | 5564 | 14 | Run 14 more days
Interpretation

Purpose

This section identifies which experiments lack sufficient statistical power to draw reliable conclusions. Nine of the twelve experiments currently operate at 79% power—below the 80% threshold needed for confident decision-making. Understanding data sufficiency is critical because premature conclusions from underpowered tests risk false negatives, while extending tests unnecessarily delays business decisions.

Key Findings

  • Current Power Estimate: 0.79 across all insufficient experiments—just below the 80% target threshold, indicating borderline adequacy for statistical inference
  • Impressions Needed: Targets average 3,747 impressions to reach 80% power (range: 574–6,518), exactly double the current collection in every case, so each experiment needs about as many additional impressions as it has already gathered
  • Timeline to Significance: Uniform 14-day extension needed across all nine experiments, suggesting consistent traffic patterns and effect sizes
  • Sample Size Variation: Current impressions range from 287 to 3,259, with smaller experiments (exp-027) requiring proportionally less additional data

Interpretation

The uniform 14-day recommendation, identical 0.79 power estimates, and exactly-doubled impression targets across heterogeneous sample sizes suggest a fixed rule of thumb rather than a per-experiment power calculation. Either way, these experiments sit at the margin of statistical reliability; additional data collection should push them above the 80% power threshold, enabling defensible conclusions about treatment effects.
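For comparison, a standard two-proportion sample-size calculation, sketched here with illustrative CTRs rather than figures from this report, shows why arms of a few thousand impressions struggle at these click rates:

    from statistics import NormalDist

    def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
        """Impressions per arm to detect p1 vs p2 with a two-sided two-proportion z-test."""
        z_a = NormalDist().inv_cdf(1 - alpha / 2)
        z_b = NormalDist().inv_cdf(power)
        var_sum = p1 * (1 - p1) + p2 * (1 - p2)
        return int(((z_a + z_b) ** 2 * var_sum) / (p1 - p2) ** 2) + 1

    # Illustrative: a 0.2% control CTR doubling to 0.4% needs ~11,735 impressions per arm,
    # far above the 287-3,259 control impressions observed in this portfolio.
    print(n_per_arm(0.002, 0.004))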

