Analysis overview and configuration

Configuration

Analysis Type: Proportional Hazards

Company: Clinical Research Institute

Objective: Identify patient factors that predict time to disease recurrence after treatment

Analysis Date: 2026-03-15

Processing ID: cox_test_20260315_143721

Total Observations: 500

Module Parameters

Parameter	Value
confidence_level	0.95
ties_method	efron
group_col	group_col

Proportional Hazards analysis for Clinical Research Institute

Interpretation

Purpose

This Cox proportional hazards analysis identifies patient factors predicting time to disease recurrence after treatment. The analysis evaluates six predictors across 500 patients to quantify their individual effects on recurrence risk, enabling clinicians to stratify patients by prognosis and tailor follow-up strategies.

Key Findings

Event Rate: 93.4% (467 of 500 patients experienced recurrence) — exceptionally high event frequency provides strong statistical power for detecting predictor effects
Model Concordance: 0.654 — moderate discriminative ability; the model correctly orders recurrence timing 65.4% of the time, better than chance but with room for improvement
All Predictors Significant: All 6 predictors achieved statistical significance (p < 0.05), with 5 at p < 0.001 level
Strongest Effect: predictor_3late shows highest hazard ratio (HR=1.60, 95% CI: 1.33–1.93), indicating 60% increased recurrence risk
Protective Factor: group_colC demonstrates strongest protective effect (HR=0.44, 95% CI: 0.35–0.55), reducing recurrence hazard by 56%
Proportional Hazards Assumption: Mostly satisfied globally (p

Data preprocessing and column mapping

Data Quality

Initial Rows: 500

Final Rows: 500

Rows Removed: 0

Retention Rate: 100%

Processed 500 observations, retained 500 (100.0%) after cleaning

Interpretation

Purpose

This section documents the data preprocessing pipeline for a survival analysis model with 500 observations. Perfect retention (100%) indicates no rows were removed during cleaning, suggesting either exceptionally clean source data or minimal data quality validation. Understanding preprocessing decisions is critical for assessing whether the model's reported performance (concordance: 0.654, all 6 predictors significant) reflects genuine predictive power or data quality issues masked by incomplete validation.

Key Findings

Retention Rate: 100% (500/500 rows retained) - No observations were excluded during preprocessing
Rows Removed: 0 - No rows were dropped by filtering, outlier removal, or missing-value handling; the report does not state whether such checks ran and found nothing or were not applied
Train/Test Split: Not documented - No explicit validation strategy is recorded in the preprocessing stage
Data Transformations: No transformations explicitly noted despite survival analysis typically requiring time-to-event and event indicator preparation
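
The missing validation step can be made concrete. The following is a minimal sketch of row-level checks a survival pipeline might apply, assuming each record is a dict keyed by the column names from the report's column mapping (days_observed, event_status). The function names and the sample rows are hypothetical and are not part of the report's pipeline.

```python
def validate_rows(rows):
    """Split rows into (kept, rejected) using basic survival-data checks."""
    kept, rejected = [], []
    for row in rows:
        duration = row.get("days_observed")
        event = row.get("event_status")
        ok = (
            isinstance(duration, (int, float))
            and duration > 0        # follow-up time must be positive
            and event in (0, 1)     # 1 = event observed, 0 = censored
        )
        (kept if ok else rejected).append(row)
    return kept, rejected


def retention_rate(n_initial, n_final):
    """Percentage of rows that survive cleaning."""
    return 100.0 * n_final / n_initial


# Hypothetical sample rows, not taken from the report's data.
sample = [
    {"days_observed": 151, "event_status": 1},
    {"days_observed": -3, "event_status": 1},      # invalid duration
    {"days_observed": 420, "event_status": None},  # missing event flag
]
kept, rejected = validate_rows(sample)
print(len(kept), len(rejected))
print(retention_rate(500, 500))
```

A retention rate of 100% is only informative when checks like these are known to have run; recording the number of rows rejected per rule would make that explicit.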

Interpretation

The perfect retention rate is unusual for real-world data and raises questions about preprocessing rigor. With 93.4% event rate and 467 events across 500 observations, the data appears complete but potentially unvalidated. The absence of documented train/test splits means model performance metrics (concordance, AIC) may reflect training set performance rather than generalization capability. This is particularly concerning given all 6 predictors achieved statistical significance—a pattern that could indicate ov

Key Metrics

concordance: 0.6536
n_events: 467
event_rate_pct: 93.4
n_significant: 6
n_predictors: 6

Summary

Bottom Line: Cox proportional hazards regression identified 6 significant predictors of time-to-event out of 6 total. The model achieves concordance of 0.6536 (Moderate discrimination).

Key Findings:
• Event rate: 93.4% (467 events in 500 subjects)
• Top significant predictors: predictor_1, predictor_2, predictor_3late
• Model AIC: 4870.64

Recommendation: Use hazard ratios to prioritize risk factors. Subjects with HR > 1 on significant covariates have elevated event risk and may warrant targeted intervention. Verify proportional hazards assumption holds before relying on estimates.
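
As an illustration of using hazard ratios for prioritization, the sketch below combines the coefficients from the report's coefficient table into a relative hazard for one patient profile. It assumes covariates are expressed as differences from a reference subject (reference group, non-late, non-male) and that effects are additive on the log-hazard scale, as in a standard Cox model. The `relative_hazard` helper and the example patient are hypothetical.

```python
import math

# Coefficients (log hazard ratios) copied from the report's coefficient table.
COEFS = {
    "patient_age": 0.0204,
    "biomarker_level": 0.1073,
    "predictor_3late": 0.473,
    "predictor_4male": 0.1952,
    "group_colB": -0.5086,
    "group_colC": -0.8298,
}


def relative_hazard(profile):
    """Hazard relative to a reference subject whose covariate differences are all zero."""
    log_hazard_ratio = sum(COEFS[name] * value for name, value in profile.items())
    return math.exp(log_hazard_ratio)


# Hypothetical patient: 10 age units above the reference, late, male, group C.
patient = {"patient_age": 10, "predictor_3late": 1, "predictor_4male": 1, "group_colC": 1}
rh = relative_hazard(patient)
print(round(rh, 3))
```

In this hypothetical profile the protective group effect roughly offsets the three risk factors. Because predictor_3 is flagged for a proportional hazards violation elsewhere in the report, its contribution here is best read as an average over follow-up.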

Interpretation

EXECUTIVE SUMMARY: COX PROPORTIONAL HAZARDS MODEL

Purpose

This analysis presents the results of a Cox proportional hazards regression model designed to identify and quantify risk factors associated with time-to-event outcomes. The model's performance and predictor significance directly inform risk stratification and intervention prioritization decisions.

Key Findings

Concordance (C-statistic): 0.654 – Indicates moderate discriminative ability; the model correctly orders event timing approximately 65% of the time, suggesting reasonable but not exceptional predictive power
Event Rate: 93.4% (467 of 500 observations) – Extremely high event prevalence indicates a mature cohort with substantial follow-up time
Significant Predictors: All 6 predictors achieved statistical significance (p < 0.05), with predictor_3late showing the strongest effect (HR = 1.60, 95% CI: 1.33–1.93)
Model Fit: AIC of 4870.64; log-rank p-value reported as 0 (below the displayed precision), indicating overall model significance
Proportional Hazards Assumption: Global test p-value = 0.11 (holds); however, predictor_3 violates the assumption (p = 0.02)

Interpretation

Overall model performance and discrimination statistics

metric	value
N Observations	500
N Events	467
Event Rate (%)	93.4
Concordance (C-statistic)	0.6536
Concordance SE	0.0133
Log-rank p-value	0
AIC	4870.64
N Predictors	6
N Significant	6
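
Two derived quantities follow from this table by simple arithmetic. The sketch below computes a normal-approximation 95% interval for the C-statistic from its standard error, and the log partial likelihood implied by the AIC. It assumes the AIC was computed as -2 times the log partial likelihood plus 2 times the number of coefficients, with 6 coefficients; the report does not state its AIC formula.

```python
# Values copied from the model summary table.
concordance = 0.6536
concordance_se = 0.0133
aic = 4870.64
n_params = 6  # assumption: one estimated coefficient per model term

z_crit = 1.96  # two-sided 95% normal quantile
c_lower = concordance - z_crit * concordance_se
c_upper = concordance + z_crit * concordance_se

# Assumption: AIC = -2 * log partial likelihood + 2 * n_params.
log_partial_likelihood = (2 * n_params - aic) / 2

print(f"C-statistic 95% CI: {c_lower:.3f} to {c_upper:.3f}")
print(f"Implied log partial likelihood: {log_partial_likelihood:.2f}")
```

The interval lies entirely above 0.5, which is consistent with the report's reading of discrimination as better than chance but moderate.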

Interpretation

Purpose

This section evaluates the overall discriminative performance and statistical significance of the Cox proportional hazards model. It answers whether the model reliably distinguishes between subjects at different risk levels and whether the included predictors meaningfully explain survival variation in the dataset.

Key Findings

Concordance (C-statistic): 0.654 - Indicates moderate ability to rank subjects by risk; better than random (0.5) but below strong discrimination (>0.75)
All 6 Predictors Significant: 100% of included terms achieved p < 0.05, suggesting genuine associations with survival outcomes
Log-rank p-value: 0 (below the displayed precision) - Highly significant group differences, indicating the model captures meaningful stratification
Event Rate: 93.4% (467/500 events) - High event prevalence supports robust model estimation with sufficient outcome variation

Interpretation

The model demonstrates moderate but clinically meaningful discriminative ability. All six predictors are significant and the log-rank p-value rounds to 0 at the displayed precision, which indicates the model identifies survival-relevant factors and stratifies the cohort into meaningfully different risk groups. The high event rate (467 events for 6 predictors) provides adequate statistical power for parameter estimation without sparse-data bias.
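
To show what the concordance statistic measures, here is a minimal pure-Python sketch of Harrell's C-index on toy data. It counts a pair as comparable when the subject with the shorter time had an observed event, and gives half credit for tied risk scores. The toy data are hypothetical, and tied event times are not specially handled, so this is an illustration rather than the report's implementation.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ordered correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's recorded time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk failed earlier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores get half credit
    return concordant / comparable


# Hypothetical toy data: four subjects, one censored (event = 0).
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risk)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.654 is interpreted.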

Context

Concordance of 0.65 is typical for survival models in observational data; clinical utility depends

Hazard ratios and 95% confidence intervals for all predictors

Interpretation

Purpose

This section presents the Cox proportional hazards model results, quantifying how each of the 6 predictors influences event risk. The forest plot visualizes hazard ratios with 95% confidence intervals, allowing rapid assessment of effect direction and statistical significance. All predictors achieved significance (p < 0.05), indicating robust associations with the outcome in this 500-observation cohort with 93.4% event rate.

Key Findings

Predictor_3late: HR = 1.60 (95% CI: 1.33–1.93) - strongest risk factor; 60% higher hazard for the "late" level than for its reference level
Group_colC: HR = 0.44 (95% CI: 0.35–0.55) — strongest protective effect; 56% hazard reduction versus reference
Predictor_1 (listed as patient_age in the coefficient table): HR = 1.02 (95% CI: 1.01–1.03) - smallest effect per unit but highly significant (z = 5.17)
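Confidence Interval Width: On the HR scale, widths in the coefficient table range from about 0.02 (patient_age) to about 0.60 (predictor_3late), reflecting varying precision; narrower intervals indicate more stable estimates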

Interpretation

All six predictors demonstrate statistically significant associations with event hazard. Risk factors (HR > 1) include predictor_1, predictor_2, and predictor

Full coefficient table with hazard ratios, confidence intervals, and p-values

term	coef	hr	se	z_score	p_value	hr_lower	hr_upper	significance
patient_age	0.0204	1.021	0.0039	5.174	0	1.013	1.029	***
biomarker_level	0.1073	1.113	0.0227	4.719	0	1.065	1.164	***
predictor_3late	0.473	1.605	0.095	4.98	0	1.332	1.933	***
predictor_4male	0.1952	1.216	0.0943	2.069	0.0385	1.01	1.462	*
group_colB	-0.5086	0.6013	0.1141	-4.459	0	0.4809	0.752	***
group_colC	-0.8298	0.4361	0.1181	-7.025	0	0.346	0.5498	***

Semantic	Actual
days_observed	days_observed
event_status	event_status
treatment_group	treatment_group
patient_age	patient_age
biomarker_level	biomarker_level
disease_stage	disease_stage
patient_gender	patient_gender

Predictor	HR	CI_Lower	CI_Upper	P_Value	Significance
patient_age	1.021	1.013	1.029	0.0000	***
biomarker_level	1.113	1.065	1.164	0.0000	***
predictor_3late	1.605	1.332	1.933	0.0000	***
predictor_4male	1.216	1.01	1.462	0.0385	*
group_colB	0.601	0.481	0.752	0.0000	***
group_colC	0.436	0.346	0.55	0.0000	***
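
The columns in these tables are linked by standard Wald formulas: the hazard ratio is the exponential of the coefficient, the interval is the exponential of the coefficient plus or minus a normal quantile times the standard error, and the p-value comes from the z-score. The sketch below recomputes two rows from their coef and se values. It assumes a two-sided normal test at the 0.95 confidence level listed in the module parameters; the `wald_summary` helper is hypothetical.

```python
import math


def wald_summary(coef, se, z_crit=1.959964):
    """Return (hr, hr_lower, hr_upper, z_score, p_value) for one coefficient."""
    hr = math.exp(coef)
    hr_lower = math.exp(coef - z_crit * se)
    hr_upper = math.exp(coef + z_crit * se)
    z_score = coef / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2))
    return hr, hr_lower, hr_upper, z_score, p_value


# coef and se copied from the coefficient table.
late = wald_summary(0.473, 0.095)      # predictor_3late
male = wald_summary(0.1952, 0.0943)    # predictor_4male

for name, row in (("predictor_3late", late), ("predictor_4male", male)):
    hr, lo, hi, z, p = row
    print(f"{name}: HR={hr:.3f} CI=({lo:.3f}, {hi:.3f}) z={z:.3f} p={p:.4f}")
```

Recomputing rows this way is a quick consistency check on a coefficient table; p-values shown as 0 in the tables are values too small for the displayed precision.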

Interpretation

Purpose

This section presents the Cox proportional hazards regression coefficients for all 6 predictors in the survival model. It quantifies how each predictor affects the instantaneous risk of the event (hazard), enabling identification of protective and risk-elevating factors while accounting for censoring in the 500-observation cohort.

Key Findings

predictor_3late: HR=1.60 (95% CI: 1.33–1.93, p<0.001) – Strongest risk factor; increases hazard by 60%
group_colC: HR=0.44 (95% CI: 0.35–0.55, p<0.001) – Strongest protective effect; reduces hazard by 56%
All 6 predictors significant: Five at p<0.001 (***), one at p=0.04 (*); all 95% CIs exclude 1.0
Hazard ratio range: 0.44–1.60, indicating substantial effect heterogeneity across predictors

Interpretation

The model identifies two protective factors (group_colB and group_colC, each relative to the reference group) and four risk elevators (predictor_1, predictor_2, predictor_3late, and predictor_4male). The high event rate (93.4

Survival probability curves over time, stratified by group

Interpretation

Purpose

This section visualizes Kaplan-Meier survival curves for three distinct groups, showing how the probability of survival changes over time (0–1,491 days). Wider separation between curves indicates stronger group differences in survival outcomes. The 95% confidence intervals (shaded bands) quantify uncertainty around each estimate, enabling assessment of whether observed differences are statistically meaningful.

Key Findings

Time Range: Observations span 1 to 1,491 days (median 151 days), with right-skewed distribution indicating longer follow-up tails
Survival Decline: Mean survival probability decreases from 0.98 at early timepoints to 0.04–0.05 at late timepoints, reflecting cumulative event occurrence
Group Distribution: Groups A, B, and C are nearly balanced (129–138 observations each), enabling fair comparison
Confidence Intervals: Narrow at early times, widening substantially at later timepoints due to reduced sample size from censoring/events
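
The Kaplan-Meier estimate behind such curves multiplies, at each event time, the fraction of at-risk subjects who survive that time. The sketch below is a minimal pure-Python version on hypothetical toy data; it omits the confidence bands and group stratification shown in the report.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Observed events (not censorings) exactly at time t.
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical toy data: five subjects, two censored (event = 0).
km_times = [2, 3, 3, 5, 8]
km_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(km_times, km_events)
for t, s in km_curve:
    print(t, round(s, 3))
```

Censored subjects leave the risk set without producing a step, which is why the at-risk count, and therefore the precision of the estimate, shrinks at later times.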

Interpretation

The curves demonstrate substantial group stratification in survival outcomes. Early separation suggests groups experience markedly different hazard rates. The log-rank p-value of 0 (from overall metrics) confirms these differences are statistically significant. This aligns with the Cox model's concordance of 0.654, indicating

Cumulative hazard over time by group

Interpretation

Purpose

The cumulative hazard plot visualizes accumulated risk over time using the Nelson-Aalen estimator, enabling assessment of whether the exponential distribution assumption holds and whether the proportional hazards assumption is satisfied across groups. This diagnostic is critical for validating the Cox proportional hazards model used in the overall survival analysis.

Key Findings

Time Range: Events tracked from 1 to 1,491 days (mean=213 days), capturing the full follow-up period with right-skewed distribution
Cumulative Hazard Range: 0.01 to 23.03 across groups, with Group C showing substantially higher accumulated risk (3.32 at endpoint) compared to earlier timepoints
Group Distribution: Balanced representation across three groups (A: 129, B: 138, C: 134 observations), enabling fair cross-group comparison
Hazard Accumulation Pattern: Curves show non-linear acceleration, particularly for Group C, suggesting increasing hazard rates over time rather than constant exponential hazard
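
The Nelson-Aalen estimator sums, over event times, the number of events divided by the number at risk. The sketch below is a minimal pure-Python version on hypothetical toy data, without the per-group stratification used in the report.

```python
def nelson_aalen(times, events):
    """Return [(time, cumulative_hazard)] at each distinct event time."""
    cumulative = 0.0
    curve = []
    for t in sorted(set(times)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Observed events (not censorings) exactly at time t.
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if deaths > 0:
            cumulative += deaths / at_risk
            curve.append((t, cumulative))
    return curve


# Hypothetical toy data: five subjects, two censored (event = 0).
na_times = [2, 3, 3, 5, 8]
na_events = [1, 1, 0, 1, 0]
na_curve = nelson_aalen(na_times, na_events)
for t, h in na_curve:
    print(t, round(h, 3))
```

Under a constant (exponential) hazard the cumulative hazard grows linearly in time, so visible curvature in such a plot is the signal the text above refers to.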

Interpretation

The cumulative hazard curves do not appear strictly linear, indicating the exponential distribution assumption may not hold perfectly. Group C is described as having markedly elevated cumulative hazard relative to Groups A and B, which runs opposite to the Cox model results, where group_colC has the lowest hazard ratio (HR=

Proportional hazards assumption test using Schoenfeld residuals

Interpretation

Purpose

This section validates a core assumption of the Cox proportional hazards model: that hazard ratios remain constant over time. The test uses Schoenfeld residuals to detect whether any predictor's effect changes as follow-up time increases. Violations suggest that a predictor's impact on survival is time-dependent rather than constant, which affects the validity of the reported hazard ratios.
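
For reference, the assumption can be written in the standard Cox model notation, where h_0(t) is the unspecified baseline hazard, x is the covariate vector, and beta is the coefficient vector. These symbols are the usual textbook notation and do not appear in the report itself.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right),
\qquad
\frac{h(t \mid x_1)}{h(t \mid x_2)} = \exp\!\left(\beta^{\top} (x_1 - x_2)\right)
```

The baseline hazard cancels in the ratio, so the hazard ratio between any two covariate profiles does not depend on t. A proportional hazards violation means this cancellation fails for at least one covariate.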

Key Findings

Number of Violations: 1 predictor violates the proportional hazards assumption (p < 0.05)
Predictor_3: Identified as the violating variable with p-value = 0.02, indicating its hazard ratio is not constant over time
Majority Compliance: 5 of 6 predictors satisfy the PH assumption (p-values range from 0.12 to 0.89)
Global Test: The overall model passes (p = 0.11), suggesting the violation is isolated and does not invalidate the entire model

Interpretation

The Cox model assumes constant hazard ratios, but predictor_3 shows evidence of time-varying effects—its impact on survival probability changes as follow-up time progresses. This is particularly notable given predictor_3's strong effect (HR = 1.60, p < 0.001) in the main model.
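
To illustrate the quantity being tested, the sketch below computes raw Schoenfeld residuals for a single covariate: at each event time, the covariate value of the subject who had the event minus the risk-weighted average covariate value among subjects still at risk. It assumes no tied event times and a given coefficient `beta`. The toy data are hypothetical, and formal tests work with scaled residuals and a trend test against time, which this sketch does not reproduce.

```python
import math


def schoenfeld_residuals(times, events, x, beta):
    """Raw Schoenfeld residuals for one covariate, assuming no tied event times."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    residuals = []
    for i in order:
        if events[i] != 1:
            continue  # censored subjects contribute no residual
        # Risk set: everyone still under observation at this event time.
        risk_set = [j for j in range(len(times)) if times[j] >= times[i]]
        weights = [math.exp(beta * x[j]) for j in risk_set]
        expected = sum(w * x[j] for w, j in zip(weights, risk_set)) / sum(weights)
        residuals.append((times[i], x[i] - expected))
    return residuals


# Hypothetical toy data: binary covariate, all events observed, beta set to 0.
sr_times = [1, 2, 3, 4]
sr_events = [1, 1, 1, 1]
sr_x = [1, 0, 1, 0]
resid = schoenfeld_residuals(sr_times, sr_events, sr_x, beta=0.0)
for t, r in resid:
    print(t, round(r, 3))
```

If a covariate's effect is constant over time, these residuals should show no systematic trend when plotted against event time; a drift upward or downward is the pattern reported for predictor_3.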

Interpretation guide for hazard ratios and model output

Interpretation

Purpose

This section interprets the Cox proportional hazards model results, translating hazard ratios into clinically or operationally meaningful risk changes. It explains how each predictor affects instantaneous event hazard and validates overall model discrimination ability, enabling stakeholders to understand which factors most strongly influence survival outcomes.

Key Findings

Most Protective Predictor: group_colC (HR = 0.44) reduces hazard by 56.4%, indicating substantially lower event risk in this group
Highest Risk Predictor: predictor_3late (HR = 1.605) increases hazard by 60.5%, representing the strongest adverse effect among all predictors
Model Discrimination: Concordance = 0.654 indicates moderate predictive accuracy—substantially better than random (0.5) but with room for improvement toward perfect (1.0)
Statistical Strength: All 6 predictors are significant (p < 0.05); 467 events provide robust evidence

Interpretation

The model demonstrates that group membership and late presentation status are primary drivers of event risk. The 0.654 concordance suggests the model correctly ranks risk pairs approximately 65% of the time, reflecting meaningful but incomplete discrimination. Five of the six predictors are significant at p < 0.001; predictor_4male is closer to the threshold (p = 0.0385, 95% CI lower bound 1.01), so its effect is estimated less reliably than the others.
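
The percentage figures quoted in this section follow directly from the hazard ratios. The sketch below shows the conversion, plus how a per-unit coefficient scales to a larger covariate change. The helper names are hypothetical, and the 10-unit change for patient_age is an illustrative choice because the report does not state the variable's units.

```python
import math


def percent_hazard_change(hr):
    """Percentage change in hazard implied by a hazard ratio."""
    return (hr - 1.0) * 100.0


def hr_for_change(coef, delta):
    """Hazard ratio for a covariate change of `delta` units."""
    return math.exp(coef * delta)


# Hazard ratios and coefficient copied from the coefficient table.
change_group_c = percent_hazard_change(0.4361)   # group_colC
change_late = percent_hazard_change(1.605)       # predictor_3late
hr_age_10 = hr_for_change(0.0204, 10)            # patient_age, 10-unit change

print(round(change_group_c, 1), round(change_late, 1), round(hr_age_10, 3))
```

The scaling step matters for continuous predictors: a per-unit hazard ratio near 1 can still correspond to a sizeable effect across the observed range of the variable.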
