Overview

Correlation Analysis

Variable Relationship Explorer

Analysis overview and configuration

Configuration

Analysis Type: Correlation
Company: Demo Corp
Objective: Explore correlations between HR metrics to guide modeling decisions
Analysis Date: 2026-03-14
Processing Id: analytics__statistical__exploratory__correlation_test_20260314_213324
Total Observations: 300

Module Parameters

Parameter            Value
method               pearson
significance_level   0.05
top_n_pairs          15
Correlation analysis for Demo Corp
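
As a rough illustration of what these parameters control, the sketch below computes Pearson r for every variable pair and keeps the strongest top_n_pairs. It is a minimal standard-library example, not the report's actual implementation; the function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation for two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def top_pairs(columns, top_n_pairs=15):
    """Rank every variable pair by |r| and keep the strongest top_n_pairs."""
    results = [
        (a, b, pearson_r(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    results.sort(key=lambda row: abs(row[2]), reverse=True)
    return results[:top_n_pairs]


# Hypothetical toy data, not the Demo Corp dataset.
toy = {
    "Age": [25, 32, 41, 50, 58],
    "YearsAtCompany": [1, 4, 9, 15, 17],
    "DistanceFromHome": [12, 3, 20, 7, 11],
}
ranked = top_pairs(toy, top_n_pairs=2)
for var1, var2, r in ranked:
    print(f"{var1} vs {var2}: r = {r:.3f}")
```

Ranking by absolute value keeps strong negative relationships in the list alongside strong positive ones.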

Interpretation

Purpose

This correlation analysis examines relationships among 7 HR variables across 300 employees to identify which factors move together and may inform predictive modeling. Understanding these interdependencies helps prioritize which variables are most relevant for downstream analysis and reveals the underlying structure of HR metrics at Demo Corp.

Key Findings

  • Strongest Correlation: Age vs YearsAtCompany (r=0.88, p<0.001) - A very strong positive relationship indicating older employees tend to have longer tenure
  • Significant Pairs: 8 of 21 possible pairs (38.1%) show statistically significant relationships at p<0.05
  • Job Satisfaction Link: JobSatisfaction vs PerformanceRating (r=0.59) is the second-strongest relationship, suggesting employee satisfaction correlates meaningfully with performance outcomes
  • Weak Overall Pattern: 81% of correlations are classified as weak (|r| < 0.3), indicating that most HR variables show little linear association with one another
  • Data Completeness: 300 observations analyzed with minimal missing data (0-15 values per variable)
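
The pair count and percentages above follow from simple arithmetic, sketched here. The count of 17 weak pairs is taken from the strength labels in the report's full correlation table.

```python
from math import comb

n_variables = 7
n_significant = 8
n_weak = 17  # pairs labeled "Weak" in the full correlation table

n_pairs = comb(n_variables, 2)  # unordered pairs of 7 variables
pct_significant = round(100 * n_significant / n_pairs, 1)
pct_weak = round(100 * n_weak / n_pairs)
print(n_pairs, pct_significant, pct_weak)  # 21 38.1 81
```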

Interpretation

The analysis reveals a sparse correlation landscape where most HR metrics don't strongly predict each other. The dominant Age-Tenure relationship is intuitive and expected. The moderate Job Satisfaction-Performance link is noteworthy for modeling, while variables like Distance

Data Preparation

Data Preprocessing

Data Quality & Completeness

Data preprocessing and column mapping

Data Quality

Initial Rows: 300
Final Rows: 300
Rows Removed: 0
Retention Rate: 100%

Processed 300 observations, retained 300 (100.0%) after cleaning

Interpretation

Purpose

This section documents the data preprocessing pipeline for the correlation analysis of 7 variables across 300 observations. It demonstrates data integrity and completeness prior to statistical testing, which is critical for ensuring the reliability of the 8 significant correlations (38.1% of 21 pairs) identified in the analysis.

Key Findings

  • Retention Rate: 100% (300/300 rows) - No observations were removed during preprocessing, indicating clean input data with minimal quality issues
  • Rows Removed: 0 - The dataset required no filtering, deletion, or exclusion steps
  • Train/Test Split: Not applied - The full dataset was used for correlation analysis rather than predictive modeling
  • Data Completeness: Variable-level missing data exists (15 missing values each in MonthlySalary, JobSatisfaction, and TrainingHours) but did not trigger row-level removal
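
One way rows can be fully retained while individual pairs use fewer observations is pairwise deletion: each correlation drops only the rows where one of its two variables is missing. The sketch below illustrates the idea on hypothetical five-row data; the n_obs values of 300, 285, and 271 in the correlation table are consistent with this behavior, though the report does not state its exact procedure.

```python
def pairwise_complete(xs, ys):
    """Keep only the positions where both values are present (not None)."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]


# Hypothetical five-row example; None marks a missing value.
age = [34, 45, 29, 51, 38]
salary = [4200, None, 3100, 6800, 5000]
satisfaction = [3, 4, None, 2, 4]

n_rows = len(age)
n_age_salary = len(pairwise_complete(age, salary))
n_salary_satisfaction = len(pairwise_complete(salary, satisfaction))
print(n_rows, n_age_salary, n_salary_satisfaction)  # 5 4 3
```

A pair of two incomplete variables loses more rows than a pair with only one incomplete variable, because their missing values fall on different rows.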

Interpretation

The 100% retention rate reflects a well-curated dataset entering the analysis phase. However, three variables each have 15 missing values (noted in variable_summary_data), which suggests selective missingness rather than systematic data loss. This explains why correlation pairs have n_obs of 271, 285, or 300 depending on which variables are involved. The absence of train/test splitting confirms this is descriptive correlation analysis rather than predictive modeling, appropriate for the stated

Executive Summary

Key Correlation Findings

Key Metrics

n_variables: 7
n_significant_pairs: 8
pct_significant: 38.1
strongest_r: 0.8831
observations_analyzed: 300

Key Findings

Finding                 Value
Variables Analyzed      7
Total Pairs             21
Significant Pairs       8 (38.1%)
Strongest Correlation   Age vs YearsAtCompany (r=0.8831)
Correlation Method      Pearson
Observations Used       300

Summary

Bottom Line: Pearson correlation analysis of 7 variables found 8 significant relationships out of 21 pairs tested (38.1%).

Key Findings:
• Strongest pair: Age vs YearsAtCompany (r = 0.8831)
• 8 of 21 pairs are statistically significant (p < 0.05)
• 300 observations analyzed

Recommendation: Use the correlation matrix to identify variable clusters. Pairs with |r| > 0.7 may indicate multicollinearity — consider removing one variable from predictive models. Always inspect scatter plots for the strongest pairs to confirm linear relationships before drawing conclusions.
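
The multicollinearity check in the recommendation can be sketched as a simple filter. The r values are copied from the report's correlation table (four strongest pairs); the function name is hypothetical.

```python
# r values copied from the report's correlation table (four strongest pairs).
pairs = [
    ("Age", "YearsAtCompany", 0.8831),
    ("JobSatisfaction", "PerformanceRating", 0.5901),
    ("Age", "MonthlySalary", 0.5803),
    ("MonthlySalary", "YearsAtCompany", 0.4788),
]


def flag_collinear(pairs, threshold=0.7):
    """Return the pairs whose |r| exceeds the multicollinearity threshold."""
    return [(a, b, r) for a, b, r in pairs if abs(r) > threshold]


flagged = flag_collinear(pairs)
print(flagged)  # [('Age', 'YearsAtCompany', 0.8831)]
```

With the 0.7 threshold, only Age vs YearsAtCompany is flagged, so at most one of those two variables would be a candidate for removal from a predictive model.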

Interpretation

Purpose

This analysis examines relationships among 7 organizational variables across 300 employees using Pearson correlation. The objective is to identify which variable pairs have statistically significant associations, enabling data-driven decisions about workforce dynamics, compensation structures, and performance drivers.

Key Findings

  • Strongest Correlation: Age vs YearsAtCompany (r = 0.883) - Very strong positive relationship indicating tenure closely tracks employee age
  • Significant Pairs: 8 of 21 possible pairs (38.1%) show statistical significance at p < 0.05
  • Secondary Strong Relationships: JobSatisfaction vs PerformanceRating (r = 0.59) and Age vs MonthlySalary (r = 0.58) demonstrate moderate-to-strong positive associations
  • Weak Correlations Dominate: 81% of all pairs show weak strength, suggesting variables operate largely independently
  • Data Quality: All 300 employees retained, with 0-15 missing values per variable
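
The strength breakdown can be reproduced from the 21 r values in the report's correlation table. The sketch below assumes cutoffs of |r| ≥ 0.5 for strong and |r| ≥ 0.3 for moderate; these match the thresholds quoted elsewhere in the report and reproduce its counts, but the exact rule used by the tool is an assumption.

```python
from collections import Counter


def strength(r, moderate_cutoff=0.3, strong_cutoff=0.5):
    """Label a correlation by its absolute size (cutoffs are assumptions)."""
    size = abs(r)
    if size >= strong_cutoff:
        return "Strong"
    if size >= moderate_cutoff:
        return "Moderate"
    return "Weak"


# The 21 r values from the report's full correlation table.
r_values = [
    0.8831, 0.5901, 0.5803, 0.4788, 0.2457, 0.1593, 0.14,
    0.1192, 0.1149, 0.0788, 0.0733, 0.0424, -0.0285, -0.0266,
    0.023, 0.0218, 0.02, -0.0144, -0.0128, 0.007, -2.00e-04,
]

counts = Counter(strength(r) for r in r_values)
pct_weak = round(100 * counts["Weak"] / len(r_values))
print(dict(counts), pct_weak)  # {'Strong': 3, 'Moderate': 1, 'Weak': 17} 81
```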

Interpretation

The analysis reveals that organizational outcomes are driven by multiple independent factors rather than a few dominant relationships. Age-tenure alignment is expected and natural. The moderate job satisfaction-performance link suggests employee engagement meaningfully correlates with output, though other unmeasured factors likely drive performance. The weak correlations across most pairs indicate

Figure 4

Correlation Matrix

Pairwise Variable Relationships

Pairwise correlation matrix showing all variable relationships

Interpretation

Purpose

This correlation matrix maps all pairwise relationships among 7 variables across 300 observations, identifying which variables move together systematically. It serves as a foundational diagnostic tool to detect potential dependencies and multicollinearity patterns that inform downstream modeling and variable selection decisions.

Key Findings

  • Significant Pairs: 8 of 21 pairs (38.1%) show statistically significant correlations (p < 0.05), well above the roughly 1 pair (21 × 0.05) expected by chance alone if no true relationships existed
  • Strongest Correlation: Age vs YearsAtCompany (r = 0.88) demonstrates a very strong positive relationship, suggesting tenure increases predictably with employee age
  • Correlation Range: Values span from -0.03 to 1.0 (mean = 0.29 across the full matrix, including the diagonal of 1.0 values), with most off-diagonal pairs falling in the weak range
  • Pattern: Positive correlations dominate (86.7% of the top 15 pairs, and all 8 significant pairs), with DistanceFromHome showing near-zero relationships across all variables

Interpretation

The matrix reveals a workforce where age and tenure are tightly coupled, while commute distance operates independently of other measured factors. Job satisfaction and performance show moderate positive association (r = 0.59), suggesting employee engagement correlates with output quality. However, 62% of variable pairs lack statistical significance, indicating limited multicollinearity concerns and relatively independent

Figure 5

Top Correlations

Strongest Variable Relationships

Top variable pairs ranked by correlation strength

Interpretation

Purpose

This section identifies and ranks the strongest variable relationships in your dataset to reveal which factors move together most consistently. Understanding these correlations is essential for feature selection in predictive modeling and for identifying potential multicollinearity that could affect model performance or interpretation.

Key Findings

  • Strongest Pair (Age vs YearsAtCompany): r = 0.883 - An exceptionally strong positive relationship, indicating tenure increases predictably with employee age
  • Strong Pairs Count: 3 pairs qualify as strong (|r| ≥ 0.5), all statistically significant at p < 0.05
  • Moderate Relationships: 1 pair (MonthlySalary vs YearsAtCompany, r = 0.48) shows moderate strength
  • Dominant Pattern: 86.7% of top correlations are positive, suggesting aligned directional movement across most variables
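
The ranking statistics can be checked against the 21 r values in the report's correlation table. This sketch assumes the "top correlations" are the 15 pairs with the largest |r|, in line with top_n_pairs = 15; under that assumption it reproduces the 86.7% positive share and a median of about 0.12.

```python
from statistics import median

# The 21 r values from the report's full correlation table.
r_values = [
    0.8831, 0.5901, 0.5803, 0.4788, 0.2457, 0.1593, 0.14,
    0.1192, 0.1149, 0.0788, 0.0733, 0.0424, -0.0285, -0.0266,
    0.023, 0.0218, 0.02, -0.0144, -0.0128, 0.007, -2.00e-04,
]

# Rank by absolute strength and keep the 15 strongest pairs.
top_15 = sorted(r_values, key=abs, reverse=True)[:15]
share_positive = round(100 * sum(r > 0 for r in top_15) / len(top_15), 1)
median_r = round(median(top_15), 2)
print(share_positive, median_r)  # 86.7 0.12
```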

Interpretation

The Age-YearsAtCompany relationship (r = 0.883) is exceptionally strong and highly significant, reflecting a natural organizational pattern where older employees tend to have longer tenure. The three strong pairs (Age vs YearsAtCompany, JobSatisfaction vs PerformanceRating, and Age vs MonthlySalary) suggest these variables share substantial common variance. However, the median correlation across all 15 top pairs is only 0.12, indicating most relationships are weak

Figure 6

Strongest Pair Scatter

Visual Relationship Check

Scatter plot of strongest pair: Age vs YearsAtCompany

Interpretation

Purpose

This scatter plot visualizes the strongest relationship identified in the correlation analysis: Age vs YearsAtCompany (r = 0.883). By displaying all 300 individual observations, it allows visual confirmation that the strong numerical correlation reflects a genuine linear pattern rather than statistical artifact or clustering effects. This section bridges summary statistics and raw data to validate the correlation's practical meaning.

Key Findings

  • Correlation Coefficient (r = 0.883): Indicates a very strong positive linear relationship—among the 21 variable pairs analyzed, this is the strongest association found
  • Sample Size (n = 300): Full dataset with no missing values, providing robust statistical power and confidence in the relationship's stability
  • Data Range: Age spans 22–65 years (mean 38.0); YearsAtCompany spans 0–18 years (mean 6.3), showing realistic organizational tenure patterns
  • Linear Pattern: Points cluster tightly around the trend line with minimal scatter, confirming the relationship is genuinely linear and not curved or segmented
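
For a straight-line relationship between two variables, the share of variance in one that is accounted for by the other is r squared. The short calculation below applies this to the reported r = 0.883; it is a worked illustration, not an output of the report.

```python
r = 0.883  # reported correlation for Age vs YearsAtCompany

r_squared = r ** 2           # share of variance accounted for by a linear fit
unexplained = 1 - r_squared  # share left unexplained
print(round(r_squared, 2), round(unexplained, 2))  # 0.78 0.22
```

So roughly 78% of the variance in tenure is linearly associated with age, leaving about 22% unexplained.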

Interpretation

The scatter plot demonstrates that older employees consistently have longer tenure at the company. This strong association (r = 0.883, so r² ≈ 0.78) suggests age and organizational longevity carry largely overlapping information in this dataset. The tight clustering around the trend line indicates limited unexplained variance, meaning age

Figure 7

Variable Distributions

Standardized Box Plots

Standardized distributions of all analyzed variables

Interpretation

Purpose

This section visualizes the standardized distributions of all 7 variables to enable direct comparison across different measurement scales. By converting raw values to z-scores, variables with vastly different units (e.g., age in years vs. salary in dollars) can be assessed side-by-side for spread, symmetry, and outlier presence. Understanding distribution shape is critical because extreme outliers or skewness can inflate or deflate correlation coefficients, affecting the reliability of the 8 significant relationships identified in the overall analysis.

Key Findings

  • Z-score Range: -3.31 to 4.43 - Indicates presence of moderate outliers across the dataset, with some observations extending 3+ standard deviations from the mean
  • Overall Skewness: 0.26 - Slight positive skew suggests most variables cluster toward lower values with right-tail extensions
  • Raw Value Spread: min=0, max=8,103 - Extreme range reflects heterogeneous variable scales (e.g., salary vs. distance)
  • Median Offset: -0.09 z-score median vs. 0 mean - A median slightly below the mean is consistent with the mild right skew noted above, indicating a slight concentration of values below average
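
The standardization behind these figures can be sketched as follows. The example uses the population standard deviation and a hypothetical right-skewed sample; whether the report uses the population or sample standard deviation is not stated, so that choice is an assumption.

```python
from statistics import mean, median, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; the report's choice is unknown
    return [(v - mu) / sigma for v in values]


# Hypothetical right-skewed sample, not taken from the report.
salaries = [3000, 3200, 3400, 3500, 3700, 4000, 8100]
z = z_scores(salaries)

print(round(median(z), 2), round(max(z), 2))
```

In a right-skewed sample like this one, the single large value pulls the mean upward, so the median z-score falls below zero while the largest z-score sits well above it.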

Interpretation

The standardized distributions reveal that while most variables are reasonably symmetric, the presence of outliers (z-scores beyond

Table 8

Correlation Table

Full Pairwise Statistics

Complete pairwise correlation results with statistical details

var1                var2                r_value     p_value   n_obs   significant   strength
Age                 YearsAtCompany      0.8831      0         300     True          Strong
JobSatisfaction     PerformanceRating   0.5901      0         285     True          Strong
Age                 MonthlySalary       0.5803      0         285     True          Strong
MonthlySalary       YearsAtCompany      0.4788      0         285     True          Moderate
PerformanceRating   TrainingHours       0.2457      0         285     True          Weak
MonthlySalary       JobSatisfaction     0.1593      0.0086    271     True          Weak
JobSatisfaction     TrainingHours       0.14        0.0212    271     True          Weak
Age                 JobSatisfaction     0.1192      0.0444    285     True          Weak
YearsAtCompany      JobSatisfaction     0.1149      0.0528    285     False         Weak
Age                 PerformanceRating   0.0788      0.1733    300     False         Weak
YearsAtCompany      PerformanceRating   0.0733      0.2057    300     False         Weak
MonthlySalary       DistanceFromHome    0.0424      0.4763    285     False         Weak
Age                 TrainingHours       -0.0285     0.6319    285     False         Weak
YearsAtCompany      TrainingHours       -0.0266     0.6544    285     False         Weak
MonthlySalary       PerformanceRating   0.023       0.6989    285     False         Weak
YearsAtCompany      DistanceFromHome    0.0218      0.7073    300     False         Weak
MonthlySalary       TrainingHours       0.02        0.7431    271     False         Weak
TrainingHours       DistanceFromHome    -0.0144     0.8093    285     False         Weak
Age                 DistanceFromHome    -0.0128     0.8247    300     False         Weak
JobSatisfaction     DistanceFromHome    0.007       0.9063    285     False         Weak
PerformanceRating   DistanceFromHome    -2.00e-04   0.9975    300     False         Weak

Interpretation

Purpose

This section presents all 21 pairwise correlations among 7 variables, identifying which relationships are statistically significant at the 0.05 level. It serves as the comprehensive foundation for understanding variable interdependencies across the dataset, enabling prioritization of relationships worthy of deeper investigation.

Key Findings

  • Significant Pairs: 8 of 21 relationships (38.1%) meet statistical significance, well above the roughly 1 pair (21 × 0.05) expected from random variation alone
  • Strongest Correlation: Age vs YearsAtCompany (r=0.88, p<0.001) demonstrates the dominant relationship in the dataset
  • Strength Distribution: 81% of pairs are classified as weak (|r| < 0.3), with only 3 strong and 1 moderate relationship, reflecting sparse meaningful associations
  • Sample Consistency: Observations range 271–300 across pairs, with most analyses using 285 observations, suggesting minimal data loss
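
The significance decision for a single pair can be approximated from r and n_obs alone. The sketch below converts r to a t statistic and uses a normal approximation for the two-sided p-value; the exact test uses a Student t distribution with n - 2 degrees of freedom, so the results land slightly below the table's 0.0528 and 0.0444. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_p_value(r, n):
    """Two-sided p-value for H0: no correlation (normal approximation)."""
    t = r * sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))


# The two pairs closest to the 0.05 threshold in the table above.
p_borderline = approx_p_value(0.1149, 285)   # YearsAtCompany vs JobSatisfaction
p_significant = approx_p_value(0.1192, 285)  # Age vs JobSatisfaction
print(p_borderline > 0.05, p_significant < 0.05)  # True True
```

The two pairs differ in r by less than 0.005 yet fall on opposite sides of the threshold, which is a reminder that the significant/non-significant label is sensitive near p = 0.05.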

Interpretation

The correlation matrix reveals a dataset where most variables operate independently. The three strong relationships—Age with tenure and salary, plus job satisfaction with performance—represent the primary drivers of covariation. The predominance of weak, non-significant pairs (13 of 21) indicates that employee outcomes are not heavily determined by simple linear associations among these seven variables, suggesting either complex multivariate interactions or the influence of unmeasured factors
