Analysis overview and configuration
| Parameter | Value |
|---|---|
| method | pearson |
| significance_level | 0.05 |
| top_n_pairs | 15 |
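The three parameters in the table above could be carried through an analysis as a small configuration object. The sketch below is illustrative only: the class name `CorrelationConfig` and the set of accepted method names are assumptions, not something the report specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrelationConfig:
    """Hypothetical container for the parameters listed in the table."""

    method: str = "pearson"            # correlation coefficient to compute
    significance_level: float = 0.05   # alpha used to flag a pair as significant
    top_n_pairs: int = 15              # number of pairs shown in the ranking

    def __post_init__(self):
        # Sanity checks; the allowed method names are an assumption.
        if self.method not in {"pearson", "spearman", "kendall"}:
            raise ValueError(f"unknown method: {self.method}")
        if not 0.0 < self.significance_level < 1.0:
            raise ValueError("significance_level must lie in (0, 1)")
        if self.top_n_pairs < 1:
            raise ValueError("top_n_pairs must be positive")


config = CorrelationConfig()
```

Freezing the dataclass keeps the settings from being changed partway through a run, which makes the reported parameters easier to trust.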
This correlation analysis examines relationships among 7 HR variables across 300 employees to identify which factors move together and may inform predictive modeling. Understanding these interdependencies helps prioritize which variables are most relevant for downstream analysis and reveals the underlying structure of HR metrics at Demo Corp.
The analysis reveals a sparse correlation landscape in which most HR metrics do not strongly predict each other. The dominant Age-Tenure relationship (r = 0.88) is intuitive and expected. The moderate Job Satisfaction-Performance link (r = 0.59) is noteworthy for modeling, while variables like Distance
Data preprocessing and column mapping
| Metric | Value |
|---|---|
| Initial Rows | 300 |
| Final Rows | 300 |
| Rows Removed | 0 |
| Retention Rate | 100% |
This section documents the data preprocessing pipeline for the correlation analysis of 7 variables across 300 observations. It demonstrates data integrity and completeness prior to statistical testing, which is critical for ensuring the reliability of the 8 significant correlations (38.1% of 21 pairs) identified in the analysis.
The 100% retention rate means no rows were dropped before the analysis; it does not mean every value is present. Three variables (MonthlySalary, JobSatisfaction, and TrainingHours) each have 15 missing observations (noted in variable_summary_data), which suggests selective missingness rather than systematic data loss. Because each correlation uses only the rows where both variables are present, pairs have n_obs=300 when neither variable has gaps, n_obs=285 when one does, and n_obs=271 when both do. The absence of train/test splitting confirms this is descriptive correlation analysis rather than predictive modeling, appropriate for the stated
| Finding | Value |
|---|---|
| Variables Analyzed | 7 |
| Total Pairs | 21 |
| Significant Pairs | 8 (38.1%) |
| Strongest Correlation | Age vs YearsAtCompany (r=0.8831) |
| Correlation Method | Pearson |
| Observations Used | 300 (271 to 300 per pair) |
This analysis examines relationships among 7 organizational variables across 300 employees using Pearson correlation. The objective is to identify which variable pairs have statistically significant associations, enabling data-driven decisions about workforce dynamics, compensation structures, and performance drivers.
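The pair count and the significant share quoted in the summary follow from simple counting: 7 variables give "7 choose 2" unordered pairs. A minimal check using only the figures stated in the report:

```python
from math import comb

n_variables = 7
significant_pairs = 8

# Unordered pairs of distinct variables: 7 * 6 / 2
total_pairs = comb(n_variables, 2)
share = significant_pairs / total_pairs

print(total_pairs, f"{share:.1%}")  # prints: 21 38.1%
```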
The analysis reveals that organizational outcomes are driven by multiple independent factors rather than a few dominant relationships. Age-tenure alignment is expected and natural. The moderate job satisfaction-performance link suggests employee engagement meaningfully correlates with output, though other unmeasured factors likely drive performance. The weak correlations across most pairs indicate
Pairwise correlation matrix showing all variable relationships
This correlation matrix maps all pairwise relationships among 7 variables across 300 observations, identifying which variables move together systematically. It serves as a foundational diagnostic tool to detect potential dependencies and multicollinearity patterns that inform downstream modeling and variable selection decisions.
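One way such a matrix can be built is to compute Pearson's r for each pair over only the rows where both values are present (pairwise deletion), which would also explain why n_obs differs between pairs. The sketch below is pure Python on made-up toy data; the function names are hypothetical and the report does not state which tool produced its numbers.

```python
from itertools import combinations
from math import sqrt


def pearson_pairwise(x, y):
    """Return (r, n) using only rows where both values are present (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan"), n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan"), n  # r is undefined for a constant column
    return sxy / sqrt(sxx * syy), n


def correlation_table(columns):
    """Map each unordered variable pair to its (r, n_obs)."""
    return {
        (u, v): pearson_pairwise(columns[u], columns[v])
        for u, v in combinations(columns, 2)
    }


# Toy data, not taken from the report; None marks a missing value.
toy = {
    "Age": [25, 32, 41, 50, 58],
    "YearsAtCompany": [1, 4, 9, 15, 20],
    "MonthlySalary": [3000, None, 5200, 6100, None],
}
table = correlation_table(toy)
```

In this toy example the Age and YearsAtCompany pair uses all 5 rows, while any pair involving MonthlySalary uses only the 3 rows where salary is recorded.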
The matrix reveals a workforce where age and tenure are tightly coupled (r = 0.88), while commute distance is essentially uncorrelated with every other measured factor (|r| < 0.05 for all six of its pairs). Job satisfaction and performance show a moderate positive association (r = 0.59), suggesting employee engagement correlates with output quality. However, 13 of the 21 variable pairs (62%) lack statistical significance, indicating limited multicollinearity concerns outside the Age-YearsAtCompany pair and relatively independent
Top variable pairs ranked by correlation strength
This section identifies and ranks the strongest variable relationships in the dataset to reveal which factors move together most consistently. Understanding these correlations is essential for feature selection in predictive modeling and for identifying potential multicollinearity that could affect model performance or interpretation.
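A ranking like this can be sketched by sorting pairs on the absolute value of r and keeping the first top_n_pairs entries. The strength cutoffs below (0.5 and 0.3) are assumptions chosen to be consistent with the labels in the results table; the report does not state the thresholds it uses, and the function names are hypothetical.

```python
def strength_label(r, strong=0.5, moderate=0.3):
    """Label |r| using assumed cutoffs (not stated in the report)."""
    size = abs(r)
    if size >= strong:
        return "Strong"
    if size >= moderate:
        return "Moderate"
    return "Weak"


def top_pairs(results, top_n=15):
    """Sort (var1, var2, r) rows by |r| descending and keep the first top_n."""
    ranked = sorted(results, key=lambda row: abs(row[2]), reverse=True)
    return [(a, b, r, strength_label(r)) for a, b, r in ranked[:top_n]]


# A few rows from the results table, deliberately out of order.
sample = [
    ("Age", "TrainingHours", -0.0285),
    ("Age", "YearsAtCompany", 0.8831),
    ("MonthlySalary", "YearsAtCompany", 0.4788),
    ("PerformanceRating", "TrainingHours", 0.2457),
]
ranked = top_pairs(sample, top_n=3)
```

Sorting on the absolute value matters because a strongly negative correlation is as informative as a strongly positive one.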
The Age-YearsAtCompany relationship (r = 0.883) is exceptionally strong and highly significant, reflecting a natural organizational pattern where older employees tend to have longer tenure. The three pairs labeled Strong (Age-YearsAtCompany, JobSatisfaction-PerformanceRating, and Age-MonthlySalary) suggest these variables share substantial common variance. However, the median correlation across all 15 top pairs is only 0.12, indicating most relationships are weak
Scatter plot of strongest pair: Age vs YearsAtCompany
This scatter plot visualizes the strongest relationship identified in the correlation analysis: Age vs YearsAtCompany (r = 0.883). By displaying all 300 individual observations, it allows visual confirmation that the strong numerical correlation reflects a genuine linear pattern rather than a statistical artifact or clustering effect. This section bridges summary statistics and raw data to validate the correlation's practical meaning.
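A trend line on a scatter plot of this kind is commonly an ordinary least-squares fit; whether the report's plot uses one is an assumption. The sketch below fits such a line on made-up values, with a hypothetical helper name `fit_line`.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up illustration values, not taken from the report.
ages = [25, 30, 35, 40]
tenure = [2, 5, 8, 11]
slope, intercept = fit_line(ages, tenure)
```

For these toy values the fitted slope is 0.6 years of tenure per year of age, since the points lie exactly on a line.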
The scatter plot shows that older employees tend to have longer tenure at the company. With r = 0.883, age accounts for roughly 78% of the variance in tenure (r² ≈ 0.78), so the two variables carry largely overlapping information as predictors in this dataset. The tight clustering around the trend line indicates limited unexplained variance, meaning age
Standardized distributions of all analyzed variables
This section visualizes the standardized distributions of all 7 variables to enable direct comparison across different measurement scales. By converting raw values to z-scores, variables with vastly different units (e.g., age in years vs. salary in dollars) can be assessed side-by-side for spread, symmetry, and outlier presence. Understanding distribution shape is critical because extreme outliers or skewness can inflate or deflate correlation coefficients, affecting the reliability of the 8 significant relationships identified in the overall analysis.
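The z-score conversion described above subtracts a variable's mean and divides by its standard deviation. A minimal sketch follows, assuming the sample standard deviation (n - 1 denominator) and a hypothetical outlier cutoff of |z| > 3; the report states neither choice.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values, leaving missing entries (None) in place."""
    present = [v for v in values if v is not None]
    mu = mean(present)
    sigma = stdev(present)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        raise ValueError("cannot standardize a constant variable")
    return [None if v is None else (v - mu) / sigma for v in values]


def flag_outliers(values, threshold=3.0):
    """Indices whose |z| exceeds the threshold (3.0 is an assumed default)."""
    return [
        i for i, z in enumerate(z_scores(values))
        if z is not None and abs(z) > threshold
    ]


# Toy values: mean 40, sample standard deviation 10.
z = z_scores([30, 40, 50])
extremes = flag_outliers([30, 40, 50], threshold=0.5)
```

After standardization every variable has mean 0 and standard deviation 1, which is what allows age and salary to share one axis.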
The standardized distributions reveal that while most variables are reasonably symmetric, the presence of outliers (z-scores beyond
Complete pairwise correlation results with statistical details
| var1 | var2 | r_value | p_value | n_obs | significant | strength |
|---|---|---|---|---|---|---|
| Age | YearsAtCompany | 0.8831 | <0.0001 | 300 | True | Strong |
| JobSatisfaction | PerformanceRating | 0.5901 | <0.0001 | 285 | True | Strong |
| Age | MonthlySalary | 0.5803 | <0.0001 | 285 | True | Strong |
| MonthlySalary | YearsAtCompany | 0.4788 | <0.0001 | 285 | True | Moderate |
| PerformanceRating | TrainingHours | 0.2457 | <0.0001 | 285 | True | Weak |
| MonthlySalary | JobSatisfaction | 0.1593 | 0.0086 | 271 | True | Weak |
| JobSatisfaction | TrainingHours | 0.1400 | 0.0212 | 271 | True | Weak |
| Age | JobSatisfaction | 0.1192 | 0.0444 | 285 | True | Weak |
| YearsAtCompany | JobSatisfaction | 0.1149 | 0.0528 | 285 | False | Weak |
| Age | PerformanceRating | 0.0788 | 0.1733 | 300 | False | Weak |
| YearsAtCompany | PerformanceRating | 0.0733 | 0.2057 | 300 | False | Weak |
| MonthlySalary | DistanceFromHome | 0.0424 | 0.4763 | 285 | False | Weak |
| Age | TrainingHours | -0.0285 | 0.6319 | 285 | False | Weak |
| YearsAtCompany | TrainingHours | -0.0266 | 0.6544 | 285 | False | Weak |
| MonthlySalary | PerformanceRating | 0.0230 | 0.6989 | 285 | False | Weak |
| YearsAtCompany | DistanceFromHome | 0.0218 | 0.7073 | 300 | False | Weak |
| MonthlySalary | TrainingHours | 0.0200 | 0.7431 | 271 | False | Weak |
| TrainingHours | DistanceFromHome | -0.0144 | 0.8093 | 285 | False | Weak |
| Age | DistanceFromHome | -0.0128 | 0.8247 | 300 | False | Weak |
| JobSatisfaction | DistanceFromHome | 0.0070 | 0.9063 | 285 | False | Weak |
| PerformanceRating | DistanceFromHome | -0.0002 | 0.9975 | 300 | False | Weak |
This section presents all 21 pairwise correlations among 7 variables, identifying which relationships are statistically significant at the 0.05 level. It serves as the comprehensive foundation for understanding variable interdependencies across the dataset, enabling prioritization of relationships worthy of deeper investigation.
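The significance flag for each pair comes from comparing its p-value with the 0.05 level. The report does not say how its p-values were computed (a t-test on r is the usual choice); the sketch below instead uses the Fisher z approximation, which needs only the standard library and gives similar values for samples of this size. The function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via the Fisher z approximation."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = atanh(r) * sqrt(n - 3)  # approximately standard normal under H0
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05

# Two borderline rows from the table: one significant, one not.
p_age_js = approx_p_value(0.1192, 285)    # Age vs JobSatisfaction
p_years_js = approx_p_value(0.1149, 285)  # YearsAtCompany vs JobSatisfaction

significant_age_js = p_age_js < alpha
significant_years_js = p_years_js < alpha
```

These two rows sit on either side of the 0.05 threshold, so the approximation reproduces the table's True and False flags for them even though the exact p-values may differ slightly in the last digits.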
The correlation matrix reveals a dataset where most variables operate independently. The three strong relationships (Age with tenure and salary, plus job satisfaction with performance) represent the primary drivers of covariation. The predominance of weak, non-significant pairs (13 of 21) indicates that employee outcomes are not heavily determined by simple linear associations among these seven variables, suggesting either complex multivariate interactions or the influence of unmeasured factors