Analysis overview and configuration

Configuration

Analysis Type: Workforce Segmentation

Company: HR Analytics Demo

Objective: Segment employees into natural groups based on tenure, satisfaction, compensation, and performance

Analysis Date: 2026-03-28

Processing Id: test_1774721626

Total Observations: 1,470

Module Parameters

Parameter	Value
n_clusters	4
scale_data	TRUE
max_k	6

Workforce Segmentation analysis for HR Analytics Demo

Interpretation

Headline

The analysis identified 4 employee clusters, but silhouette score of 0.29 indicates weak cluster separation—consider using 3 clusters instead for clearer, more actionable segments.

Purpose

This workforce segmentation analysis applied K-means clustering to 1,470 employees across 5 features (age, income, job satisfaction, tenure, performance rating) to identify natural employee groups for targeted HR strategies. The analysis evaluated cluster quality and generated employee-to-cluster assignments with departmental breakdowns.
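
To make the K-means step concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. It is not the module's implementation: the `kmeans` function, the toy points, and the choice to seed centroids with the first k points are all assumptions made for illustration.

```python
import math

def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm; seeds centroids with the first k points (illustrative choice)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        converged = new_labels == labels
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids

# Hypothetical, already-scaled toy data with two obvious groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.8, 1.2), (5.2, 4.8), (4.8, 5.2)]
labels, centroids = kmeans(points, k=2)
```

In practice the features would be standardized first (the run used scale_data = TRUE) and the initialization would be randomized and repeated, since Lloyd's algorithm only finds a local optimum.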

Key Findings

Silhouette Score: 0.29 (weak structure) — falls in the 0.25–0.5 range indicating poor cluster cohesion. The optimal K=3 achieved 0.36, substantially better than the final K=4 at 0.29.
Cluster Imbalance: Cluster 2 dominates at 43.5% (640 employees), while Cluster 1 represents only 14.1% (208 employees). This skew suggests unequal natural groupings.
Flight Risk Cluster: Cluster 4 (393 employees, 26.7%) is flagged as high-risk. It has the lowest job satisfaction (1.48 vs. 2.6, the unweighted mean of the four cluster means) and shares the lowest performance rating with Cluster 2 (3.0 vs. 3.27 on the same basis).
Variance Explained: PC1 and PC2 together explain 57.8% of total variance, leaving 42.2% unexplained—moderate dimensionality reduction effectiveness.
Data Quality: All 1,470 rows retained; no missing values or outliers removed.
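
The comparison averages quoted for satisfaction (2.6) and performance (3.27) match the simple mean of the four cluster means reported in the cluster profile table. The sketch below reproduces that figure and contrasts it with a size-weighted mean; the variable and function names are illustrative, while the numbers come from this report.

```python
# Cluster sizes and mean values as reported in the cluster profile table.
sizes = {"Cluster 1": 208, "Cluster 2": 640, "Cluster 3": 229, "Cluster 4": 393}
satisfaction = {"Cluster 1": 2.74, "Cluster 2": 3.51, "Cluster 3": 2.69, "Cluster 4": 1.48}
performance = {"Cluster 1": 4.0, "Cluster 2": 3.0, "Cluster 3": 3.08, "Cluster 4": 3.0}

def unweighted_mean(means):
    """Simple average of the cluster means, ignoring cluster size."""
    return sum(means.values()) / len(means)

def weighted_mean(means, sizes):
    """Average of the cluster means weighted by the number of employees in each cluster."""
    total = sum(sizes.values())
    return sum(means[c] * sizes[c] for c in means) / total

sat_unweighted = unweighted_mean(satisfaction)       # about 2.6
perf_unweighted = unweighted_mean(performance)       # about 3.27
sat_weighted = weighted_mean(satisfaction, sizes)    # about 2.73
perf_weighted = weighted_mean(performance, sizes)    # about 3.15
```

The size-weighted values are the better estimate of the workforce-wide average, because Cluster 2 alone holds 43.5% of employees.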

Interpretation

The weak silhouette score (0.29) signals that clusters overlap substantially and lack clear boundaries. Employees within clusters are not distinctly similar to each other relative to other clusters. The algorithm's recommendation of K=3 (silhouette 0.36) suggests the data contains three natural groupings, not four. Cluster 4's low satisfaction and performance profile is actionable but represents only 27% of the workforce, limiting its strategic impact relative to the dominant Cluster 2.
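
For readers unfamiliar with the metric, the following sketch computes the silhouette coefficient from its definition: for each point, a is the mean distance to its own cluster, b is the mean distance to the nearest other cluster, and the score is (b - a) / max(a, b). The function name and the one-dimensional toy data are assumptions for illustration; the report's own score was produced by the analysis module, not by this code.

```python
import math

def silhouette_scores(points, labels):
    """Per-point silhouette values; singleton clusters score 0 by convention."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # cohesion: mean distance within the point's own cluster
        b = min(                 # separation: mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

# Hypothetical toy data: two tight, well-separated groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_scores(points, [0, 0, 1, 1])
bad = silhouette_scores(points, [0, 1, 0, 1])  # deliberately wrong assignment

mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
```

A mean near 1 indicates compact, well-separated clusters, values near 0 indicate overlap, and negative values indicate points that sit closer to another cluster than to their own.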

Context

K-means assumes spherical clusters of similar size; the 3:1 size ratio between Cluster 2 and Cluster 1 violates this assumption. The weak silhouette score may reflect genuine workforce heterogeneity or indicate that the five features do not cleanly separate employees into distinct personas. Departmental distribution (R&D dominates all clusters at 62–71%) suggests department is not a primary differentiator.

Data preprocessing and column mapping

Data Quality

Metric	Value
Initial Rows	1,470
Final Rows	1,470
Rows Removed	0
Retention Rate	100%

Processed 1,470 observations, retained 1,470 (100.0%) after cleaning

Interpretation

Headline

All 1,470 employee records passed quality checks with zero rows removed, ensuring a complete dataset for clustering analysis.

Purpose

Data preprocessing is the foundation of any statistical analysis. This section documents how the raw dataset was cleaned, validated, and prepared for the clustering model. A 100% retention rate indicates no missing values, duplicates, or outliers were flagged for removal—a clean starting point that strengthens confidence in downstream results.

Key Findings

Retention Rate: 100% (1,470 of 1,470 rows retained) - No data loss during cleaning
Rows Removed: 0 - No records excluded due to missing values, duplicates, or quality issues
Dataset Completeness: All 1,470 employees included in clustering analysis with no gaps
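
The retention figure is simply final rows divided by initial rows. The sketch below shows one way such a check could be written; the report does not describe the actual cleaning rules, so the `drop_incomplete` filter, the column names, and the toy rows are hypothetical.

```python
def drop_incomplete(rows, required):
    """Keep only rows where every required column is present and not None (assumed rule)."""
    return [row for row in rows if all(row.get(col) is not None for col in required)]

def retention_rate(initial_rows, final_rows):
    """Percentage of rows that survived cleaning."""
    return 100.0 * final_rows / initial_rows

# Hypothetical toy rows; the second one has a missing income value.
rows = [
    {"age": 41, "income": 5993, "satisfaction": 4},
    {"age": 49, "income": None, "satisfaction": 2},
    {"age": 37, "income": 2090, "satisfaction": 3},
]
kept = drop_incomplete(rows, required=["age", "income", "satisfaction"])
toy_rate = retention_rate(len(rows), len(kept))

# The figures reported in this section: 1,470 rows in, 1,470 rows out.
report_rate = retention_rate(1470, 1470)
```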

Interpretation

The dataset entered the analysis in excellent condition. No rows were dropped for missing data, outliers, or data quality issues, meaning the full workforce population is represented in the four clusters. This complete retention is ideal for workforce segmentation—every employee has been assigned to a cluster, avoiding selection bias that could skew HR insights. The absence of preprocessing exclusions also means the cluster profiles reflect the true composition of the organization without artificial filtering.

Context

While 100% retention is positive, note that the data quality report does not detail missing value patterns or outlier detection methods, and the only record of scaling is the scale_data = TRUE parameter. The weak cluster separation found by the clustering analysis (silhouette score of 0.293) may reflect genuine workforce diversity rather than data quality issues.

Executive Summary

Key Metrics

total_employees: 1470
n_clusters: 4
avg_silhouette_score: 0.2927
flight_risk_cluster: Cluster 4

Key Findings

Finding	Value
Workforce Segments Discovered	4 distinct segments
Total Employees Analyzed	1,470
Cluster Quality (Silhouette)	0.2927 (weak)
Potential Flight-Risk Segment	Cluster 4
Largest Segment Share	43.5% of workforce
Smallest Segment Share	14.1% of workforce
PCA Variance Explained (2D)	57.8%

Summary

Bottom Line: K-means clustering discovered 4 natural workforce segments from 1,470 employees. Cluster quality (silhouette=0.2927) is weak.

Key Findings:
• 4 distinct employee segments identified based on age, income, satisfaction, tenure, and performance
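• Flight-risk alert: Cluster 4 shows the highest performance-to-satisfaction ratio (3.0 / 1.48 ≈ 2.0). The ratio is driven by very low satisfaction, not by high performance: the cluster's mean rating of 3.0 is tied for the lowest of the four segments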
• Segment sizes range from 14.1% to 43.5% of the workforce
• PCA captures 57.8% of variance in 2D for visual cluster separation

Recommended Actions:
1. Review cluster centroid heatmap to assign HR-meaningful labels to each segment
2. Prioritize Cluster 4 for immediate retention interviews and compensation benchmarking
3. Analyze Department composition to identify which teams are over-represented in at-risk clusters
4. Design segment-specific HR programs: promotion paths, compensation reviews, coaching, onboarding improvements

Interpretation

Headline

Four workforce segments were identified, but weak cluster quality (silhouette=0.29) limits confidence; Cluster 4 emerges as the flight-risk group: 393 employees (26.7% of workforce) with the lowest job satisfaction (1.48 on the 1–4 scale) and a mean performance rating of 3.0, tied for lowest.

Purpose

This analysis segments your 1,470-person workforce into four natural groups using five employee attributes (age, income, satisfaction, tenure, performance). The goal is to identify which employee populations require targeted HR interventions—particularly those at risk of departure. The weak silhouette score signals that clusters overlap considerably, meaning segment boundaries are fuzzy rather than crisp.

Key Findings

Cluster 4 (Flight Risk): 393 employees (26.7%), lowest job satisfaction (1.48 on the 1–4 scale), average performance rating of 3.0, youngest average age (34.53), average tenure 5.4 years, monthly income $4,880. This is your highest-risk attrition segment.
Cluster 2 (Largest Segment): 640 employees (43.5%), highest satisfaction (3.51), performance rating of 3.0 (tied with Cluster 4 for lowest), average age 34.95, lowest income ($4,813). Likely an entry-level or junior cohort.
Cluster 3 (High Earners): 229 employees (15.6%), oldest (47.4 years), highest income ($14,983), longest tenure (15.2 years), moderate satisfaction (2.69). Senior, stable workforce.
Cluster 1 (Performers): 208 employees (14.1%), highest performance rating (4.0), moderate satisfaction (2.74), mid-range income ($5,434).
Silhouette Score (0.29): Weak separation. The score is an average of per-employee values on a -1 to 1 scale, not a share of employees, so it does not state what fraction is misassigned; it does indicate that many employees sit close to a neighboring cluster. Treat segment assignments as probabilistic guidance, not absolute truth.

Interpretation

The analysis identified four workforce archetypes, but the weak silhouette score means the clusters are not tightly separated and many employees sit near cluster boundaries. Cluster 4 is your actionable priority: these employees combine the lowest satisfaction (1.48) with a performance rating of 3.0, tied for lowest, and below-average income ($4,880). Low engagement at low pay is a classic churn signature. Cluster 3 represents your institutional knowledge base (15+ years tenure, highest pay), while Cluster 2 is the largest and most satisfied group (young, lowest income, performance rating 3.0). Cluster 1 holds your top-rated performers (4.0) on mid-range pay.

Context

The analysis used K=4 clusters, though the silhouette analysis recommended K=3 (silhouette=0.36). The choice of K=4 provides more granular segmentation but at the cost of weaker statistical separation. PCA explains 57.8% of variance in two dimensions, sufficient for visualization but indicating that employee profiles are genuinely multidimensional. No data quality issues were detected (zero rows removed).

---

Deployment Recommendation

Confidence: Moderate (65%)

Deploy this segmentation immediately for Cluster 4 retention focus—the flight-risk signal is clear and actionable regardless of silhouette weakness. Use clusters 1–3 as exploratory guidance for HR program design, but validate segment membership through qualitative interviews before making major policy decisions. The weak silhouette score means you should not use automated cluster assignment for individual employee decisions without human review.

Business Value & ROI Potential

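Cluster 4 retention: If 20% of Cluster 4 would otherwise churn (about 79 employees/year) and replacement cost is taken as 1.5× the cluster's mean monthly income of $4,880 (~$7,320 per employee), preventing even 10 departures saves ~$73,200 annually. Note that $4,880 is a monthly figure; applying the 1.5× multiplier to annual salary instead gives ~$87,840 per employee, or ~$878,400 for 10 departures.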
Targeted development: Segment-specific programs (e.g., accelerated promotion for Cluster 1, compensation review for Cluster 4) reduce broad-brush HR spend and improve engagement ROI.
Succession planning: Cluster 3 profile (senior, stable) identifies your knowledge-transfer priorities.
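
The retention estimate above can be reproduced with a few lines of arithmetic. The inputs are the report's own figures (393 employees, 20% churn, $4,880 mean monthly income, 1.5× multiplier, 10 prevented departures); the variable names are illustrative, and the annual-salary variant is included only as a comparison, on the assumption that annual salary is 12 times monthly income.

```python
cluster_size = 393        # employees in Cluster 4
churn_rate = 0.20         # assumed share who would otherwise leave each year
monthly_income = 4880     # Cluster 4 mean monthly income in dollars
cost_multiplier = 1.5     # replacement cost as a multiple of salary
prevented = 10            # departures avoided by the retention program

expected_leavers = cluster_size * churn_rate              # about 78.6 employees per year

# Basis used in the estimate above: multiplier applied to monthly income.
cost_monthly_basis = cost_multiplier * monthly_income     # 7,320 per employee
savings_monthly_basis = prevented * cost_monthly_basis    # 73,200

# Comparison: multiplier applied to annual salary (12 x monthly income).
cost_annual_basis = cost_multiplier * monthly_income * 12 # 87,840 per employee
savings_annual_basis = prevented * cost_annual_basis      # 878,400
```

The two bases differ by a factor of 12, so the salary definition behind the 1.5× multiplier should be confirmed before the savings figure is used for budgeting.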

Risks & Limitations

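Silhouette=0.29 is weak: The score is an average on a -1 to 1 scale, not a percentage of employees, but a value this low means many employees lie nearly as close to a neighboring cluster as to their own. Treat assignments as a starting hypothesis, not ground truth.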
Missing variables: The model uses only five attributes. Unmeasured factors (role type, manager quality, career growth opportunity, remote work preference) may be stronger drivers of satisfaction and churn.
Temporal snapshot: This is a point-in-time segmentation. Cluster membership will drift as employees age, earn raises, and change roles.
Cluster 4 causality unclear: Low satisfaction combined with a performance rating at the bottom of the observed range could indicate burnout, undercompensation, or misalignment with role; the data does not distinguish between them. Conduct stay interviews and exit interviews to diagnose root cause.

Distribution of employees across discovered workforce segments

Interpretation

Headline

Cluster 2 dominates your workforce at 43.5% (640 employees), while three smaller segments range from 14.1% to 26.7%, indicating one core group with three distinct outlier populations.

Purpose

This section reveals how your 1,470-person workforce naturally segments into four distinct groups based on employee characteristics. Understanding segment sizes tells you whether you have a homogeneous workforce or multiple distinct populations requiring different management strategies. The unequal distribution suggests one large "typical" employee profile with three smaller, potentially higher-risk or higher-value groups.

Key Findings

Cluster 2 (Core Segment): 640 employees (43.5%) — nearly half your workforce shares similar characteristics
Cluster 4 (Secondary Segment): 393 employees (26.7%) — a substantial secondary group, one-quarter of staff
Cluster 3 (Smaller Segment): 229 employees (15.6%) — distinct minority population
Cluster 1 (Smallest Segment): 208 employees (14.1%) — your most differentiated group
Silhouette Score: 0.293 — weak cluster separation indicates boundaries between groups are fuzzy, not sharp
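
The segment shares and the largest-to-smallest ratio follow directly from the cluster counts. A short check, using the counts reported above (the variable names are illustrative):

```python
# Employee counts per cluster as reported in this section.
counts = {"Cluster 1": 208, "Cluster 2": 640, "Cluster 3": 229, "Cluster 4": 393}

total = sum(counts.values())  # should equal the 1,470 employees analyzed

# Share of the workforce in each cluster, as a percentage rounded to one decimal.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Ratio between the largest and smallest cluster.
size_ratio = max(counts.values()) / min(counts.values())
```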

Interpretation

The 3:1 ratio between largest and smallest clusters reveals a workforce with one dominant profile and three smaller populations. This imbalance is typical in employee segmentation—most staff cluster around average characteristics while outliers (high performers, flight risks, or specialized roles) form smaller groups. The weak silhouette score (0.293, below the 0.5 threshold for reasonable separation) suggests these clusters overlap considerably; employees near cluster boundaries share traits with multiple groups.

Context

The silhouette score indicates the 4-cluster solution provides weak but usable segmentation. The analysis flagged k=3 as optimal, but k=4 was selected—likely to isolate a specific high-value or high-risk group. Verify whether the smaller clusters represent actionable populations (e.g., flight risks, top talent) before investing in segment-specific interventions.

Radar-style heatmap comparing standardized feature centroids across all clusters

Interpretation

Headline

Cluster 4 exhibits the classic flight-risk profile: lowest job satisfaction (−1.13 std) paired with below-average performance (−0.43 std), signaling disengaged mid-career employees at immediate retention risk.

Purpose

This heatmap reveals how the four employee segments differ across five key dimensions—Age, Monthly Income, Job Satisfaction, Years at Company, and Performance Rating. By comparing standardized feature values (z-scores), we identify which clusters are above or below average on each dimension, enabling targeted retention and engagement strategies. The analysis specifically flags Cluster 4 as a flight-risk segment based on the combination of low satisfaction and weak performance indicators.

Key Findings

Cluster 1 (High Performers): Exceptional performance rating (+2.35 std), average satisfaction (+0.01 std), and below-average income (−0.23 std)—high-value employees potentially underpaid relative to contribution.
Cluster 4 (Disengaged Mid-Career): Critically low job satisfaction (−1.13 std), below-average income (−0.34 std), and weak performance (−0.43 std)—the identified flight-risk segment.
Cluster 2 (Satisfied Baseline): Highest job satisfaction (+0.81 std) with average performance and income—stable, engaged workforce.
Cluster 3 (Senior High-Earners): Highest income (+2.35 std) and tenure (+1.42 std), moderate satisfaction—experienced, well-compensated employees.

Interpretation

Cluster 4 represents 393 employees (26.7% of workforce) trapped in a disengagement spiral: low satisfaction drives low performance, which may suppress advancement and income growth. Unlike Cluster 1 (underpaid high performers), Cluster 4 lacks the performance lever for compensation negotiation. Cluster 3 shows that tenure and income correlate strongly, suggesting career progression works—but Cluster 4's short tenure (−0.26 std) and low satisfaction suggest they may not stay long enough to reach that level.

Context

The standardized scale allows direct comparison across features with different units (age in years, income in dollars, satisfaction on a 1–4 scale). Cluster 1's high performance despite low income and Cluster 4's low satisfaction despite average tenure both warrant immediate investigation into compensation equity and role fit.
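
As a brief illustration of why standardization makes features comparable, the sketch below converts two toy columns with very different units to z-scores. The `zscores` helper and the sample values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the report does not state which variant the module applies.

```python
import statistics

def zscores(values):
    """Standardize values to mean 0 and unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumed variant)
    return [(v - mean) / sd for v in values]

# Hypothetical toy columns in different units.
ages = [30, 40, 50]               # years
incomes = [3000, 5000, 7000]      # dollars per month

age_z = zscores(ages)
income_z = zscores(incomes)
```

After the transformation both columns are on the same scale, so a centroid value such as +2.35 std reads the same way whether the underlying feature is a rating, a dollar amount, or a number of years.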

2D scatter plot of employees colored by cluster assignment using top two principal components

Interpretation

Headline

The four workforce clusters show moderate overlap in the 2D projection, with 57.8% of variance captured—indicating natural groupings exist but are not sharply separated.

Purpose

This visualization compresses the five clustering features (age, income, satisfaction, tenure, performance) into two dimensions to assess whether the K-means algorithm found distinct, separable employee segments. The scatter plot reveals the spatial relationship between clusters and identifies potential boundary cases or outliers that blur segment boundaries.

Key Findings

Variance Captured: PC1 and PC2 together explain 57.8% of total variance (37.8% + 20%), leaving 42.2% of information in the remaining three dimensions. This means the 2D view is incomplete but captures more than half the story.
Cluster Overlap: The sample data shows employees from different clusters distributed across both positive and negative PC1 and PC2 ranges, suggesting clusters share overlapping feature profiles in this projection.
Cluster 2 Dominance: At 43.5% of the workforce (640 employees), Cluster 2 occupies a large portion of the space, while Cluster 1 (14.1%, 208 employees) is smaller and more concentrated.
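
The variance shares can be read as eigenvalues divided by their sum. The sketch below assumes PCA was run on five standardized features, so the eigenvalues sum to 5 and the reported shares of 37.8% and 20% correspond to eigenvalues of 1.89 and 1.00. The last three eigenvalues are hypothetical placeholders chosen only so that the total is 5; the report does not give them.

```python
from itertools import accumulate

# First two values implied by the reported shares (0.378 * 5 and 0.20 * 5);
# the remaining three are HYPOTHETICAL placeholders that sum to 2.11.
eigenvalues = [1.89, 1.00, 0.85, 0.70, 0.56]

total_variance = sum(eigenvalues)
explained_ratio = [ev / total_variance for ev in eigenvalues]
cumulative_ratio = list(accumulate(explained_ratio))

captured_2d = cumulative_ratio[1]   # share visible in the PC1/PC2 scatter plot
lost_in_2d = 1.0 - captured_2d      # share carried by PC3 to PC5
```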

Interpretation

The weak silhouette score of 0.293 (from the overall analysis) aligns with this visual pattern: clusters are identifiable but not cleanly separated. The overlap visible in 2D does not invalidate the clustering; it reflects that employees within different segments share some characteristics while differing in others. The loss of 42.2% of variance in the 2D projection means some distinguishing features are invisible here, so the full five-dimensional space may show somewhat clearer separation, but the weak overall silhouette score suggests not by much.

Context

PCA projection is a lossy visualization tool. Apparent overlap in 2D may shrink when viewing the complete feature space. PC1 is moderately skewed (skew = -0.53), which suggests a tail of employees at one extreme of the component, plausibly the high-income or high-performance end; these may represent distinct subgroups worth investigating separately.

Which departments are overrepresented in each workforce segment

Interpretation

Headline

Research & Development dominates all four clusters at 62.8–70.7%, indicating the flight-risk segment (Cluster 4) has no department-specific concentration — the attrition risk is systemic, not localized to one business unit.

Purpose

This section identifies whether specific departments are overrepresented in at-risk employee segments. If one department drove the flight-risk cluster, it would signal a localized problem (management, culture, compensation) that could be addressed surgically. Even distribution across departments points to company-wide issues affecting all business units equally.

Key Findings

R&D Dominance Across All Clusters: Research & Development represents 62.8–70.7% of every cluster, including the flight-risk Cluster 4 (62.8%). This is the defining pattern.
Sales Representation: Sales comprises 26–31.8% of each cluster, showing consistent distribution with no cluster-specific concentration.
Human Resources Minimal: HR represents only 3.4–5.3% across all clusters, reflecting its smaller workforce size.
No Department Concentration in Flight Risk: Cluster 4 shows no department overrepresentation — R&D, Sales, and HR are proportionally distributed as they are in lower-risk clusters.
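
One way to test for over-representation is to divide each department's share within a cluster by its share of the whole workforce; values well above 1 flag a concentration. The report does not publish the underlying head counts, so the counts below are hypothetical toy numbers and `representation_index` is an illustrative helper, not part of the analysis module.

```python
def representation_index(counts):
    """Department share within each cluster divided by that department's overall share."""
    departments = sorted({d for row in counts.values() for d in row})
    grand_total = sum(sum(row.values()) for row in counts.values())
    overall = {
        d: sum(row.get(d, 0) for row in counts.values()) / grand_total
        for d in departments
    }
    index = {}
    for cluster, row in counts.items():
        size = sum(row.values())
        index[cluster] = {d: (row.get(d, 0) / size) / overall[d] for d in departments}
    return index

# HYPOTHETICAL head counts by cluster and department (not taken from the report).
toy_counts = {
    "Cluster A": {"R&D": 60, "Sales": 30, "HR": 10},
    "Cluster B": {"R&D": 70, "Sales": 25, "HR": 5},
}
idx = representation_index(toy_counts)
```

An index close to 1.0 for every department in a cluster is the pattern described above: the cluster mirrors the company-wide department mix.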

Interpretation

The flight-risk cluster (Cluster 4) does not concentrate in any single department. R&D's 62.8% representation in Cluster 4 mirrors its 65–70.7% presence in other clusters, indicating R&D employees are distributed across all risk profiles. This pattern argues against department-specific management failures or localized compensation issues as the primary driver of flight risk. The risk factors instead appear systemic, affecting employees across business units in similar proportions.

Context

This analysis assumes department assignment is current and accurate. Cross-validation with actual historical attrition data by department would confirm whether this proportional distribution truly predicts departure risk or whether other unmeasured factors (role level, tenure, manager quality) better explain the flight-risk cluster's composition.

Silhouette scores by k showing how the optimal number of clusters was selected

Interpretation

Headline

The analysis chose 4 clusters despite k=3 being statistically optimal (silhouette 0.36 vs. 0.29), trading cluster quality for business interpretability.

Purpose

Silhouette analysis evaluates cluster quality across different values of k (number of clusters) to determine the optimal segmentation. This section shows whether the chosen 4-cluster model is statistically justified or represents a trade-off between statistical purity and practical usability. Understanding this trade-off is critical for assessing whether the resulting employee segments are reliable or merely convenient.

Key Findings

Optimal k by silhouette score: k=3 with score 0.36 — the highest quality clustering across all tested values
Final model (k=4): Silhouette score 0.29 — a 19% decline in cluster quality compared to k=3
Score interpretation: Both values fall in the weak-to-moderate range (0.25–0.5), indicating natural overlap in employee characteristics rather than crisp, distinct groups
WCSS trend: Decreasing from k=2 (5,509.8) to k=6 (2,686.5), showing diminishing returns after k=3
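
The percentages behind these findings can be checked directly. The sketch uses only the values reported in this section (silhouette at k=3 and k=4, WCSS at k=2 and k=6); scores for the other tested values of k are not given in the report, so they are left out rather than guessed.

```python
# Silhouette scores reported for the two candidate solutions.
silhouette_by_k = {3: 0.36, 4: 0.29}

best_k = max(silhouette_by_k, key=silhouette_by_k.get)

# Relative decline in silhouette when moving from k=3 to the final k=4.
relative_decline = (silhouette_by_k[3] - silhouette_by_k[4]) / silhouette_by_k[3]

# Within-cluster sum of squares at the two ends of the tested range.
wcss_k2 = 5509.8
wcss_k6 = 2686.5
wcss_reduction = (wcss_k2 - wcss_k6) / wcss_k2
```

WCSS always falls as k grows, so its reduction alone cannot pick k; that is why the silhouette comparison is the deciding criterion here.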

Interpretation

The 4-cluster solution sacrifices statistical quality for business utility. At k=3, employees cluster more cohesively, but the run was configured with n_clusters = 4, presumably because the fourth cluster represents a meaningful business segment (e.g., the identified flight-risk group) despite lower silhouette performance. This is a valid trade-off when domain insight justifies it, but it means cluster boundaries are softer: some employees sit between clusters and could reasonably belong to multiple groups.

Context

Silhouette scores below 0.5 are typical for HR data, where employee characteristics naturally overlap. The weak absolute scores suggest the five features (Age, Income, Satisfaction, Tenure, Performance) don't create sharp employee boundaries. Verify that the k=4 choice was driven by business need, not statistical convenience.

Detailed cluster profile table showing mean feature values for each segment

Cluster	Employees	% of Workforce	Mean Age (years)	Mean Monthly Income ($)	Mean Job Satisfaction	Mean Years at Company	Mean Performance Rating
Cluster 1	208	14.1	35.96	5,434	2.74	6.06	4.00
Cluster 2	640	43.5	34.95	4,813	3.51	5.38	3.00
Cluster 3	229	15.6	47.44	14,983	2.69	15.21	3.08
Cluster 4	393	26.7	34.53	4,880	1.48	5.39	3.00

Interpretation

Headline

The profile table shows three broadly similar clusters and one clear outlier: Cluster 3 is older, far better paid, and longer-tenured than the rest, while Clusters 1, 2, and 4 differ mainly in performance rating and job satisfaction.

Purpose

This section reveals the defining characteristics of each employee segment by comparing mean values across five key features (Age, Monthly Income, Job Satisfaction, Years at Company, and Performance Rating). These profiles enable HR to assign business-meaningful labels to clusters and identify retention risks tied to compensation, tenure, or engagement.

Key Findings

Cluster Profile Data: Mean values for all five features are populated for each of the 4 clusters
Cluster Sizes: Range from 208 to 640 employees (14.1% to 43.5% of workforce)
Centroid Data Available: Raw and standardized means also exist in the cluster_centroids table

Interpretation

The clustering algorithm assigned all 1,470 employees to 4 segments. The profile table shows meaningful variation across clusters: Cluster 3 has substantially higher mean income ($14,983 vs. $4,813–$5,434 in other clusters) and longer tenure (15.21 years vs. 5.4–6.1 years), Cluster 1 has the highest performance rating (4.0 vs. 3.0–3.08), and Cluster 4 has the lowest job satisfaction (1.48 vs. 2.69–3.51).

Context

The cluster_centroids table holds the standardized counterparts of these means. The silhouette score of 0.293 indicates weak cluster separation, suggesting segment boundaries are soft and overlapping. This limitation should be noted when assigning business labels to clusters.
