Workforce Segmentation Overview
Analysis overview and configuration
| Parameter | Value |
|---|---|
| n_clusters | 4 |
| scale_data | TRUE |
| max_k | 6 |
Headline
The analysis identified 4 employee clusters, but a silhouette score of 0.29 indicates weak cluster separation. The 3-cluster solution scored 0.36 and may yield clearer, more actionable segments.
Purpose
This workforce segmentation analysis applied K-means clustering to 1,470 employees across 5 features (age, income, job satisfaction, tenure, performance rating) to identify natural employee groups for targeted HR strategies. The analysis evaluated cluster quality and generated employee-to-cluster assignments with departmental breakdowns.
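The report does not say which K-means implementation was used. As a rough illustration of the technique only, the sketch below runs plain Lloyd's algorithm (assign each point to its nearest centroid, then recompute centroids) on a handful of hypothetical two-feature points. The `assign` and `kmeans` helpers, the points, and the starting centroids are all assumptions made for the example, not values from the analysis.

```python
from math import dist

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster loses all of its members.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else old
            )
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return assign(points, centroids), centroids

# Hypothetical two-feature points and starting centroids (not from the report).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids)
```

In practice a library implementation with multiple random restarts would normally be preferred, since Lloyd's algorithm only finds a local optimum for a given starting point.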
Key Findings
- Silhouette Score: 0.29 (weak structure) — falls in the 0.25–0.5 range indicating poor cluster cohesion. The optimal K=3 achieved 0.36, substantially better than the final K=4 at 0.29.
- Cluster Imbalance: Cluster 2 dominates at 43.5% (640 employees), while Cluster 1 represents only 14.1% (208 employees). This skew suggests unequal natural groupings.
- Flight Risk Cluster: Cluster 4 (393 employees, 26.7%) flagged as high-risk, characterized by the lowest job satisfaction (1.48 vs. 2.6, the unweighted mean of the four cluster centroids) and a performance rating tied with Cluster 2 for lowest (3.0 vs. 3.27 on the same basis).
- Variance Explained: PC1 and PC2 together explain 57.8% of total variance, leaving 42.2% unexplained—moderate dimensionality reduction effectiveness.
- Data Quality: All 1,470 rows retained; no missing values or outliers removed.
Interpretation
The weak silhouette score (0.29) signals that clusters overlap substantially and lack clear boundaries: on average, an employee is only slightly closer to members of their own cluster than to members of the nearest other cluster. The higher score at K=3 (0.36) suggests the data is better described by three groupings than four, although 0.36 is still in the weak range. Cluster 4's low satisfaction and performance profile is actionable but represents only 27% of the workforce, limiting its strategic impact relative to the dominant Cluster 2.
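For readers unfamiliar with the metric, here is a minimal sketch of how a silhouette score is computed, assuming Euclidean distance. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the smallest mean distance to any other cluster; the per-point value is `(b - a) / max(a, b)`, and the reported score is the mean over all points. The function name and the toy points are hypothetical.

```python
from math import dist
from statistics import mean

def silhouette_score(points, labels):
    """Mean silhouette over all points; assumes at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is included in the sum only).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(mean(dist(p, q) for q in members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return mean(scores)

# Hypothetical toy data: well-separated vs. interleaved clusters.
separated = silhouette_score([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1])
interleaved = silhouette_score([(0, 0), (2, 0), (1, 0), (3, 0)], [0, 0, 1, 1])
print(round(separated, 2), round(interleaved, 2))
```

Because the score is an average of per-point values between −1 and 1, it describes typical separation rather than the share of points that are well or poorly assigned.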
Context
K-means tends to work best when clusters are roughly spherical and similar in size; the 3:1 size ratio between Cluster 2 and Cluster 1 strains that assumption. The weak silhouette score may reflect genuine workforce heterogeneity or indicate that the five features do not cleanly separate employees into distinct personas. Departmental distribution (R&D dominates all clusters at 62–71%) suggests department is not a primary differentiator.
Data preprocessing and column mapping
Headline
All 1,470 employee records passed quality checks with zero rows removed, ensuring a complete dataset for clustering analysis.
Purpose
Data preprocessing is the foundation of any statistical analysis. This section documents how the raw dataset was cleaned, validated, and prepared for the clustering model. A 100% retention rate indicates no missing values, duplicates, or outliers were flagged for removal—a clean starting point that strengthens confidence in downstream results.
Key Findings
- Retention Rate: 100% (1,470 of 1,470 rows retained) - No data loss during cleaning
- Rows Removed: 0 - No records excluded due to missing values, duplicates, or quality issues
- Dataset Completeness: All 1,470 employees included in clustering analysis with no gaps
Interpretation
The dataset entered the analysis in excellent condition. No rows were dropped for missing data, outliers, or data quality issues, meaning the full workforce population is represented in the four clusters. This complete retention is ideal for workforce segmentation—every employee has been assigned to a cluster, avoiding selection bias that could skew HR insights. The absence of preprocessing exclusions also means the cluster profiles reflect the true composition of the organization without artificial filtering.
Context
While 100% retention is positive, note that the data quality report does not detail missing value patterns or outlier detection methods, and the only documented scaling step is the scale_data = TRUE setting applied to the five clustering features. The clustering analysis itself (silhouette score of 0.293) suggests weak cluster separation, which may reflect genuine workforce diversity rather than data quality issues.
Executive Summary
Executive summary of workforce segmentation findings and HR action recommendations
| Finding | Value |
|---|---|
| Workforce Segments Discovered | 4 distinct segments |
| Total Employees Analyzed | 1,470 |
| Cluster Quality (Silhouette) | 0.2927 (weak) |
| Potential Flight-Risk Segment | Cluster 4 |
| Largest Segment Share | 43.5% of workforce |
| Smallest Segment Share | 14.1% of workforce |
| PCA Variance Explained (2D) | 57.8% |
Key Findings:
• 4 distinct employee segments identified based on age, income, satisfaction, tenure, and performance
• Flight-risk alert: Cluster 4 shows the highest performance-to-satisfaction ratio (3.0 / 1.48 ≈ 2.0), driven by very low satisfaction rather than high performance; its mean rating of 3.0 is tied for lowest across clusters
• Segment sizes range from 14.1% to 43.5% of the workforce
• PCA captures 57.8% of variance in 2D for visual cluster separation
Recommended Actions:
1. Review cluster centroid heatmap to assign HR-meaningful labels to each segment
2. Prioritize Cluster 4 for immediate retention interviews and compensation benchmarking
3. Analyze Department composition to identify which teams are over-represented in at-risk clusters
4. Design segment-specific HR programs: promotion paths, compensation reviews, coaching, onboarding improvements
Headline
Four workforce segments identified, but weak cluster quality (silhouette=0.29) limits confidence; Cluster 4 emerges as a critical flight-risk group of 393 employees (26.7% of workforce) with the lowest job satisfaction (1.48 on a 1–4 scale) and a mean performance rating of 3.0, tied for lowest.
Purpose
This analysis segments your 1,470-person workforce into four natural groups using five employee attributes (age, income, satisfaction, tenure, performance). The goal is to identify which employee populations require targeted HR interventions—particularly those at risk of departure. The weak silhouette score signals that clusters overlap considerably, meaning segment boundaries are fuzzy rather than crisp.
Key Findings
- Cluster 4 (Flight Risk): 393 employees (26.7%), lowest job satisfaction (1.48 on a 1–4 scale), average performance rating of 3.0, youngest average age (34.53), average tenure 5.4 years, monthly income $4,880. This is your highest-risk attrition segment.
- Cluster 2 (Largest Segment): 640 employees (43.5%), highest satisfaction (3.51), performance rating of 3.0 (tied with Cluster 4 for lowest), average age 34.95, lowest income ($4,813). Entry-level or junior cohort.
- Cluster 3 (High Earners): 229 employees (15.6%), oldest (47.4 years), highest income ($14,983), longest tenure (15.2 years), moderate satisfaction (2.69). Senior, stable workforce.
- Cluster 1 (Performers): 208 employees (14.1%), highest performance rating (4.0), moderate satisfaction (2.74), mid-range income ($5,434).
- Silhouette Score (0.29): The score is an average over employees of a per-person value between −1 and 1, not a percentage. A value this low means many employees are nearly as close to a neighboring cluster as to their own. Treat segment assignments as probabilistic guidance, not absolute truth.
Interpretation
The analysis identified four workforce archetypes, but the weak silhouette score means the clusters are not tightly separated; many employees sit near cluster boundaries. Cluster 4 is your actionable priority: these employees combine the lowest job satisfaction with below-average income ($4,880 per month) and a performance rating of 3.0, tied for lowest. Low satisfaction paired with low pay is a classic churn signature. Cluster 3 represents your institutional knowledge base (15+ years tenure, highest pay), while Cluster 2 is your growth pipeline (young, lowest cost, highly satisfied, but lowest performance rating). Cluster 1 are your solid mid-career contributors, with the highest performance rating (4.0).
Context
The analysis used K=4 clusters, though the silhouette analysis recommended K=3 (silhouette=0.36). The choice of K=4 provides more granular segmentation but at the cost of weaker statistical separation. PCA explains 57.8% of variance in two dimensions, sufficient for visualization but indicating that employee profiles are genuinely multidimensional. No data quality issues were detected (zero rows removed).
---
Deployment Recommendation
Confidence: Moderate (65%)
Deploy this segmentation immediately for Cluster 4 retention focus—the flight-risk signal is clear and actionable regardless of silhouette weakness. Use clusters 1–3 as exploratory guidance for HR program design, but validate segment membership through qualitative interviews before making major policy decisions. The weak silhouette score means you should not use automated cluster assignment for individual employee decisions without human review.
Business Value & ROI Potential
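- Cluster 4 retention: If 20% of Cluster 4 would otherwise churn (about 79 employees/year) and replacement cost is 1.5× annual salary, the cost per departure is roughly $87,840 (1.5 × $4,880 monthly income × 12), so preventing even 10 departures saves about $878,400 annually. The figure of ~$7,320 per employee applies only if the multiplier is taken against monthly income, which gives about $73,200 for 10 departures.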
- Targeted development: Segment-specific programs (e.g., accelerated promotion for Cluster 1, compensation review for Cluster 4) reduce broad-brush HR spend and improve engagement ROI.
- Succession planning: Cluster 3 profile (senior, stable) identifies your knowledge-transfer priorities.
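The retention estimate above is sensitive to whether "salary" means monthly or annual pay. The sketch below works through both readings using the report's figures for Cluster 4 (393 employees, $4,880 mean monthly income); the 20% churn rate, the 1.5× replacement multiplier, and the 10 prevented departures are the report's own illustrative assumptions, and the variable names are hypothetical.

```python
cluster_size = 393            # Cluster 4 headcount (from the report)
assumed_churn_rate = 0.20     # illustrative assumption
monthly_income = 4880         # Cluster 4 mean monthly income, USD
replacement_multiplier = 1.5  # illustrative assumption
departures_prevented = 10     # illustrative assumption

expected_leavers = cluster_size * assumed_churn_rate

# Reading 1: multiplier applied to monthly income.
cost_monthly_basis = replacement_multiplier * monthly_income
# Reading 2: multiplier applied to annual salary (12 x monthly income).
cost_annual_basis = replacement_multiplier * monthly_income * 12

savings_monthly_basis = departures_prevented * cost_monthly_basis
savings_annual_basis = departures_prevented * cost_annual_basis

print(f"Expected leavers per year: {expected_leavers:.1f}")
print(f"Savings, monthly basis: ${savings_monthly_basis:,.0f}")
print(f"Savings, annual basis:  ${savings_annual_basis:,.0f}")
```

The two readings differ by a factor of 12, so the basis should be stated explicitly before the estimate is used in a business case.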
Risks & Limitations
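- Silhouette=0.29 is weak: The score is an average of per-employee values, not a share of the workforce, so it does not say what percentage of employees are weakly assigned; it does indicate that many sit close to a neighboring cluster. Treat as a starting hypothesis, not ground truth.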
- Missing variables: The model uses only five attributes. Unmeasured factors (role type, manager quality, career growth opportunity, remote work preference) may be stronger drivers of satisfaction and churn.
- Temporal snapshot: This is a point-in-time segmentation. Cluster membership will drift as employees age, earn raises, and change roles.
- Cluster 4 causality unclear: Low satisfaction combined with a below-average performance rating could indicate burnout, undercompensation, or misalignment with role; the data does not distinguish. Conduct exit interviews and stay interviews to diagnose root cause.
Workforce Segment Distribution
Distribution of employees across discovered workforce segments
Headline
Cluster 2 dominates your workforce at 43.5% (640 employees), while three smaller segments range from 14.1% to 26.7%, indicating one core group and three smaller populations.
Purpose
This section reveals how your 1,470-person workforce naturally segments into four distinct groups based on employee characteristics. Understanding segment sizes tells you whether you have a homogeneous workforce or multiple distinct populations requiring different management strategies. The unequal distribution suggests one large "typical" employee profile with three smaller, potentially higher-risk or higher-value groups.
Key Findings
- Cluster 2 (Core Segment): 640 employees (43.5%) — nearly half your workforce shares similar characteristics
- Cluster 4 (Secondary Segment): 393 employees (26.7%) — a substantial secondary group, one-quarter of staff
- Cluster 3 (Smaller Segment): 229 employees (15.6%) — distinct minority population
- Cluster 1 (Smallest Segment): 208 employees (14.1%) — your most differentiated group
- Silhouette Score: 0.293 — weak cluster separation indicates boundaries between groups are fuzzy, not sharp
Interpretation
The 3:1 ratio between largest and smallest clusters reveals a workforce with one dominant profile and three smaller populations. This imbalance is typical in employee segmentation—most staff cluster around average characteristics while outliers (high performers, flight risks, or specialized roles) form smaller groups. The weak silhouette score (0.293, below the 0.5 threshold for reasonable separation) suggests these clusters overlap considerably; employees near cluster boundaries share traits with multiple groups.
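The shares and the size ratio quoted in this section follow directly from the cluster headcounts. A small sketch of that arithmetic, using the counts reported above (the variable names are illustrative):

```python
# Cluster headcounts as reported in this section.
sizes = {"Cluster 1": 208, "Cluster 2": 640, "Cluster 3": 229, "Cluster 4": 393}

total = sum(sizes.values())

# Share of workforce per cluster, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in sizes.items()}

# Ratio of the largest cluster to the smallest.
size_ratio = max(sizes.values()) / min(sizes.values())

print(total, shares, round(size_ratio, 2))
```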
Context
The silhouette score indicates the 4-cluster solution provides weak but usable segmentation. The analysis flagged k=3 as optimal, but k=4 was selected—likely to isolate a specific high-value or high-risk group. Verify whether the smaller clusters represent actionable populations (e.g., flight risks, top talent) before investing in segment-specific interventions.
Cluster Feature Profiles
Radar-style heatmap comparing standardized feature centroids across all clusters
Headline
Cluster 4 exhibits the classic flight-risk profile: lowest job satisfaction (−1.13 std) paired with below-average performance (−0.43 std), signaling disengaged mid-career employees at immediate retention risk.
Purpose
This heatmap reveals how the four employee segments differ across five key dimensions—Age, Monthly Income, Job Satisfaction, Years at Company, and Performance Rating. By comparing standardized feature values (z-scores), we identify which clusters are above or below average on each dimension, enabling targeted retention and engagement strategies. The analysis specifically flags Cluster 4 as a flight-risk segment based on the combination of low satisfaction and weak performance indicators.
Key Findings
- Cluster 1 (High Performers): Exceptional performance rating (+2.35 std), average satisfaction (+0.01 std), and below-average income (−0.23 std)—high-value employees potentially underpaid relative to contribution.
- Cluster 4 (Disengaged Mid-Career): Critically low job satisfaction (−1.13 std), below-average income (−0.34 std), and weak performance (−0.43 std)—the identified flight-risk segment.
- Cluster 2 (Satisfied Baseline): Highest job satisfaction (+0.81 std), with a performance rating (3.0) and income ($4,813) slightly below the workforce mean and close to Cluster 4's. A stable, engaged workforce.
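- Cluster 3 (Senior High-Earners): Highest income and tenure (+1.42 std), moderate satisfaction. Experienced, well-compensated employees. The reported income value of +2.35 std duplicates Cluster 1's performance z-score; the income z-scores of Clusters 1 and 4 imply a figure closer to +1.8, so check it against the centroid table.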
Interpretation
Cluster 4 represents 393 employees (26.7% of workforce) who pair low satisfaction with a below-average performance rating. Clustering alone cannot show that one causes the other: Cluster 2 has the same mean rating (3.0) with the highest satisfaction. Unlike Cluster 1 (underpaid high performers), Cluster 4 lacks the performance lever for compensation negotiation. Cluster 3 shows that tenure and income rise together, suggesting career progression works, but Cluster 4's below-average tenure (−0.26 std) and low satisfaction suggest they may not stay long enough to reach that level.
Context
The standardized scale allows direct comparison across features with different units (age in years, income in dollars, satisfaction on a 1–4 scale). Cluster 1's high performance despite below-average income and Cluster 4's low satisfaction at slightly below-average tenure both warrant immediate investigation into compensation equity and role fit.
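As an illustration of how standardized values like these are obtained, the sketch below converts raw values to z-scores by subtracting the mean and dividing by the standard deviation. The `zscores` helper and the sample values are hypothetical, and the use of the population standard deviation is an assumption; the report does not say whether population or sample SD was used.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD, not sample SD
    return [(v - mu) / sigma for v in values]

# Hypothetical raw values in very different units (not from the report).
monthly_income = [3000, 4500, 5000, 6500, 16000]
job_satisfaction = [1, 2, 3, 3, 4]

income_z = zscores(monthly_income)
satisfaction_z = zscores(job_satisfaction)

# After standardization both features are on the same unitless scale.
print([round(z, 2) for z in income_z])
print([round(z, 2) for z in satisfaction_z])
```

A centroid's standardized value is then the mean of these z-scores over the employees in that cluster, which is why a value such as −1.13 reads as "about 1.1 standard deviations below the workforce mean".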
Cluster Separation (PCA)
2D scatter plot of employees colored by cluster assignment using top two principal components
Headline
The four workforce clusters show moderate overlap in the 2D projection, with 57.8% of variance captured—indicating natural groupings exist but are not sharply separated.
Purpose
This visualization compresses the five clustering features (age, income, satisfaction, tenure, performance) into two dimensions to assess whether the K-means algorithm found distinct, separable employee segments. The scatter plot reveals the spatial relationship between clusters and identifies potential boundary cases or outliers that blur segment boundaries.
Key Findings
- Variance Captured: PC1 and PC2 together explain 57.8% of total variance (37.8% + 20%), leaving 42.2% of information in the remaining three dimensions. This means the 2D view is incomplete but captures more than half the story.
- Cluster Overlap: The sample data shows employees from different clusters distributed across both positive and negative PC1 and PC2 ranges, suggesting clusters share overlapping feature profiles in this projection.
- Cluster 2 Dominance: At 43.5% of the workforce (640 employees), Cluster 2 occupies a large portion of the space, while Cluster 1 (14.1%, 208 employees) is smaller and more concentrated.
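The variance figures above can be read as ratios of PCA eigenvalues to their total. The sketch below assumes PCA was run on the five standardized features, so the eigenvalues sum to 5; under that assumption the first two eigenvalues are implied by the reported 37.8% and 20% shares, while the last three are purely hypothetical placeholders chosen to fill the remaining 42.2%.

```python
from itertools import accumulate

# First two eigenvalues follow from the reported shares under the assumption
# of five standardized features (0.378 * 5 and 0.20 * 5).
# The last three are hypothetical placeholders, not values from the report.
eigenvalues = [1.89, 1.00, 0.85, 0.76, 0.50]

total_variance = sum(eigenvalues)
explained_ratio = [ev / total_variance for ev in eigenvalues]
cumulative = list(accumulate(explained_ratio))

captured_2d = cumulative[1]   # variance visible in the PC1/PC2 scatter plot
lost_2d = 1 - captured_2d     # variance carried by PC3 through PC5

print(f"2D captures {captured_2d:.1%}, leaving {lost_2d:.1%}")
```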
Interpretation
The weak silhouette score of 0.293 (from the overall analysis) aligns with this visual pattern: clusters are present but not cleanly separated. The overlap visible in 2D does not by itself invalidate the clustering; it reflects that employees in different segments share some characteristics while differing in others. Because 42.2% of the variance is lost in the 2D projection, some distinguishing features are invisible here, and separation may be somewhat clearer in the full five-dimensional space, although the silhouette score indicates it remains weak.
Context
PCA projection is a lossy visualization tool. Apparent overlap in 2D may shrink when viewing the complete feature space. The skewed distribution of PC1 (skew = -0.53) indicates a longer tail on its negative side; because the sign of a principal component is arbitrary, check the PC1 loadings before reading this as a high-income or high-performance tail. Employees in that tail may represent distinct subgroups worth investigating separately.
Department Composition by Cluster
Which departments are overrepresented in each workforce segment
Headline
Research & Development dominates all four clusters at 62.8–70.7%, indicating the flight-risk segment (Cluster 4) has no department-specific concentration; the attrition risk appears systemic rather than localized to one business unit.
Purpose
This section identifies whether specific departments are overrepresented in at-risk employee segments. If one department drove the flight-risk cluster, it would signal a localized problem (management, culture, compensation) that could be addressed surgically. Even distribution across departments points to company-wide issues affecting all business units equally.
Key Findings
- R&D Dominance Across All Clusters: Research & Development represents 62.8–70.7% of every cluster, including the flight-risk Cluster 4 (62.8%). This is the defining pattern.
- Sales Representation: Sales comprises 26–31.8% of each cluster, showing consistent distribution with no cluster-specific concentration.
- Human Resources Minimal: HR represents only 3.4–5.3% across all clusters, reflecting its smaller workforce size.
- No Department Concentration in Flight Risk: Cluster 4 shows no department overrepresentation — R&D, Sales, and HR are proportionally distributed as they are in lower-risk clusters.
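One way to check for over-representation is to compare each department's share within a cluster to its share of the whole workforce (a ratio near 1 means no concentration). The sketch below shows that calculation on made-up records; the headcounts are hypothetical and do not reproduce the report's data, and the helper names are illustrative.

```python
from collections import Counter

# Hypothetical (cluster, department) records, NOT the report's data.
records = (
    [("Cluster 4", "R&D")] * 6 + [("Cluster 4", "Sales")] * 3
    + [("Cluster 4", "HR")] * 1
    + [("Cluster 2", "R&D")] * 7 + [("Cluster 2", "Sales")] * 3
)

def department_shares(records):
    """Share of each department within each cluster."""
    per_cluster = Counter(cluster for cluster, _ in records)
    per_pair = Counter(records)
    return {pair: n / per_cluster[pair[0]] for pair, n in per_pair.items()}

def representation_ratio(records):
    """Within-cluster share divided by the department's overall share."""
    overall = Counter(dept for _, dept in records)
    n_total = len(records)
    return {pair: share / (overall[pair[1]] / n_total)
            for pair, share in department_shares(records).items()}

shares = department_shares(records)
ratios = representation_ratio(records)
print(round(shares[("Cluster 4", "R&D")], 2), round(ratios[("Cluster 4", "R&D")], 2))
```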
Interpretation
The flight-risk cluster (Cluster 4) does not concentrate in any single department. R&D's 62.8% representation in Cluster 4 mirrors its 65–70.7% presence in other clusters, indicating R&D employees are distributed across all risk profiles. This pattern argues against department-specific management failures or localized compensation issues as the primary driver of flight risk. Instead, the risk factors appear systemic, affecting employees across business units in similar proportions.
Context
This analysis assumes department assignment is current and accurate. Cross-validation with actual historical attrition data by department would confirm whether this proportional distribution truly predicts departure risk or whether other unmeasured factors (role level, tenure, manager quality) better explain the flight-risk cluster's composition.
Optimal K Selection
Silhouette scores by k showing how the optimal number of clusters was selected
Headline
The analysis chose 4 clusters despite k=3 being statistically optimal (silhouette 0.36 vs. 0.29), trading cluster quality for business interpretability.
Purpose
Silhouette analysis evaluates cluster quality across different values of k (number of clusters) to determine the optimal segmentation. This section shows whether the chosen 4-cluster model is statistically justified or represents a trade-off between statistical purity and practical usability. Understanding this trade-off is critical for assessing whether the resulting employee segments are reliable or merely convenient.
Key Findings
- Optimal k by silhouette score: k=3 with score 0.36 — the highest quality clustering across all tested values
- Final model (k=4): Silhouette score 0.29 — a 19% decline in cluster quality compared to k=3
- Score interpretation: Both values fall in the weak-to-moderate range (0.25–0.5), indicating natural overlap in employee characteristics rather than crisp, distinct groups
- WCSS trend: Decreasing from k=2 (5,509.8) to k=6 (2,686.5), showing diminishing returns after k=3
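The comparison above can be reproduced from the figures quoted in this section. The sketch below uses only the values the report states (silhouette at k=3 and k=4, WCSS at k=2 and k=6); scores for the other tested values of k are not given, so they are left out rather than guessed. The variable names are illustrative.

```python
# Only the silhouette scores the report states explicitly.
silhouette_by_k = {3: 0.36, 4: 0.29}

# k with the highest silhouette among the reported values.
best_k = max(silhouette_by_k, key=silhouette_by_k.get)

# Relative decline in silhouette when moving from k=3 to the final k=4.
relative_decline = (silhouette_by_k[3] - silhouette_by_k[4]) / silhouette_by_k[3]

# Reported within-cluster sum of squares at the ends of the tested range.
wcss = {2: 5509.8, 6: 2686.5}
wcss_reduction = 1 - wcss[6] / wcss[2]

print(best_k, f"{relative_decline:.1%}", f"{wcss_reduction:.1%}")
```

WCSS always falls as k grows, so the reduction on its own does not favor a larger k; the silhouette comparison is the more informative of the two.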
Interpretation
The 4-cluster solution sacrifices statistical quality for business utility. At k=3, employees cluster more cohesively, but the analysis team selected k=4 — likely because the fourth cluster represents a meaningful business segment (e.g., the identified flight-risk group) despite lower silhouette performance. This is a valid trade-off when domain insight justifies it, but it means cluster boundaries are softer: some employees sit between clusters and could reasonably belong to multiple groups.
Context
Silhouette scores below 0.5 are typical for HR data, where employee characteristics naturally overlap. The weak absolute scores suggest the five features (Age, Income, Satisfaction, Tenure, Performance) don't create sharp employee boundaries. Verify that the k=4 choice was driven by business need, not statistical convenience.
Cluster Profile Summary
Detailed cluster profile table showing mean feature values for each segment
| Cluster | Employees | % of Workforce | Mean Age (years) | Mean Monthly Income ($) | Mean Job Satisfaction | Mean Years at Company | Mean Performance Rating |
|---|---|---|---|---|---|---|---|
| Cluster 1 | 208 | 14.1 | 35.96 | 5,434 | 2.74 | 6.06 | 4 |
| Cluster 2 | 640 | 43.5 | 34.95 | 4,813 | 3.51 | 5.38 | 3 |
| Cluster 3 | 229 | 15.6 | 47.44 | 14,983 | 2.69 | 15.21 | 3.08 |
| Cluster 4 | 393 | 26.7 | 34.53 | 4,880 | 1.48 | 5.39 | 3 |
Headline
The cluster profile table gives mean feature values for each of the 4 clusters: Cluster 3 stands out on income and tenure, Cluster 1 on performance rating, and Cluster 4 on low job satisfaction.
Purpose
This section reveals the defining characteristics of each employee segment by comparing mean values across five key features (Age, Monthly Income, Job Satisfaction, Years at Company, and Performance Rating, in the order the columns appear). These profiles enable HR to assign business-meaningful labels to clusters and identify retention risks tied to compensation, tenure, or engagement.
Key Findings
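- Cluster Profile Data: Populated for all 4 clusters; Cluster 3 has the highest mean age (47.44), income ($14,983), and tenure (15.21 years), Cluster 1 the highest performance rating (4.0), and Cluster 4 the lowest job satisfaction (1.48)
- Cluster Sizes: Range from 208 to 640 employees (14.1% to 43.5% of workforce)
- Centroid Data Available: Raw and standardized means also exist in the cluster_centroids table; the raw means quoted from it in this report agree with the cluster_profiles output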
Interpretation
The clustering algorithm assigned all 1,470 employees to 4 segments with a reasonable size distribution, and the profile table translates these assignments into segment descriptions. The means show meaningful variation across clusters: for example, Cluster 3 has substantially higher mean income ($14,983 vs. $4,813–$5,434 in other clusters) and longer tenure (15.21 years vs. 5.4–6.1 years), while Cluster 4's job satisfaction (1.48) is well below the 2.69–3.51 range of the other clusters.
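When comparing a cluster to "the average", it matters whether the average is weighted by cluster size. The sketch below computes both versions from the profile values in this section (cluster sizes and means as reported; the dictionary layout and function names are illustrative). The size-weighted mean corresponds to the workforce-level mean, while the unweighted mean treats each cluster equally.

```python
# Cluster sizes and mean feature values as reported in the profile table.
profiles = {
    "Cluster 1": (208, {"age": 35.96, "income": 5434, "satisfaction": 2.74,
                        "tenure": 6.06, "performance": 4.0}),
    "Cluster 2": (640, {"age": 34.95, "income": 4813, "satisfaction": 3.51,
                        "tenure": 5.38, "performance": 3.0}),
    "Cluster 3": (229, {"age": 47.44, "income": 14983, "satisfaction": 2.69,
                        "tenure": 15.21, "performance": 3.08}),
    "Cluster 4": (393, {"age": 34.53, "income": 4880, "satisfaction": 1.48,
                        "tenure": 5.39, "performance": 3.0}),
}

def weighted_mean(feature):
    """Workforce-level mean: cluster means weighted by headcount."""
    total = sum(n for n, _ in profiles.values())
    return sum(n * means[feature] for n, means in profiles.values()) / total

def unweighted_mean(feature):
    """Simple average of the four cluster means."""
    return sum(means[feature] for _, means in profiles.values()) / len(profiles)

for feature in ("satisfaction", "performance", "income"):
    print(feature, round(weighted_mean(feature), 2), round(unweighted_mean(feature), 2))
```

For job satisfaction the two differ noticeably (about 2.73 weighted vs. 2.6 unweighted), because the largest cluster also has the highest satisfaction.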
Context
The cluster_centroids table holds the standardized counterparts of these means, which are useful for comparing features measured in different units. The silhouette score of 0.293 indicates weak cluster separation, suggesting segment boundaries are soft and overlapping. This limitation should be noted when assigning business labels to clusters.