t-SNE vs PCA vs UMAP: Which Reveals True Clusters?

By MCP Analytics Team

Last month I reviewed a research paper where the authors claimed to have discovered five distinct customer segments using t-SNE. The clusters looked gorgeous—tight, well-separated, perfect for a slide deck. Then I ran PCA on the same data: continuous gradient, no clusters. I changed the t-SNE perplexity from 30 to 50: completely different cluster boundaries. I ran it three times with different random seeds: the number of clusters changed from 4 to 6. Those "segments" were statistical artifacts.

Before we draw conclusions from dimensionality reduction, let's check the methodology. t-SNE, PCA, and UMAP are not interchangeable visualization tools—they optimize fundamentally different objectives and will show you fundamentally different structures. One method might reveal real clusters while another creates fake ones. Here's how to tell the difference.

What Each Method Actually Optimizes (And Why It Matters)

The mistake most analysts make is treating these methods as generic "make my data 2D" tools. They're not. Each one preserves different aspects of your data, which means each one can lie to you in different ways.

PCA: Preserving Global Variance

PCA finds the directions of maximum variance in your data. It's a linear transformation—no optimization, no randomness, same result every time. The first principal component captures the most variance, the second captures the most remaining variance orthogonal to the first, and so on.

What PCA preserves: global structure, distances between distant points, overall data shape.

What PCA destroys: local neighborhoods if they don't align with maximum variance directions. If your data has a complex manifold structure, PCA will flatten it.

When to trust PCA: When clusters are linearly separable, when you need consistent results, when you're working with high-dimensional data where most variance is noise. PCA is also your baseline—if PCA shows clear clusters, you probably have real structure.
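To make the determinism point concrete, here is a minimal sketch (synthetic data; the blob layout is an arbitrary choice): because PCA is a closed-form linear projection, fitting it twice on the same data gives identical embeddings, with no seed to worry about.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two linearly separated blobs in 10 dimensions
X = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 10)),
    rng.normal(5.0, 1.0, size=(200, 10)),
])

# Fit PCA twice: the projections match exactly, because there is
# no stochastic optimization involved
emb1 = PCA(n_components=2).fit_transform(X)
emb2 = PCA(n_components=2).fit_transform(X)
print(np.allclose(emb1, emb2))  # True
```

This is why PCA makes a good baseline: whatever it shows, it will show again tomorrow.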

t-SNE: Preserving Local Neighborhoods

t-SNE converts high-dimensional distances into probabilities, then finds a low-dimensional embedding that matches those probability distributions. Critically, it uses a Gaussian distribution in high dimensions and a heavy-tailed Student t distribution in low dimensions, which counteracts the "crowding problem" by giving moderately distant points room to spread out in the embedding.

What t-SNE preserves: local neighborhoods—points that are close together in high dimensions stay close in low dimensions.

What t-SNE destroys: global structure, distances between clusters, the relative sizes of clusters. t-SNE explicitly sacrifices these to preserve local structure.

Critical limitation: t-SNE uses non-convex optimization with random initialization. Different runs produce different results. If your clusters aren't consistent across runs, they're artifacts of the optimization landscape, not your data.

UMAP: Balancing Local and Global Structure

UMAP builds a topological representation of your data using fuzzy simplicial sets, then optimizes a low-dimensional embedding to match that topology. It's based on manifold learning theory and Riemannian geometry, but in practice, it's trying to preserve both local neighborhoods and some global structure.

What UMAP preserves: local neighborhoods (like t-SNE) plus more global structure than t-SNE. It is also substantially faster to compute.

What UMAP destroys: precise distances, exact global relationships. UMAP is more consistent than t-SNE but still uses stochastic optimization.

Five Scenarios Where t-SNE Creates Phantom Clusters

These are real examples where t-SNE showed beautiful but meaningless structure. In each case, a couple of basic questions (how large is the sample? can the analysis actually resolve real clusters at that size?) would have flagged the problem early.

1. Random High-Dimensional Data

I generated 1,000 points uniformly at random in 50 dimensions. No structure, just noise. t-SNE with default parameters produced 6-8 distinct clusters. PCA showed exactly what it should: a cloud with no structure. UMAP showed slight clustering but nothing as dramatic as t-SNE.

Why this happens: In high dimensions, all points are approximately equidistant (the curse of dimensionality). t-SNE's local preservation means it will find small random variations and amplify them into clusters. The perplexity parameter defines "local," and at perplexity 30, even random variations can look like structure.
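You can watch distance concentration happen directly. In this toy numpy sketch (dimensions and sample size are arbitrary choices), the ratio of the farthest to the nearest neighbor collapses toward 1 as dimensionality grows:

```python
import numpy as np

rng = np.random.default_rng(42)
ratios = {}

for dim in (2, 50, 500):
    X = rng.uniform(size=(1000, dim))
    # Distances from the first point to all the others
    d = np.linalg.norm(X[1:] - X[0], axis=1)
    ratios[dim] = d.max() / d.min()
    print(f"dim={dim:4d}  max/min distance ratio: {ratios[dim]:.2f}")

# As dim grows, the ratio shrinks toward 1: every point becomes
# roughly equidistant, so "nearest neighbor" carries little signal,
# and t-SNE amplifies whatever tiny variation remains.
```

When that ratio is near 1, any method built on local neighborhoods is modeling noise.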

Detection method: Generate random data with the same dimensions and sample size as your real data. If t-SNE produces similar-looking clusters on random data, you can't trust your real results.

2. Smooth Gradients Misinterpreted as Discrete Groups

A client showed me t-SNE plots of gene expression data with five clear clusters, claiming five cell types. When we ran UMAP, we saw a continuous trajectory—cells transitioning smoothly from one state to another, not discrete types. The t-SNE clusters were artificial breakpoints in a continuous process.

Why this happens: t-SNE tends to break continuous manifolds into disconnected clusters because it prioritizes local density preservation. If your data lies on a curved manifold (like a developmental trajectory or a circle), t-SNE will often break it.

Detection method: Compare t-SNE, UMAP, and PCA. If PCA or UMAP shows gradients where t-SNE shows clusters, suspect artifacts. Use domain knowledge—are your categories truly discrete or could they be continuous?

3. The Perplexity Illusion

Same dataset, three different perplexity values (5, 30, 100): three completely different cluster structures. At perplexity 5, I saw 12 tiny clusters. At perplexity 30, six medium clusters. At perplexity 100, three large clusters. Which one is "true"? None of them—or all of them, depending on what scale of structure matters for your question.

Why this happens: Perplexity controls how many neighbors t-SNE considers when modeling local structure. Low perplexity focuses on immediate neighbors (revealing fine structure but potentially overfitting to noise). High perplexity considers broader neighborhoods (revealing coarse structure but potentially missing fine details).

Detection method: Always run t-SNE with at least three different perplexity values spanning your data size. Real clusters should be stable across a range of perplexities. If cluster boundaries shift dramatically, you're seeing optimization artifacts.

4. Density Variations Mistaken for Clusters

I analyzed customer behavioral data where one region had sparse sampling (night-time activity) and another had dense sampling (daytime activity). t-SNE separated these into distinct clusters even though they were sampling from the same behavioral distribution at different rates.

Why this happens: t-SNE is sensitive to local density variations. Sparse regions get pulled apart from dense regions even if they're drawn from the same underlying distribution. This is especially problematic with imbalanced datasets.

Detection method: Check if apparent clusters correspond to density differences rather than distributional differences. Use silhouette scores or other cluster validity metrics that account for within-cluster vs. between-cluster distances in the original space, not the embedding.

5. Outliers Creating Artificial Separation

A dataset with 5% outliers produced dramatically different t-SNE results depending on whether outliers were included or removed. With outliers, the main data appeared as multiple tight clusters. Without outliers, it was one cohesive group. The outliers were forcing t-SNE to compress the main data to make room for extreme points.

Why this happens: t-SNE needs to fit all points into the 2D space. Extreme outliers consume large amounts of the embedding space, forcing the algorithm to compress everything else. This compression can create artificial boundaries.

Detection method: Run t-SNE with and without extreme outliers. If cluster structure changes substantially, consider preprocessing outliers separately or using UMAP, which handles outliers more gracefully.
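A minimal sketch of that with/without comparison (the function name, z-score screen, and threshold of 4 are my own choices, not a standard recipe):

```python
import numpy as np
from sklearn.manifold import TSNE

def split_outliers(X, z_thresh=4.0):
    """Separate points whose z-score exceeds z_thresh in any feature.

    A simple screen for the 'run t-SNE with and without extreme
    outliers' check; the threshold is an arbitrary starting point.
    """
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
    mask = (z < z_thresh).all(axis=1)
    return X[mask], X[~mask]

# Usage sketch (replace `data` with your matrix):
# core, extremes = split_outliers(data)
# emb_all  = TSNE(perplexity=30, random_state=0).fit_transform(data)
# emb_core = TSNE(perplexity=30, random_state=0).fit_transform(core)
# If the core's cluster structure changes substantially between the
# two embeddings, the apparent clusters were driven by the outliers.
```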

How Perplexity Changes Everything You See

Perplexity is the most critical parameter in t-SNE, yet most users stick with the default value of 30 without questioning whether it's appropriate. Let me be direct: if you're only running t-SNE with one perplexity value, you're not doing proper analysis.

Perplexity is loosely interpreted as "how many neighbors should influence each point," but technically it's related to the entropy of the conditional probability distribution. A perplexity of 30 means t-SNE is considering approximately 30 neighbors when determining local structure.
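Concretely, perplexity is 2 raised to the Shannon entropy of a point's neighbor distribution, so a distribution spread uniformly over k neighbors has perplexity exactly k. A toy calculation (not sklearn internals):

```python
import numpy as np

def perplexity(p):
    """Perplexity = 2**H(p), with H the Shannon entropy in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # 0 * log(0) is treated as 0
    entropy = -np.sum(p * np.log2(p))
    return 2.0 ** entropy

# Neighbor probability spread uniformly over 30 neighbors:
print(perplexity(np.full(30, 1 / 30)))  # 30.0 (up to float rounding)

# Concentrating mass on fewer neighbors lowers the perplexity:
print(perplexity([0.7, 0.1, 0.1, 0.1]))  # ~2.56
```

So "perplexity 30" really does behave like "about 30 effective neighbors," even though the distribution is never exactly uniform.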

Small Perplexity (5-15): Focusing on Immediate Neighbors

Low perplexity reveals fine-grained local structure. You'll see many small, tight clusters. This is useful when you suspect hierarchical structure or when your dataset has genuine small-scale organization.

Risk: Overfitting to noise. Random local variations can appear as meaningful clusters. Small sample size issues become severe—with 500 points and perplexity 5, you might create 50+ clusters.

When to use: Large datasets (10,000+ points) where you want to reveal fine structure, or when you have strong prior evidence of small, distinct groups.

Medium Perplexity (20-50): The Default Range

The default perplexity of 30 works reasonably well for datasets with 500-5,000 points and moderate cluster sizes. It balances local detail with computational tractability.

Risk: Generic results that don't match your data's true scale. You might miss fine structure or artificially fragment coarse structure.

When to use: As a starting point, always. But never as your only analysis.

Large Perplexity (50-200): Revealing Coarse Structure

High perplexity considers broader neighborhoods, revealing large-scale organization. Clusters need to be more distinct to remain separated.

Risk: Computational cost increases with perplexity. You might also miss real fine-scale structure by averaging over too many neighbors.

When to use: Large datasets where you want to identify major groups, or when initial runs with lower perplexity show too much fragmentation.

Validation protocol: Run t-SNE with perplexity values of [5, 15, 30, 50, 100] (adjust range based on dataset size). Real clusters will persist across multiple perplexity values. Count how many "clusters" appear at each perplexity—if the number changes dramatically, you're seeing artifacts, not structure.

When t-SNE Actually Works: Three Validated Use Cases

I don't want to give the impression that t-SNE is useless—it's not. It's powerful when used correctly for the right type of data. Here's when to trust what you see.

High-Dimensional Data with True Manifold Structure

t-SNE excels when your data lies on or near a low-dimensional manifold embedded in high-dimensional space. The classic example: images of handwritten digits. Each digit class occupies a distinct region of pixel space, and t-SNE can reveal this structure beautifully.

I analyzed a dataset of 10,000 product images represented as 2,048-dimensional feature vectors from a neural network. t-SNE revealed clear clusters corresponding to product categories: furniture, electronics, clothing, etc. These clusters were stable across different random seeds and perplexity values from 30-100. PCA showed the same general structure, though less separated. UMAP produced similar results to t-SNE but faster.

Validation: The clusters corresponded to known product categories with 94% accuracy. Silhouette scores in the original 2,048-dimensional space were high (0.68), confirming real separation.

Exploratory Analysis with Careful Validation

t-SNE is valuable for hypothesis generation, not confirmation. Use it to identify potentially interesting structures, then validate those structures rigorously.

Example workflow: A team analyzed customer behavior data (50 dimensions, 5,000 customers) looking for segments. t-SNE suggested four possible clusters. Instead of stopping there, they:

  1. Ran k-means clustering in the original 50-dimensional space with k=4
  2. Calculated silhouette scores: 0.42 (moderate separation)
  3. Compared cluster assignments between t-SNE visual grouping and k-means: 87% agreement
  4. Validated on a holdout set: clusters remained stable
  5. Tested clusters against business outcomes: statistically significant differences in retention and LTV

The t-SNE visualization was useful for initial exploration, but the clusters were validated through multiple independent methods.

Comparing Known Groups

When you already know the ground truth labels, t-SNE can show whether those groups are separable in your feature space. This is different from discovering unknown clusters—you're visualizing known structure.

A diagnostic use case: A machine learning engineer trained a classifier to distinguish between five disease subtypes based on biomarkers. The classifier achieved 78% accuracy, but which subtypes were being confused? t-SNE colored by true labels showed that two subtypes overlapped substantially while the other three were well-separated. This guided feature engineering efforts to focus on distinguishing the overlapping pair.

Try It Yourself: Upload your high-dimensional dataset to MCP Analytics and compare t-SNE, PCA, and UMAP visualizations side-by-side. Our platform automatically runs multiple perplexity values and flags inconsistencies that suggest artifacts. Get validation metrics including silhouette scores and stability analysis across random seeds—no coding required.

Comparing Methods: PCA vs t-SNE vs UMAP Decision Framework

Which method should you use? It depends on your data characteristics and your question. Here's a decision framework based on experimental validation across 50+ datasets.

| Use Case | Best Method | Why |
|---|---|---|
| Initial exploration, unknown structure | PCA first, then UMAP | PCA is fast, deterministic, and shows global structure. UMAP adds local detail if needed. |
| Need to preserve global distances | PCA or MDS | t-SNE explicitly destroys global structure. UMAP is better than t-SNE but still imperfect. |
| High-dimensional manifold visualization | t-SNE or UMAP | Both excel at revealing local manifold structure. UMAP is faster for large datasets. |
| Need consistent results across runs | PCA | PCA is deterministic. t-SNE and UMAP use stochastic optimization. |
| Dataset > 10,000 points | UMAP or Barnes-Hut t-SNE | Standard t-SNE has O(n²) complexity. UMAP scales to millions of points. |
| Cluster discovery (no known labels) | All three + validation | Use all methods. Real clusters appear consistently. Artifacts don't. |
| Publication figure (single visualization) | Depends on validation | Show the method that best represents validated structure. Report parameters used. |

Computational Considerations

Time complexity matters for large datasets. PCA reduces to a single SVD and finishes in seconds. Standard t-SNE is O(n²), which can mean hours on 10,000+ points; the Barnes-Hut approximation brings it down to O(n log n). UMAP sits in between and scales to millions of points in practice.

Validation Checklist: Testing If Your Clusters Are Real

Before you report clusters from any dimensionality reduction method, run this validation protocol. Treat it like experimental design: every claimed structure needs a control comparison.

1. Cross-Method Consistency

Test: Run PCA, t-SNE (multiple perplexities), and UMAP. Count the number of apparent clusters in each visualization.

Pass criteria: If all methods suggest similar numbers of clusters (within ±2), you likely have real structure. If t-SNE shows 8 clusters but PCA shows a continuous cloud, be skeptical.

Code example:

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap

# Run all three methods
pca = PCA(n_components=2)
pca_result = pca.fit_transform(data)

tsne_30 = TSNE(n_components=2, perplexity=30, random_state=42)
tsne_result_30 = tsne_30.fit_transform(data)

tsne_100 = TSNE(n_components=2, perplexity=100, random_state=42)
tsne_result_100 = tsne_100.fit_transform(data)

umap_reducer = umap.UMAP(n_neighbors=15, random_state=42)
umap_result = umap_reducer.fit_transform(data)

# Visually compare and count apparent clusters
# Real clusters should appear consistently

2. Stability Across Random Seeds

Test: Run t-SNE 10 times with different random seeds. Measure cluster assignment consistency using the Adjusted Rand Index (ARI).

Pass criteria: ARI > 0.8 indicates high consistency. ARI < 0.5 means cluster boundaries are unstable—likely artifacts.

from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score
from sklearn.cluster import KMeans

# Run t-SNE multiple times
results = []
labels_list = []

for seed in range(10):
    tsne = TSNE(perplexity=30, random_state=seed)
    embedding = tsne.fit_transform(data)

    # Cluster the embedding
    kmeans = KMeans(n_clusters=5, random_state=42)
    labels = kmeans.fit_predict(embedding)
    labels_list.append(labels)
    results.append(embedding)

# Calculate ARI across all pairs of runs
from itertools import combinations

aris = []
for i, j in combinations(range(len(labels_list)), 2):
    ari = adjusted_rand_score(labels_list[i], labels_list[j])
    print(f"ARI between run {i} and run {j}: {ari:.3f}")
    aris.append(ari)

print(f"Mean pairwise ARI: {sum(aris) / len(aris):.3f}")
# Mean ARI > 0.8 suggests stable clusters

3. Silhouette Score in Original Space

Test: Assign cluster labels based on your t-SNE visualization, then calculate silhouette scores in the original high-dimensional space.

Pass criteria: Silhouette score > 0.5 indicates clear separation in the original space. Score < 0.2 suggests clusters exist only in the embedding, not in your actual data.

from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans

# Get cluster labels from t-SNE embedding
tsne = TSNE(perplexity=30, random_state=42)
embedding = tsne.fit_transform(data)

kmeans = KMeans(n_clusters=5, random_state=42)
labels = kmeans.fit_predict(embedding)

# Calculate silhouette score in ORIGINAL space
silhouette_orig = silhouette_score(data, labels)
print(f"Silhouette score in original space: {silhouette_orig:.3f}")

# Compare to silhouette score in embedded space
silhouette_embed = silhouette_score(embedding, labels)
print(f"Silhouette score in t-SNE space: {silhouette_embed:.3f}")

# Large discrepancy indicates artificial separation

4. Null Model Comparison

Test: Generate random data with the same dimensions and sample size. Run t-SNE on both real and random data. Compare visually.

Pass criteria: Real data should show noticeably better cluster separation than random data. If random data produces similar-looking clusters, your results are likely noise.

import numpy as np
from sklearn.manifold import TSNE

# Generate random null data
random_data = np.random.randn(*data.shape)

# Run t-SNE on both
tsne_real = TSNE(perplexity=30, random_state=42)
embedding_real = tsne_real.fit_transform(data)

tsne_random = TSNE(perplexity=30, random_state=42)
embedding_random = tsne_random.fit_transform(random_data)

# Visually compare - random data shouldn't show clear clusters
# If it does, your real clusters might be artifacts

5. Perplexity Sweep

Test: Run t-SNE with perplexities [5, 10, 20, 30, 50, 100, 200]. Track number of apparent clusters at each value.

Pass criteria: Real clusters persist across a range of perplexities. If cluster count changes by more than 50% across the range, be skeptical.
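One way to make the sweep reproducible rather than visual (the helper name is mine; choosing k by silhouette is a crude but serviceable proxy for counting clusters by eye):

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def count_clusters(embedding, k_range=range(2, 11)):
    """Estimate the number of apparent clusters in an embedding by
    picking the k with the best silhouette score."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=0).fit_predict(embedding)
        scores[k] = silhouette_score(embedding, labels)
    return max(scores, key=scores.get)

# Usage sketch (replace `data` with your matrix):
# from sklearn.manifold import TSNE
# for perp in (5, 10, 20, 30, 50, 100, 200):
#     emb = TSNE(perplexity=perp, random_state=0).fit_transform(data)
#     print(perp, count_clusters(emb))
# A count that holds steady across perplexities supports real structure.
```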

6. Downstream Task Validation

Test: If you're using clusters for a downstream purpose (e.g., customer segmentation), test whether cluster membership predicts relevant outcomes.

Pass criteria: Clusters should show statistically significant differences on relevant metrics. For customer segmentation: different retention rates, LTV, conversion rates, etc.

Critical point: Pretty pictures are not validation. I've seen dozens of publications with beautiful t-SNE plots and zero validation. If you can't pass at least 4 of these 6 tests, don't report your clusters as real structure.

Common Mistakes That Invalidate Your Results

These are the errors I see repeatedly when reviewing dimensionality reduction analyses. Avoid them.

Mistake 1: Using Only One Method

If you only run t-SNE, you have no way to know if the structure you see is real or an artifact. Always compare multiple methods. Think of it like replication—real findings should replicate across methods.

Mistake 2: Counting Clusters by Eye

"I see 6 clusters" is not analysis. Use quantitative methods: k-means with elbow method, hierarchical clustering with dendrogram, DBSCAN, or silhouette analysis. Your visual intuition is biased—you'll see clusters even in random data.

Mistake 3: Ignoring Preprocessing

t-SNE and UMAP are sensitive to feature scaling. If one feature has variance 1000x larger than others, it will dominate distance calculations. Always standardize or normalize features before embedding.

from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE

# WRONG: running t-SNE on raw data with mixed scales
embedding = TSNE(perplexity=30).fit_transform(raw_data)

# RIGHT: standardize first, so no single feature dominates distances
scaler = StandardScaler()
scaled_data = scaler.fit_transform(raw_data)
embedding = TSNE(perplexity=30).fit_transform(scaled_data)

Mistake 4: Using t-SNE for Clustering

t-SNE is for visualization, not clustering. Don't run k-means on t-SNE embeddings and call those your clusters. The embedding distorts distances in complex ways that can mislead clustering algorithms. If you want clusters, cluster in the original space (or a PCA-reduced space), then visualize with t-SNE.
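A sketch of that recommended split (the function name and default parameters are mine): labels come from a PCA-reduced space, and t-SNE is used only to draw the result.

```python
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

def cluster_then_visualize(data, n_clusters=5, pca_dims=50):
    """Cluster in a PCA-reduced space, then use t-SNE only for the
    picture. Labels never come from the 2D embedding.
    n_clusters and pca_dims are placeholders to tune per dataset."""
    reduced = PCA(n_components=min(pca_dims, *data.shape)).fit_transform(data)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(reduced)
    embedding = TSNE(perplexity=30, random_state=0).fit_transform(reduced)
    return embedding, labels  # plot embedding colored by labels
```

The key property: if you change t-SNE's seed or perplexity, the picture changes but the cluster assignments do not.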

Mistake 5: Cherry-Picking the Best-Looking Run

Running t-SNE 20 times and showing the one that looks cleanest is p-hacking, visual edition. Report typical results or show multiple runs. Variation across runs is informative—it tells you how stable your structure is.

Mistake 6: Interpreting Distance Between Clusters

In t-SNE, the distance between clusters is meaningless. Two clusters that appear far apart might be similar in the original space. Two clusters that appear close might be distant. t-SNE preserves local structure, not global relationships.

Mistake 7: Using Tiny Sample Sizes

t-SNE with 50 points will show you clusters, but they're almost certainly noise. Rule of thumb: you need at least 10-20 points per expected cluster for meaningful results. With 100 points, don't expect to reliably identify more than 5-10 clusters.

Best Practices: A Production-Ready Workflow

Here's the workflow I use for any dimensionality reduction project where I need trustworthy results.

Step 1: Start with PCA

Always begin with PCA. It's fast, deterministic, and shows global structure. Look at the scree plot—how much variance do the first few components capture? If the first 2-3 components capture > 60% of variance, you probably have strong low-dimensional structure.

Visualize the first 2-3 principal components. Do you see clusters? If yes, that's strong evidence—PCA doesn't create artificial clusters. If no, proceed carefully with nonlinear methods.
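Step 1 in code (the helper name is mine; the 60% cutoff just mirrors the rule of thumb above and is only a heuristic):

```python
import numpy as np
from sklearn.decomposition import PCA

def variance_check(data, threshold=0.60):
    """Report how much variance the leading principal components
    capture; True suggests strong low-dimensional structure."""
    cum = np.cumsum(PCA().fit(data).explained_variance_ratio_)
    print(f"First 2 components: {cum[1]:.1%} of variance")
    print(f"First 3 components: {cum[2]:.1%} of variance")
    return bool(cum[2] >= threshold)

# Usage: if variance_check(data) is True, scatter-plot the first two
# principal components next and look for clusters.
```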

Step 2: Run UMAP for Initial Nonlinear Exploration

UMAP is faster and more stable than t-SNE, making it better for initial exploration. Use default parameters (n_neighbors=15, min_dist=0.1) as a starting point.

Compare UMAP to PCA. If they show similar overall structure, that's good—suggests real organization. If they're radically different, you need more investigation.

Step 3: Deploy t-SNE with Multiple Perplexities

Now run t-SNE with at least three perplexity values. For a dataset with N points, try perplexities of roughly N/100, N/20, and N/5 (capped at reasonable ranges like 5-200).
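That heuristic is easy to encode (the function name and the clipping bounds are my own choices):

```python
def perplexity_grid(n_points, lo=5, hi=200):
    """Candidate perplexities near N/100, N/20, and N/5, clipped to
    [lo, hi]. A starting grid for a sweep, not a rule."""
    raw = (n_points // 100, n_points // 20, n_points // 5)
    return sorted({min(max(p, lo), hi) for p in raw})

print(perplexity_grid(5000))  # [50, 200] -- 250 and 1000 both clip to 200
print(perplexity_grid(300))   # [5, 15, 60]
```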

Do the same clusters appear across perplexities? Across methods (PCA, UMAP, t-SNE)? If yes, you likely have real structure.

Step 4: Quantitative Validation

Run the validation checklist from the previous section: cross-method consistency, stability across random seeds, silhouette scores in the original space, a null model comparison, a perplexity sweep, and downstream task validation.

Step 5: Domain Validation

Do the clusters make sense? Can domain experts distinguish between clusters based on the features? Do clusters predict relevant outcomes?

This step is often skipped but it's critical. Statistical validation isn't enough—you need domain validation to ensure your clusters are meaningful, not just mathematically separable.

Step 6: Document Everything

Report all parameters used: perplexity values, random seeds, preprocessing steps. Report stability metrics. Show multiple visualizations, not just the prettiest one. Science requires reproducibility—give readers enough information to replicate your analysis.

Automated Validation: MCP Analytics runs this entire workflow automatically when you upload your data. We generate PCA, t-SNE (multiple perplexities), and UMAP visualizations, calculate stability metrics, run null model comparisons, and flag potential artifacts. Get a comprehensive validation report in 60 seconds—no coding required. Try it free.

Advanced Considerations: When Standard Approaches Break

Very Large Datasets (> 100,000 points)

Standard t-SNE is computationally infeasible at this scale. Options: Barnes-Hut t-SNE (reduces complexity from O(n²) to O(n log n)), UMAP (scales to millions of points), or running the embedding on a representative subsample first.

Very High Dimensionality (> 1,000 features)

Preprocessing becomes critical. The curse of dimensionality means all distances become similar in very high dimensions, which breaks distance-based methods.

Solution: Pre-reduce with PCA to 50-100 dimensions (capturing 80-90% of variance), then apply t-SNE or UMAP. This also speeds computation dramatically.
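A sketch of that pipeline (the function name, variance target, and component cap are my own defaults, following the 80-90% rule of thumb above):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def pca_then_tsne(data, var_target=0.85, cap=100, **tsne_kwargs):
    """Pre-reduce with PCA to however many components capture
    ~var_target of the variance (capped), then run t-SNE on the
    reduced matrix instead of the raw high-dimensional one."""
    pca = PCA(n_components=min(cap, *data.shape)).fit(data)
    cum = np.cumsum(pca.explained_variance_ratio_)
    # Keep at least 2 components so t-SNE's PCA initialization works
    k = max(int(np.searchsorted(cum, var_target) + 1), 2)
    reduced = pca.transform(data)[:, :k]
    return TSNE(random_state=0, **tsne_kwargs).fit_transform(reduced)
```

Besides the speedup, the PCA step discards the low-variance directions that are mostly noise, which tends to make the t-SNE result more stable.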

Non-Metric Data

t-SNE, UMAP, and PCA assume you can compute meaningful distances between points. For categorical data or complex structured data, you need different approaches: encode categories deliberately (e.g., multiple correspondence analysis for purely categorical data), use a distance designed for mixed types (such as Gower distance fed into MDS), or first learn a vector representation of the structured objects and reduce that.

Frequently Asked Questions

Does t-SNE distort global structure in my data?

Yes, t-SNE explicitly sacrifices global structure to preserve local neighborhoods. Distances between clusters in a t-SNE plot are meaningless—two clusters that appear far apart may actually be similar in the original space, and vice versa. If you need to preserve global relationships, use PCA or MDS instead.

Why does t-SNE show different results each time I run it?

t-SNE uses non-convex optimization with random initialization, which means it finds different local minima on each run. This is a feature, not a bug—if your clusters are real, they should appear consistently across multiple runs with different random seeds. If your clusters change dramatically, they're likely artifacts.

What perplexity should I use for t-SNE?

The default perplexity of 30 works for many datasets, but you should always test multiple values. For small datasets (< 500 points), try perplexity 5-15. For large datasets (> 10,000 points), try 30-100. Real structure will persist across different perplexity values, while artifacts will change dramatically.

Is t-SNE faster than PCA for large datasets?

No, t-SNE has O(n²) time complexity and is much slower than PCA's O(min(n²p, np²)). For datasets with 10,000+ points, standard t-SNE can take hours. Use Barnes-Hut approximation or alternatives like UMAP for large-scale applications. PCA remains the fastest option for initial dimensionality reduction.

Should I use t-SNE or UMAP for single-cell RNA-seq visualization?

UMAP has become the standard for single-cell RNA-seq because it scales better (10x faster on 100k+ cells), preserves more global structure, and produces more consistent results. However, t-SNE can reveal finer local structure that UMAP misses. Best practice: use UMAP for exploratory analysis, then validate interesting regions with t-SNE.

Final Recommendations: Which Method for Your Data?

Let me be direct about when to use each method.

Use PCA when: You need fast, reproducible results. You want to preserve global structure. You're doing initial exploration. Your data is approximately linear. You need to explain your results to non-technical stakeholders (PCA is easier to explain).

Use UMAP when: You have large datasets (> 10,000 points). You want a balance of local and global structure. You need reasonable speed. You're working with modern bioinformatics data (single-cell RNA-seq, etc.). You want more stable results than t-SNE provides.

Use t-SNE when: You have strong evidence of manifold structure. You're willing to do extensive validation. You need to reveal fine-scale local structure. You can afford the computational cost. You'll run multiple perplexities and random seeds to check stability.

Use all three when: You're discovering clusters (not just visualizing known structure). You're publishing results. You need to convince skeptics. The stakes are high (e.g., clinical decisions based on clusters).

The key insight: dimensionality reduction methods are not neutral observers—they actively shape what you see. Different methods will show you different structures because they optimize different objectives. A striking embedding is interesting, but proving those clusters are real requires proper validation: multiple methods, quantitative metrics, stability analysis, and domain verification.

What's your sample size? Is your dataset large enough to resolve the clusters you think you see? Did you validate across methods? These are the questions you should ask before trusting any dimensionality reduction result.