Analytics · Botanical · Iris · Species Classification

Executive Summary

Overall classification accuracy and key discriminating measurements

metric	value
n_observations	150
n_train	105
n_test	45
overall_accuracy	0.9556
n_species	3
train_split_pct	70
top_feature_importance	15.6137

The Random Forest classifier achieved 95.6% accuracy on the held-out test set, correctly identifying the species of 43 of 45 unseen flowers. petal_length was the most discriminating measurement by a wide margin, and setosa had the highest per-species recall. LDA projects the four measurements onto two discriminant axes, the maximum possible for three species, and LD1 alone captures most of the between-species variance, indicating that the species are well separated in measurement space.
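
The headline figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in this summary; the variable name n_test_correct is introduced here for illustration and is not a field of the report.

```python
# Figures taken from the summary metrics in this report.
n_observations = 150
train_split_pct = 70
n_test_correct = 43  # "43 of 45" correct test predictions (name is illustrative)

# A 70% split of 150 observations leaves 105 for training and 45 for testing.
n_train = n_observations * train_split_pct // 100
n_test = n_observations - n_train

# Accuracy is the share of test specimens classified correctly.
overall_accuracy = n_test_correct / n_test

print(n_train, n_test, round(overall_accuracy, 4))  # 105 45 0.9556
```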

Data Table

Species Measurement Profiles

Mean petal and sepal dimensions per species

n	species	petal_width_mean	sepal_width_mean	petal_length_mean	sepal_length_mean
50	setosa	0.25	3.43	1.46	5.01
50	versicolor	1.33	2.77	4.26	5.94
50	virginica	2.03	2.97	5.55	6.59

Interpretation

Setosa has by far the smallest petals (mean petal length 1.46 cm, mean petal width 0.25 cm), which makes it easy to separate from the other two species. Versicolor and Virginica differ in mean petal length (4.26 cm vs 5.55 cm) and mean petal width (1.33 cm vs 2.03 cm), but their sepal means are closer (sepal width 2.77 cm vs 2.97 cm), which is consistent with their occasional misclassification. Each species is represented by 50 specimens.
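
One rough way to quantify this is to measure how far apart the species means sit in the four-dimensional measurement space. The sketch below uses the means from the table and plain Euclidean distance; this is an illustrative assumption, not the method used by the report, and it ignores within-species spread.

```python
import math
from itertools import combinations

# Per-species means from the table, ordered as
# (sepal_length, sepal_width, petal_length, petal_width), in cm.
centroids = {
    "setosa": (5.01, 3.43, 1.46, 0.25),
    "versicolor": (5.94, 2.77, 4.26, 1.33),
    "virginica": (6.59, 2.97, 5.55, 2.03),
}


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Distance between every pair of species centroids.
distances = {
    (s1, s2): euclidean(centroids[s1], centroids[s2])
    for s1, s2 in combinations(centroids, 2)
}

# Print pairs from closest to farthest.
for pair, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(pair, round(d, 2))
```

The Versicolor and Virginica centroids come out closest together, and both sit much farther from Setosa, which matches the pattern of confusion described above.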

Visualization

Measurement Distributions by Species

Box plots of all four measurements split by species

Interpretation

Petal length and petal width show the clearest separation across species, with virtually no overlap between Setosa and the other two species. Sepal width varies more within each species and overlaps more between species, which explains why it contributes less to classification accuracy. Across the four groups of box plots, the petal measurements distinguish the species far better than the sepal measurements.

Visualization

Petal Dimensions Scatter by Species

Scatter of petal length vs petal width coloured by species

Interpretation

Plotting petal width against petal length reveals three visually distinct clusters with almost no overlap, confirming that these two measurements alone are sufficient to separate Setosa from all other specimens. The Versicolor and Virginica clusters are adjacent but largely separable, with a small zone of potential confusion where they meet. This scatter is the most intuitive illustration of iris species separation.

Visualization

Feature Correlation Matrix

Pearson correlations between all four iris measurements

Interpretation

Petal length and petal width are strongly correlated (r ≈ 0.96), meaning they carry largely overlapping species information. Sepal length is moderately correlated with both petal measures, while sepal width is nearly independent — explaining why it adds less discriminative power despite being a distinct biological measurement. High inter-feature correlation reduces effective dimensionality but does not harm accuracy.
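
For readers unfamiliar with the statistic, the sketch below shows how a Pearson correlation is computed. The raw specimens are not included in this report, so it is applied to the three species means from the profile table. That is an assumption made purely to have real inputs: the result is a between-species correlation and is not the r ≈ 0.96 quoted above, which is computed over all 150 specimens.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Species means in the order setosa, versicolor, virginica (from the table).
petal_length_mean = [1.46, 4.26, 5.55]
petal_width_mean = [0.25, 1.33, 2.03]

r_petal = pearson_r(petal_length_mean, petal_width_mean)
print(round(r_petal, 3))
```

Even at the level of species means, petal length and petal width rise together almost perfectly, which is the same redundancy the full correlation matrix shows.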

Visualization

Feature Importance for Species Classification

Random Forest mean-decrease-accuracy scores per measurement

Interpretation

petal_length is the most important feature with a mean-decrease-accuracy of 15.614, indicating that permuting this column causes the largest drop in classification accuracy. Petal measurements collectively dominate over sepal measurements, confirming the visual evidence from the scatter plot. Sepal width typically has the lowest importance because it overlaps across species more than the petal dimensions.
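
The permute-and-rescore idea behind this score can be sketched in a few lines. Everything in the example is hypothetical: the toy rows, the single-threshold classifier standing in for the Random Forest, and the 2.5 cm cutoff. The report's value of 15.614 comes from its own Random Forest and may be scaled differently from the raw accuracy drop computed here.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, column, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one column across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the permuted column.
        permuted = [dict(r, **{column: v}) for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


# Hypothetical toy rows (not taken from the report's data).
rows = [
    {"petal_length": 1.4, "sepal_width": 3.5},
    {"petal_length": 1.5, "sepal_width": 3.0},
    {"petal_length": 1.3, "sepal_width": 3.2},
    {"petal_length": 4.5, "sepal_width": 3.1},
    {"petal_length": 5.6, "sepal_width": 2.8},
    {"petal_length": 4.1, "sepal_width": 3.4},
]
labels = ["setosa", "setosa", "setosa", "other", "other", "other"]


def predict(row):
    """Hypothetical stand-in classifier: one threshold on petal_length."""
    return "setosa" if row["petal_length"] < 2.5 else "other"


importance = {
    column: permutation_importance(predict, rows, labels, column)
    for column in ("petal_length", "sepal_width")
}
print(importance)
```

Because the stand-in classifier never looks at sepal_width, shuffling that column changes nothing and its importance is exactly zero, while shuffling petal_length breaks the link between measurement and label and lowers accuracy.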

Visualization

Classification Confusion Matrix

Actual vs predicted species on the held-out test set

Interpretation

The overall test-set accuracy is 95.6% (43 of 45 correct). Setosa is classified perfectly: no Setosa specimen is confused with Versicolor or Virginica. Both misclassifications occur between Versicolor and Virginica, the two morphologically similar species. Diagonal cells hold correct predictions; off-diagonal cells show where the classifier errs.
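
As a sketch of how such a matrix, the overall accuracy, and per-species recall relate to one another, the example below builds a confusion matrix from two label lists. The six labels are hypothetical and chosen only to show one Versicolor/Virginica confusion; they are not the report's 45 test predictions, whose per-species counts are not listed here.

```python
from collections import Counter

SPECIES = ["setosa", "versicolor", "virginica"]


def confusion_matrix(actual, predicted, classes=SPECIES):
    """Rows are actual species, columns are predicted species."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def accuracy(matrix):
    """Diagonal total divided by the total number of specimens."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall_per_class(matrix, classes=SPECIES):
    """For each actual species, the share predicted correctly."""
    return {c: matrix[i][i] / sum(matrix[i]) for i, c in enumerate(classes)}


# Hypothetical labels for illustration only.
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

matrix = confusion_matrix(actual, predicted)
for species, row in zip(SPECIES, matrix):
    print(species, row)
print(round(accuracy(matrix), 3), recall_per_class(matrix))
```

In this toy case the single error sits in the Versicolor row under the Virginica column, so Setosa and Virginica keep a recall of 1.0 while Versicolor drops to 0.5.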

Visualization

LDA Species Projection

All specimens projected onto the first two linear discriminant axes

Interpretation

Linear Discriminant Analysis compresses the four measurements into two axes that maximise between-species separation. On this projection, Setosa occupies the far left of LD1, completely isolated from the other two species. Versicolor and Virginica form adjacent but distinct clusters, with LD1 capturing most of the species variance and LD2 providing secondary resolution between those two species.
