Executive Summary
Overall classification accuracy and key discriminating measurements
The Random Forest classifier achieved 95.6% accuracy on the held-out test set, correctly identifying the species of 43 of 45 unseen flowers. The petal_length measurement emerged as the most discriminating by a wide margin, and Setosa had the highest per-species recall. LDA confirmed that two linear discriminant axes, the maximum available for three species, capture the between-species variance, showing that the iris species are geometrically well separated in measurement space.
Species Measurement Profiles
Mean petal and sepal dimensions per species
| n | species | petal_width_mean | sepal_width_mean | petal_length_mean | sepal_length_mean |
|---|---|---|---|---|---|
| 50 | setosa | 0.25 | 3.43 | 1.46 | 5.01 |
| 50 | versicolor | 1.33 | 2.77 | 4.26 | 5.94 |
| 50 | virginica | 2.03 | 2.97 | 5.55 | 6.59 |
Setosa has by far the smallest petals (mean length 1.46 cm, mean width 0.25 cm), which makes it easy to separate from the other two species. Versicolor and Virginica differ clearly on petal length and width but overlap more on sepal measurements, which helps explain their occasional misclassification. Each species is represented by 50 specimens.
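As a rough illustration of how far apart these profiles are, the sketch below assigns a flower to the species whose mean profile is nearest in Euclidean distance. This is a nearest-centroid rule built only from the table above, not the Random Forest used in this report; the function name `nearest_species` and the sample measurements are hypothetical.

```python
import math

# Per-species means from the table above, in the order
# (sepal_length, sepal_width, petal_length, petal_width), in cm.
SPECIES_MEANS = {
    "setosa": (5.01, 3.43, 1.46, 0.25),
    "versicolor": (5.94, 2.77, 4.26, 1.33),
    "virginica": (6.59, 2.97, 5.55, 2.03),
}


def nearest_species(flower):
    """Return the species whose mean profile is closest to `flower`."""
    return min(
        SPECIES_MEANS,
        key=lambda name: math.dist(flower, SPECIES_MEANS[name]),
    )


# Hypothetical measurements, chosen only for illustration.
small_petals = (5.0, 3.4, 1.5, 0.2)
large_petals = (6.5, 3.0, 5.6, 2.1)
print(nearest_species(small_petals))
print(nearest_species(large_petals))
```

A centroid rule ignores within-species spread, so it is only a baseline for intuition and should not be expected to match the reported accuracy.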
Measurement Distributions by Species
Box plots of all four measurements split by species
Petal length and petal width show the clearest separation across species, with virtually no overlap between Setosa and the other two taxa. Sepal width shows more within-species variability and greater overlap, explaining why it contributes less to classification accuracy. Comparing the four box plot groups shows that petal measurements are the primary basis for telling the species apart.
Petal Dimensions Scatter by Species
Scatter of petal length vs petal width coloured by species
Plotting petal width against petal length reveals three visually distinct clusters with almost no overlap, confirming that these two measurements alone are sufficient to separate Setosa from all other specimens. The Versicolor and Virginica clusters are adjacent but separable, with a small zone of potential confusion where they meet. This scatter is the most intuitive illustration of iris species separation.
Feature Correlation Matrix
Pearson correlations between all four iris measurements
Petal length and petal width are strongly correlated (r ≈ 0.96), meaning they carry largely overlapping species information. Sepal length is moderately correlated with both petal measures, while sepal width is the least correlated with the other measurements. Sepal width is therefore a distinct biological measurement, but it adds less discriminative power because its values overlap across species. High inter-feature correlation reduces effective dimensionality but does not appear to harm accuracy here.
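For reference, each cell of the matrix is a Pearson coefficient, which can be computed as in the sketch below. The helper name `pearson_r` and the five paired measurements are hypothetical toy values, not rows from the dataset, so the printed coefficient will not equal the r ≈ 0.96 reported above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values (cm), for illustration only.
petal_length = [1.4, 1.5, 4.5, 4.7, 5.8]
petal_width = [0.2, 0.3, 1.5, 1.4, 2.2]
r = pearson_r(petal_length, petal_width)
print(f"r = {r:.2f}")
```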
Feature Importance for Species Classification
Random Forest mean-decrease-accuracy scores per measurement
petal_length is the most important feature with a mean-decrease-accuracy of 15.614, indicating that permuting this column causes the largest drop in classification accuracy. Petal measurements collectively dominate over sepal measurements, confirming the visual evidence from the scatter plot. Sepal width typically has the lowest importance because it overlaps across species more than the petal dimensions.
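The permutation idea behind a mean-decrease-accuracy score can be sketched as follows: shuffle one column, re-score the model, and record how much accuracy is lost. Everything here is an assumption made for illustration: `toy_predict` is a hypothetical one-threshold stand-in for the Random Forest, the four rows are toy values, and the sketch returns the raw drop in accuracy, whereas the 15.614 above is on the scale of whatever tool produced it, which may normalise the drop.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the true label."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_drop(predict, rows, labels, col, seed=0):
    """Accuracy lost when column `col` is shuffled across rows."""
    rng = random.Random(seed)
    shuffled = [row[col] for row in rows]
    rng.shuffle(shuffled)
    # Rebuild each row with the shuffled value in position `col`.
    permuted = [
        row[:col] + (value,) + row[col + 1:]
        for row, value in zip(rows, shuffled)
    ]
    return accuracy(predict, rows, labels) - accuracy(predict, permuted, labels)


# Hypothetical stand-in model: a single threshold on petal_length (index 2).
def toy_predict(row):
    return "setosa" if row[2] < 2.5 else "other"


# Hypothetical toy rows: (sepal_length, sepal_width, petal_length, petal_width).
rows = [
    (5.1, 3.5, 1.4, 0.2),
    (4.9, 3.0, 1.5, 0.2),
    (6.0, 2.9, 4.5, 1.5),
    (6.3, 3.3, 6.0, 2.5),
]
labels = ["setosa", "setosa", "other", "other"]

for col, name in enumerate(
    ["sepal_length", "sepal_width", "petal_length", "petal_width"]
):
    print(name, permutation_drop(toy_predict, rows, labels, col))
```

Because the stand-in model reads only petal_length, shuffling any other column leaves its accuracy unchanged, which is the behaviour a low importance score reflects.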
Classification Confusion Matrix
Actual vs predicted species on the held-out test set
The overall test-set accuracy is 95.6% (43 of 45 specimens). Setosa is classified perfectly: no Setosa specimen is confused with Versicolor or Virginica. The two misclassifications occur between Versicolor and Virginica, the two morphologically similar species. Diagonal cells count correct predictions; off-diagonal cells show where the classifier makes errors.
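The way accuracy and per-species recall follow from such a matrix can be sketched as below. The cell counts are HYPOTHETICAL: the report states only that 43 of 45 predictions are correct and that Setosa is never confused, so the 15-per-species split and the placement of the two errors are assumptions chosen to be consistent with those totals.

```python
SPECIES = ["setosa", "versicolor", "virginica"]

# HYPOTHETICAL counts: rows are actual species, columns are predicted species.
# Only the totals (43 correct of 45, setosa never confused) come from the
# report; the 15-per-class split and the placement of the two errors are assumed.
matrix = [
    [15, 0, 0],
    [0, 14, 1],
    [0, 1, 14],
]


def accuracy(cm):
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def recall_per_class(cm, names):
    """For each actual class, the share of its specimens predicted correctly."""
    return {name: cm[i][i] / sum(cm[i]) for i, name in enumerate(names)}


print(f"accuracy = {accuracy(matrix):.3f}")
print(recall_per_class(matrix, SPECIES))
```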
LDA Species Projection
All specimens projected onto the first two linear discriminant axes
Linear Discriminant Analysis compresses the four measurements into two axes that maximise between-species separation. On this projection, Setosa occupies the far left of LD1, completely isolated from the other two species. Versicolor and Virginica form adjacent but distinct clusters, with LD1 capturing most of the species variance and LD2 providing secondary resolution between those two species.
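The criterion behind this projection can be written as follows. This is the standard Fisher formulation, given as background rather than as a description of the exact routine used here; the symbols $S_W$, $S_B$, $\mu_k$, $\mu$, $n_k$, and $C_k$ (within-class scatter, between-class scatter, class mean, overall mean, class size, and the set of specimens in class $k$) are introduced for this illustration.

```latex
S_W = \sum_{k=1}^{K} \sum_{i \in C_k} (x_i - \mu_k)(x_i - \mu_k)^{\top},
\qquad
S_B = \sum_{k=1}^{K} n_k \, (\mu_k - \mu)(\mu_k - \mu)^{\top},
\qquad
w^{*} = \arg\max_{w} \frac{w^{\top} S_B \, w}{w^{\top} S_W \, w}
```

With $K = 3$ species, $S_B$ has rank at most $K - 1 = 2$, so LD1 and LD2 are the only discriminant axes these data can yield.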