Executive Summary
Overall classification accuracy and key discriminating measurements
At the optimal k=9, the k-NN classifier achieved 90% accuracy (27 of 30 flowers) on the held-out test set. The random forest ranked petal_length as the single most important measurement for species discrimination (mean decrease in accuracy: 35.6453). Together, petal length and petal width cleanly separate setosa from the other two species, while versicolor and virginica overlap enough that all four measurements are needed for reliable classification.
Confusion Matrix
Actual vs predicted species on the held-out test set
The k-NN classifier correctly classified 27 of 30 test observations (90%). All three errors were virginica flowers predicted as versicolor. Every setosa test observation was classified correctly because setosa's petal dimensions are far smaller than those of versicolor and virginica, creating a clear separation boundary.
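The headline accuracy can be checked directly from the confusion counts. The sketch below assumes the matrix implied by the reported figures (10 test flowers per species, 27 correct, 3 virginica predicted as versicolor); the names `CONFUSION`, `accuracy`, and `largest_confusion` are illustrative and not part of the original analysis.

```python
# Confusion matrix reconstructed from the counts reported in the text.
# Rows are actual species, columns are predicted species.
SPECIES = ["setosa", "versicolor", "virginica"]
CONFUSION = [
    [10, 0, 0],  # actual setosa
    [0, 10, 0],  # actual versicolor
    [0, 3, 7],   # actual virginica: 3 predicted as versicolor
]


def accuracy(matrix):
    """Fraction of observations on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def largest_confusion(matrix, labels):
    """Return (actual, predicted, count) for the biggest off-diagonal cell."""
    cells = [
        (matrix[i][j], labels[i], labels[j])
        for i in range(len(matrix))
        for j in range(len(matrix))
        if i != j
    ]
    count, actual, predicted = max(cells)
    return actual, predicted, count


print(accuracy(CONFUSION))                    # 0.9
print(largest_confusion(CONFUSION, SPECIES))  # ('virginica', 'versicolor', 3)
```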
Feature Importance (Random Forest)
Mean decrease in accuracy for each measurement from the random forest classifier
petal_length has the highest random forest feature importance (35.6453 mean decrease in accuracy), confirming it as the primary driver of species classification. sepal_width contributes the least — its distributions overlap substantially across versicolor and virginica, adding limited unique signal. Petal measurements together dominate sepal measurements, consistent with Fisher's original analysis of this dataset.
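Mean decrease in accuracy is commonly computed as a permutation measure: shuffle one feature's values, re-score the model, and record how far accuracy falls. The report does not show how its importance values were produced, so the following is only a sketch of that general idea. It uses a stand-in threshold rule rather than a random forest, and the toy rows, the 2.45 cm cut-off, and all function names are assumptions made for illustration.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows for which the model's prediction matches the label."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_features, seed=0):
    """Drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        # Rebuild each row with feature j replaced by its shuffled value.
        permuted = [
            row[:j] + (value,) + row[j + 1:]
            for row, value in zip(rows, column)
        ]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return drops


# Hypothetical toy rows: (petal_length, sepal_width) in cm.
ROWS = [(1.4, 3.5), (1.5, 3.0), (1.3, 3.2), (4.5, 3.1), (5.6, 3.0), (4.7, 3.2)]
LABELS = ["setosa", "setosa", "setosa", "other", "other", "other"]


def predict(row):
    # Stand-in model (assumption): one threshold on petal length only.
    return "setosa" if row[0] < 2.45 else "other"


drops = permutation_importance(predict, ROWS, LABELS, n_features=2)
print(drops)  # the sepal_width entry is 0.0 because the model ignores it
```

A feature the model never consults scores exactly zero under this measure, which is the limiting case of the low sepal_width importance described above.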
k-NN Accuracy vs k
Classification accuracy on the test set for each value of k from 1 to k_max
Accuracy peaks at k=9 with 90% correct classifications on the test set. The lowest accuracy across the sweep was 83.3%, showing that the choice of k has a moderate but measurable impact on performance. For iris data, larger k values tend to smooth decision boundaries and slightly reduce sensitivity to borderline versicolor/virginica cases.
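The sweep itself is simple to sketch: fit nothing, and for each candidate k score a majority vote among the k nearest training points. The example below is a minimal pure-Python version under stated assumptions. The training and test points are hypothetical toy values in (petal_length, petal_width) space, not the report's data, and `knn_predict`, `accuracy_by_k`, and `best_k` are illustrative names.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`.

    Distances are Euclidean. On a tied vote, the label whose first
    member is nearest to the query wins.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy_by_k(train, test, k_values):
    """Map each k to the share of test points classified correctly."""
    results = {}
    for k in k_values:
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        results[k] = hits / len(test)
    return results


def best_k(results):
    """Smallest k among those with the highest accuracy."""
    top = max(results.values())
    return min(k for k, acc in results.items() if acc == top)


# Hypothetical toy points: ((petal_length, petal_width), species).
TRAIN = [
    ((1.4, 0.2), "setosa"), ((1.5, 0.3), "setosa"), ((1.3, 0.2), "setosa"),
    ((4.2, 1.3), "versicolor"), ((4.5, 1.5), "versicolor"), ((4.7, 1.4), "versicolor"),
    ((5.5, 2.0), "virginica"), ((5.8, 2.2), "virginica"), ((5.1, 1.9), "virginica"),
]
TEST = [
    ((1.6, 0.2), "setosa"),
    ((4.4, 1.4), "versicolor"),
    ((6.0, 2.1), "virginica"),
    ((4.9, 1.8), "virginica"),  # borderline case near versicolor
]

results = accuracy_by_k(TRAIN, TEST, k_values=[1, 3, 5])
print(results)          # {1: 1.0, 3: 0.75, 5: 0.75}
print(best_k(results))  # 1
```

On these toy points the borderline virginica flower flips to versicolor once k exceeds 1, the same kind of versicolor/virginica sensitivity the sweep measures on the real data.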
Species Separation — Petal Dimensions
Petal length vs petal width coloured by species, showing cluster separation
In petal space, setosa forms a completely isolated cluster with petal lengths up to 1.9 cm — far below versicolor and virginica. Versicolor and virginica show partial overlap along both petal axes, but a diagonal boundary still separates most observations correctly. This plot explains why petal measurements dominate feature importance: a single threshold on petal length already separates setosa perfectly, and the remaining boundary between versicolor and virginica is almost linear in petal space.
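The single-threshold claim can be illustrated with the petal_length minima and maxima from the descriptive statistics. This is a sketch, assuming the threshold is placed at the midpoint of the gap between setosa's maximum and the smallest minimum of the other species; that placement and the function names are illustrative choices, not something the report specifies.

```python
# (min, max) of petal_length in cm, from the descriptive statistics table.
PETAL_LENGTH_RANGE = {
    "setosa": (1.0, 1.9),
    "versicolor": (3.0, 5.1),
    "virginica": (4.5, 6.9),
}


def setosa_threshold(ranges):
    """Midpoint of the gap between setosa's max and the others' smallest min."""
    setosa_max = ranges["setosa"][1]
    others_min = min(lo for name, (lo, _) in ranges.items() if name != "setosa")
    if setosa_max >= others_min:
        raise ValueError("setosa overlaps another species; no clean threshold")
    return (setosa_max + others_min) / 2


def is_setosa(petal_length, threshold):
    """Classify as setosa when petal length falls below the threshold."""
    return petal_length < threshold


threshold = setosa_threshold(PETAL_LENGTH_RANGE)
print(f"{threshold:.2f}")         # 2.45
print(is_setosa(1.9, threshold))  # True  (largest setosa petal)
print(is_setosa(3.0, threshold))  # False (smallest versicolor petal)
```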
Species Separation — Sepal Dimensions
Sepal length vs sepal width coloured by species, showing degree of overlap
Sepal dimensions show considerably more overlap between species than petal dimensions. While setosa tends toward shorter and wider sepals, versicolor and virginica occupy almost the same sepal space, so classification from sepal measurements alone is unreliable. This overlap explains why sepal features rank lower in random forest importance and why petal measurements are needed for high-accuracy species identification.
Per-Class Classification Metrics
Precision, recall, F1 score, and support for each species from the k-NN classifier
| species | support | precision | recall | f1_score |
|---|---|---|---|---|
| setosa | 10 | 1 | 1 | 1 |
| versicolor | 10 | 0.7692 | 1 | 0.8696 |
| virginica | 10 | 1 | 0.7 | 0.8235 |
virginica is the most challenging species to classify, with the lowest F1 score (0.8235), reflecting its morphological overlap with versicolor. Setosa achieves perfect precision and recall because its petal dimensions do not overlap with those of the other two species. The precision/recall pattern shows that the classifier over-predicts versicolor at the expense of virginica: versicolor has perfect recall but a precision of 0.7692, while virginica has perfect precision but a recall of only 0.7, because 3 of the 10 virginica flowers were labelled versicolor.
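The per-class figures follow from the confusion counts: precision divides correct predictions by everything predicted as that species, recall divides them by the species' support, and F1 is their harmonic mean. The sketch below assumes the confusion matrix implied by the reported results (10 flowers per species, 3 virginica predicted as versicolor); `per_class_metrics` is an illustrative helper, not the original pipeline.

```python
# Rows are actual species, columns are predicted species (reconstructed).
SPECIES = ["setosa", "versicolor", "virginica"]
CONFUSION = [
    [10, 0, 0],
    [0, 10, 0],
    [0, 3, 7],
]


def per_class_metrics(matrix, labels):
    """Return {label: {support, precision, recall, f1}} from a confusion matrix."""
    metrics = {}
    for k, label in enumerate(labels):
        true_pos = matrix[k][k]
        support = sum(matrix[k])                   # actual members of class k
        predicted = sum(row[k] for row in matrix)  # everything labelled k
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {
            "support": support,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return metrics


for label, m in per_class_metrics(CONFUSION, SPECIES).items():
    print(f"{label:10s} precision={m['precision']:.4f} "
          f"recall={m['recall']:.4f} f1={m['f1']:.4f}")
```

Versicolor's precision is 10/13 and its F1 is 20/23, while virginica's F1 is 14/17, which agrees with the table to within rounding.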
Descriptive Statistics by Species
Mean, SD, min, and max of all four measurements broken out by species
| species | measurement | mean | sd | min | max |
|---|---|---|---|---|---|
| setosa | sepal_length | 5.006 | 0.352 | 4.3 | 5.8 |
| setosa | sepal_width | 3.428 | 0.379 | 2.3 | 4.4 |
| setosa | petal_length | 1.462 | 0.174 | 1 | 1.9 |
| setosa | petal_width | 0.246 | 0.105 | 0.1 | 0.6 |
| versicolor | sepal_length | 5.936 | 0.516 | 4.9 | 7 |
| versicolor | sepal_width | 2.77 | 0.314 | 2 | 3.4 |
| versicolor | petal_length | 4.26 | 0.47 | 3 | 5.1 |
| versicolor | petal_width | 1.326 | 0.198 | 1 | 1.8 |
| virginica | sepal_length | 6.588 | 0.636 | 4.9 | 7.9 |
| virginica | sepal_width | 2.974 | 0.322 | 2.2 | 3.8 |
| virginica | petal_length | 5.552 | 0.552 | 4.5 | 6.9 |
| virginica | petal_width | 2.026 | 0.275 | 1.4 | 2.5 |
petal_length shows the largest absolute difference in means across species (4.09 cm, between setosa and virginica), making it the most visually distinctive measurement in a field identification context. Within-species standard deviations are small relative to that gap (at most 0.552 cm for petal_length), confirming that the between-species differences are not artefacts of high within-species variance. Sepal width, by contrast, shows the smallest inter-species mean differences (a 0.658 cm range of means) and the largest within-species spread relative to those differences (SDs of 0.314 to 0.379 cm), consistent with its low feature importance.
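The comparison of mean differences against within-species spread can be reproduced from the table. The sketch below copies the means and SDs from the descriptive statistics; the `separation_ratio` (range of species means divided by the largest within-species SD) is an illustrative measure introduced here, not one used in the original analysis.

```python
# (mean, sd) per species in cm, copied from the descriptive statistics table.
STATS = {
    "sepal_length": {"setosa": (5.006, 0.352), "versicolor": (5.936, 0.516), "virginica": (6.588, 0.636)},
    "sepal_width": {"setosa": (3.428, 0.379), "versicolor": (2.770, 0.314), "virginica": (2.974, 0.322)},
    "petal_length": {"setosa": (1.462, 0.174), "versicolor": (4.260, 0.470), "virginica": (5.552, 0.552)},
    "petal_width": {"setosa": (0.246, 0.105), "versicolor": (1.326, 0.198), "virginica": (2.026, 0.275)},
}


def mean_range(by_species):
    """Largest difference between any two species means."""
    means = [mean for mean, _ in by_species.values()]
    return max(means) - min(means)


def separation_ratio(by_species):
    """Range of species means divided by the largest within-species SD."""
    largest_sd = max(sd for _, sd in by_species.values())
    return mean_range(by_species) / largest_sd


for name, by_species in STATS.items():
    print(f"{name:13s} range={mean_range(by_species):.3f} cm "
          f"ratio={separation_ratio(by_species):.2f}")
```

By this measure petal_length separates the species most strongly and sepal_width least, matching the ordering described in the text.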