Iris Species Classifier

Executive Summary

Overall classification accuracy and key discriminating measurements

metric	value
n_observations	150
n_train	120
n_test	30
n_species	3
knn_best_accuracy	0.9
knn_best_k	9
top_feature_importance	35.6453

Interpretation

At the optimal k=9, the k-NN classifier achieved 90% accuracy on the held-out test set of 30 flowers. petal_length was the single most important measurement for species discrimination according to the random forest (mean decrease in accuracy: 35.6453). Together, petal length and petal width cleanly separate setosa from the other two species, while versicolor and virginica show some overlap requiring all four measurements for reliable classification.

Visualization

Confusion Matrix

Actual vs predicted species on the held-out test set

Interpretation

The k-NN classifier correctly classified 27 of 30 test observations (90%). The only misclassification was virginica predicted as versicolor (3 cases). Setosa was classified perfectly (10 of 10) because its petal dimensions are far smaller than those of versicolor and virginica, creating a clear separation boundary.
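
As a minimal sketch of how the 90% figure follows from the matrix: the counts below are reconstructed from the per-class support, precision, and recall reported in this document rather than read off the plot, and the variable names are illustrative.

```python
# Confusion matrix reconstructed from the reported per-class metrics
# (rows = actual species, columns = predicted species).
species = ["setosa", "versicolor", "virginica"]
confusion = [
    [10, 0, 0],  # actual setosa
    [0, 10, 0],  # actual versicolor
    [0, 3, 7],   # actual virginica: 3 predicted as versicolor
]

# Accuracy is the diagonal (correct predictions) over all observations.
correct = sum(confusion[i][i] for i in range(len(species)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total

# Collect every off-diagonal cell with a non-zero count.
errors = {
    (species[i], species[j]): confusion[i][j]
    for i in range(len(species))
    for j in range(len(species))
    if i != j and confusion[i][j] > 0
}

print(f"accuracy = {correct}/{total} = {accuracy:.3f}")
print("misclassifications (actual, predicted):", errors)
```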

Visualization

Feature Importance (Random Forest)

Mean decrease in accuracy for each measurement from the random forest classifier

Interpretation

petal_length has the highest random forest feature importance (35.6453 mean decrease in accuracy), confirming it as the primary driver of species classification. sepal_width contributes the least — its distributions overlap substantially across versicolor and virginica, adding limited unique signal. Petal measurements together dominate sepal measurements, consistent with Fisher's original analysis of this dataset.
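
The idea behind a "mean decrease in accuracy" score can be sketched with plain permutation importance: shuffle one feature column at a time and measure how much accuracy drops. The sketch below is a simplification under stated assumptions. It uses a hypothetical one-threshold stand-in classifier and made-up toy rows, not the report's random forest or its data, so the numbers it produces are not comparable to 35.6453.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving the others untouched.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [
                row[:col] + (value,) + row[col + 1:]
                for row, value in zip(X, shuffled)
            ]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy rows: (petal_length, sepal_width). Illustrative only.
X = [(1.4, 3.0), (1.5, 3.4), (1.3, 2.9), (1.6, 3.1),
     (4.5, 3.0), (4.7, 3.2), (5.0, 2.9), (5.8, 3.3)]
y = ["setosa"] * 4 + ["other"] * 4

def predict(row):
    # Stand-in classifier (an assumption): thresholds petal_length only.
    return "setosa" if row[0] < 2.5 else "other"

importances = permutation_importance(predict, X, y)
print(dict(zip(["petal_length", "sepal_width"], importances)))
```

Because the stand-in classifier ignores the second column, shuffling it cannot change any prediction, so its importance is exactly zero. A feature the model relies on loses accuracy when shuffled.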

Visualization

k-NN Accuracy vs k

Classification accuracy on the test set for each value of k from 1 to k_max

Interpretation

Accuracy peaks at k=9 with 90% correct classifications on the test set. The lowest accuracy across the sweep was 83.3%, showing that the choice of k has a moderate but measurable impact on performance. For iris data, larger k values tend to smooth decision boundaries and slightly reduce sensitivity to borderline versicolor/virginica cases.
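
A sweep of this kind can be sketched in a few lines of plain Python. Everything below is illustrative: the function names, the `k_max=5` setting, and the toy points are assumptions chosen for the example, not the report's implementation or its 120/30 split.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Predict a label by majority vote among the k nearest training points."""
    # Sort training points by Euclidean distance to the query, keep the k closest.
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def sweep_k(train_X, train_y, test_X, test_y, k_max):
    """Return {k: test accuracy} for k = 1..k_max."""
    accuracy_by_k = {}
    for k in range(1, k_max + 1):
        hits = sum(
            knn_predict(train_X, train_y, x, k) == y
            for x, y in zip(test_X, test_y)
        )
        accuracy_by_k[k] = hits / len(test_y)
    return accuracy_by_k

# Hypothetical toy points (petal_length, petal_width); NOT the report's data.
train_X = [(1.4, 0.2), (1.5, 0.3), (1.3, 0.2),
           (4.5, 1.5), (4.7, 1.4), (4.0, 1.3),
           (5.8, 2.2), (6.0, 2.5), (5.5, 1.8)]
train_y = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3
test_X = [(1.6, 0.2), (4.4, 1.4), (5.9, 2.1), (5.0, 1.7)]
test_y = ["setosa", "versicolor", "virginica", "virginica"]

accuracy_by_k = sweep_k(train_X, train_y, test_X, test_y, k_max=5)
# max() returns the first maximum, so ties go to the smallest k.
best_k = max(accuracy_by_k, key=accuracy_by_k.get)
print(accuracy_by_k, "best k:", best_k)
```

The last toy test point sits in the versicolor/virginica border region, which is where the choice of k matters in the report as well.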

Visualization

Species Separation — Petal Dimensions

Petal length vs petal width coloured by species, showing cluster separation

Interpretation

In petal space, setosa forms a completely isolated cluster with petal lengths up to 1.9 cm — far below versicolor and virginica. Versicolor and virginica show partial overlap along both petal axes, but a diagonal boundary still separates most observations correctly. This plot explains why petal measurements dominate feature importance: a single threshold on petal length already separates setosa perfectly, and the remaining boundary between versicolor and virginica is almost linear in petal space.
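
The single-threshold claim can be checked directly against the petal_length minima and maxima in the descriptive statistics table. The sketch below uses those reported ranges. The midpoint threshold is an illustrative choice (an assumption), since any value strictly between the setosa maximum and the smallest non-setosa value works equally well.

```python
# (min, max) petal_length in cm per species, from the descriptive statistics table.
petal_length_range = {
    "setosa": (1.0, 1.9),
    "versicolor": (3.0, 5.1),
    "virginica": (4.5, 6.9),
}

setosa_max = petal_length_range["setosa"][1]
others_min = min(
    low for name, (low, high) in petal_length_range.items() if name != "setosa"
)

# Midpoint of the gap between setosa and the other species (illustrative choice).
threshold = (setosa_max + others_min) / 2

def is_setosa(petal_length_cm):
    """One-feature rule: setosa if petal length falls below the threshold."""
    return petal_length_cm < threshold

# Versicolor and virginica ranges overlap, so no single cut on this axis
# separates them perfectly.
vers_low, vers_high = petal_length_range["versicolor"]
virg_low, virg_high = petal_length_range["virginica"]
overlap = (max(vers_low, virg_low), min(vers_high, virg_high))

print(f"gap: {setosa_max} to {others_min} cm, threshold = {threshold:.2f} cm")
print(f"versicolor/virginica overlap on petal_length: {overlap} cm")
```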

Visualization

Species Separation — Sepal Dimensions

Sepal length vs sepal width coloured by species, showing degree of overlap

Interpretation

Sepal dimensions show considerably more overlap between species than petal dimensions. While setosa tends toward shorter and wider sepals, versicolor and virginica occupy almost the same sepal space, so sepal measurements alone cannot reliably tell these two species apart. This overlap explains why sepal features rank lower in random forest importance and underscores the necessity of including petal measurements for high-accuracy species identification.

Data Table

Per-Class Classification Metrics

Precision, recall, F1 score, and support for each species from the k-NN classifier

species	support	precision	recall	f1_score
setosa	10	1	1	1
versicolor	10	0.7692	1	0.8696
virginica	10	1	0.7	0.8235

Interpretation

virginica is the most challenging species to classify, with an F1 score of 0.8235, reflecting its morphological overlap with versicolor. Setosa achieves perfect precision and recall because its petal dimensions do not overlap with those of the other two species. The precision/recall pattern shows that the classifier over-predicts versicolor at the expense of virginica: versicolor has perfect recall but a precision of 0.7692, while virginica has perfect precision but a recall of 0.7.
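
The per-class figures can be recomputed from the confusion counts implied by the table (support of 10 per species, with 3 virginica flowers predicted as versicolor). This is a minimal sketch with illustrative names, not the code that produced the report.

```python
species = ["setosa", "versicolor", "virginica"]
# Rows = actual species, columns = predicted species.
confusion = [
    [10, 0, 0],
    [0, 10, 0],
    [0, 3, 7],
]

def per_class_metrics(matrix, labels):
    """Compute support, precision, recall, and F1 for each class."""
    metrics = {}
    for i, label in enumerate(labels):
        true_positive = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        support = sum(matrix[i])                   # row sum
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {
            "support": support,
            "precision": precision,
            "recall": recall,
            "f1_score": f1,
        }
    return metrics

metrics = per_class_metrics(confusion, species)
for label, m in metrics.items():
    print(label, {name: round(value, 4) for name, value in m.items()})
```

Versicolor precision is 10/13 because 13 flowers were predicted as versicolor and only 10 of them were. Virginica recall is 7/10 for the mirror-image reason.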

Data Table

Descriptive Statistics by Species

Mean, SD, min, and max of all four measurements broken out by species

species	measurement	mean	sd	min	max
setosa	sepal_length	5.006	0.352	4.3	5.8
setosa	sepal_width	3.428	0.379	2.3	4.4
setosa	petal_length	1.462	0.174	1	1.9
setosa	petal_width	0.246	0.105	0.1	0.6
versicolor	sepal_length	5.936	0.516	4.9	7
versicolor	sepal_width	2.77	0.314	2	3.4
versicolor	petal_length	4.26	0.47	3	5.1
versicolor	petal_width	1.326	0.198	1	1.8
virginica	sepal_length	6.588	0.636	4.9	7.9
virginica	sepal_width	2.974	0.322	2.2	3.8
virginica	petal_length	5.552	0.552	4.5	6.9
virginica	petal_width	2.026	0.275	1.4	2.5

Interpretation

petal_length shows the largest absolute difference in means across species (4.09 cm between setosa and virginica), making it the most visually distinctive measurement in a field identification context. For the petal measurements, standard deviations are small relative to the gaps between species means, confirming that the between-species differences are not artefacts of high within-species variance. Sepal width, by contrast, shows the smallest inter-species mean difference (0.658 cm between setosa and versicolor), and its within-species standard deviations (0.314 to 0.379 cm) are large relative to that difference, explaining its low feature importance.
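
The comparison of mean differences can be reproduced from the table above. The sketch below computes, for each measurement, the gap between the largest and smallest species mean. The ratio of that gap to the largest within-species SD is an illustrative separation measure introduced here (an assumption), not a statistic from the report.

```python
# Species means (cm) per measurement, copied from the table above.
means = {
    "sepal_length": {"setosa": 5.006, "versicolor": 5.936, "virginica": 6.588},
    "sepal_width": {"setosa": 3.428, "versicolor": 2.770, "virginica": 2.974},
    "petal_length": {"setosa": 1.462, "versicolor": 4.260, "virginica": 5.552},
    "petal_width": {"setosa": 0.246, "versicolor": 1.326, "virginica": 2.026},
}
# Largest within-species SD (cm) per measurement, also from the table.
max_sd = {
    "sepal_length": 0.636,
    "sepal_width": 0.379,
    "petal_length": 0.552,
    "petal_width": 0.275,
}

# Gap between the largest and smallest species mean for each measurement.
mean_gap = {
    name: max(by_species.values()) - min(by_species.values())
    for name, by_species in means.items()
}
# Illustrative ratio: how many (largest) SDs wide the gap is.
gap_in_sds = {name: mean_gap[name] / max_sd[name] for name in means}

ranked = sorted(mean_gap, key=mean_gap.get, reverse=True)
for name in ranked:
    print(f"{name}: gap = {mean_gap[name]:.3f} cm, gap / max SD = {gap_in_sds[name]:.2f}")
```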
