User · Flowers · Iris · Species Classifier
Executive Summary

Overall classification accuracy and key discriminating measurements

n_observations: 150
n_train: 120
n_test: 30
n_species: 3
knn_best_accuracy: 0.9
knn_best_k: 9
top_feature_importance: 35.6453
Interpretation

At the optimal k=9, the k-NN classifier achieved 90% accuracy (27 of 30 flowers) on the held-out test set. The random forest ranked petal_length as the single most important measurement for species discrimination (mean decrease in accuracy: 35.6453). Together, petal length and petal width cleanly separate setosa from the other two species, while versicolor and virginica overlap enough that reliable classification of those two draws on all four measurements.

Visualization

Confusion Matrix

Actual vs predicted species on the held-out test set

Interpretation

The k-NN classifier correctly classified 27 of 30 test observations (90%). All three errors were virginica flowers predicted as versicolor. Setosa was classified perfectly (precision and recall of 1 in the per-class metrics) because its petal dimensions are far smaller than those of versicolor and virginica, creating a clear separation boundary.
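The counts in this section can be laid out as a small matrix. The sketch below is an illustration, not the report's own code: the matrix is reconstructed from the figures stated in this report (27 of 30 correct, 3 virginica predicted as versicolor, a support of 10 per species), and the names `confusion`, `accuracy`, and `misclassifications` are assumptions introduced here.

```python
# Rows are actual species, columns are predicted species.
# Counts are reconstructed from the figures reported in this document,
# not read from the classifier's raw output.
confusion = {
    "setosa":     {"setosa": 10, "versicolor": 0,  "virginica": 0},
    "versicolor": {"setosa": 0,  "versicolor": 10, "virginica": 0},
    "virginica":  {"setosa": 0,  "versicolor": 3,  "virginica": 7},
}


def accuracy(matrix):
    """Share of observations on the diagonal (actual == predicted)."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


def misclassifications(matrix):
    """Return (actual, predicted, count) for every non-zero off-diagonal cell."""
    return [
        (actual, predicted, count)
        for actual, row in matrix.items()
        for predicted, count in row.items()
        if actual != predicted and count > 0
    ]


print(accuracy(confusion))            # 0.9
print(misclassifications(confusion))  # [('virginica', 'versicolor', 3)]
```

Reading along a row shows where a species' flowers ended up; reading down a column shows what a predicted label actually contained.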

Visualization

Feature Importance (Random Forest)

Mean decrease in accuracy for each measurement from the random forest classifier

Interpretation

petal_length has the highest random forest feature importance (35.6453 mean decrease in accuracy), confirming it as the primary driver of species classification. sepal_width contributes the least — its distributions overlap substantially across versicolor and virginica, adding limited unique signal. Petal measurements together dominate sepal measurements, consistent with Fisher's original analysis of this dataset.
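Mean decrease in accuracy is a permutation-style measure: shuffle one feature's values, re-score the model, and record how far accuracy falls. The sketch below illustrates that idea only. The one-threshold `predict` function, the toy rows, and the `noise` feature are all hypothetical stand-ins, not the report's random forest or data, so the resulting numbers are not comparable to 35.6453.

```python
import random

# Hypothetical toy rows: (petal_length, noise). Labels: 0 = setosa-like, 1 = other.
ROWS = [(1.4, 0.3), (1.5, 0.9), (1.3, 0.5), (1.6, 0.1),
        (4.5, 0.7), (5.1, 0.2), (4.7, 0.8), (5.9, 0.4)]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]
FEATURES = ["petal_length", "noise"]


def predict(row):
    # Stand-in model: one assumed threshold on petal_length; ignores noise.
    return 1 if row[0] > 2.5 else 0


def accuracy(rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(rows, labels, n_repeats=20, seed=0):
    """Average accuracy drop after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels)
    importance = {}
    for j, name in enumerate(FEATURES):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(permuted, labels))
        importance[name] = sum(drops) / n_repeats
    return importance


importance = permutation_importance(ROWS, LABELS)
print(importance)
```

A feature the model never uses scores exactly zero, because shuffling it cannot change any prediction; a feature the model depends on scores higher.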

Visualization

k-NN Accuracy vs k

Classification accuracy on the test set for each value of k from 1 to k_max

Interpretation

Accuracy peaks at k=9 with 90% correct classifications on the test set. The lowest accuracy across the sweep was 83.3%, showing that the choice of k has a moderate but measurable impact on performance. For iris data, larger k values tend to smooth decision boundaries and slightly reduce sensitivity to borderline versicolor/virginica cases.
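A sweep of this kind can be sketched in a few lines. The example below is a minimal pure-Python k-NN, assuming Euclidean distance and majority voting; the toy points, the odd-only k values, and the names `knn_predict` and `sweep` are illustrative assumptions, not the report's implementation or data.

```python
import math
from collections import Counter

# Hypothetical toy points (petal_length, petal_width) -> species; not the report's data.
TRAIN = [
    ((1.4, 0.2), "setosa"), ((1.5, 0.3), "setosa"), ((1.3, 0.2), "setosa"),
    ((4.0, 1.2), "versicolor"), ((4.5, 1.5), "versicolor"), ((4.8, 1.6), "versicolor"),
    ((5.0, 1.9), "virginica"), ((5.6, 2.1), "virginica"), ((6.0, 2.4), "virginica"),
]
TEST = [
    ((1.6, 0.2), "setosa"), ((4.3, 1.3), "versicolor"),
    ((4.9, 1.8), "virginica"), ((5.8, 2.2), "virginica"),
]


def knn_predict(train, point, k):
    """Majority vote among the k training points nearest to `point`."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tied vote, most_common keeps first-seen order, so the nearest label wins.
    return votes.most_common(1)[0][0]


def sweep(train, test, k_values):
    """Accuracy on `test` for each k in `k_values`."""
    results = {}
    for k in k_values:
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        results[k] = hits / len(test)
    return results


accuracies = sweep(TRAIN, TEST, range(1, 8, 2))
best_k = max(accuracies, key=accuracies.get)  # first k that reaches the top accuracy
print(accuracies, best_k)
```

As k grows, each vote pools more neighbours, which is the smoothing effect described above.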

Visualization

Species Separation — Petal Dimensions

Petal length vs petal width coloured by species, showing cluster separation

Interpretation

In petal space, setosa forms a completely isolated cluster with petal lengths up to 1.9 cm, well below the smallest versicolor (3.0 cm) and virginica (4.5 cm) values. Versicolor and virginica show partial overlap along both petal axes, but a diagonal boundary still separates most observations correctly. This plot explains why petal measurements dominate feature importance: a single threshold on petal length already separates setosa perfectly, and the remaining boundary between versicolor and virginica is almost linear in petal space.
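The single-threshold claim can be checked directly against the minimum and maximum petal lengths in the descriptive statistics. The sketch below is illustrative: the helper `separating_threshold` and the choice of the gap midpoint as the cut are assumptions made here, not something the report specifies.

```python
# Petal length (min, max) in cm per species, from this report's descriptive statistics.
PETAL_LENGTH_RANGE = {
    "setosa": (1.0, 1.9),
    "versicolor": (3.0, 5.1),
    "virginica": (4.5, 6.9),
}


def separating_threshold(low_range, high_range):
    """Midpoint of the gap between two ranges, or None if the ranges overlap."""
    low_max, high_min = low_range[1], high_range[0]
    if low_max >= high_min:
        return None
    return (low_max + high_min) / 2


setosa_cut = separating_threshold(
    PETAL_LENGTH_RANGE["setosa"], PETAL_LENGTH_RANGE["versicolor"]
)
versicolor_virginica_cut = separating_threshold(
    PETAL_LENGTH_RANGE["versicolor"], PETAL_LENGTH_RANGE["virginica"]
)

print(setosa_cut)                # about 2.45 cm: a clean gap exists
print(versicolor_virginica_cut)  # None: the ranges overlap between 4.5 and 5.1 cm
```

Any cut between 1.9 and 3.0 cm isolates setosa, whereas no single petal-length cut separates versicolor from virginica.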

Visualization

Species Separation — Sepal Dimensions

Sepal length vs sepal width coloured by species, showing degree of overlap

Interpretation

Sepal dimensions show considerably more overlap between species than petal dimensions. While setosa tends toward smaller sepal lengths and wider sepals, versicolor and virginica occupy almost the same sepal space, making reliable classification impossible from sepal measurements alone. This overlap explains why sepal features rank lower in random forest importance and underscores the necessity of including petal measurements for high-accuracy species identification.

Data Table

Per-Class Classification Metrics

Precision, recall, F1 score, and support for each species from the k-NN classifier

species     support  precision  recall  f1_score
setosa      10       1          1       1
versicolor  10       0.7692     1       0.8695
virginica   10       1          0.7     0.8235

Interpretation

virginica is the most challenging species to classify, with an F1 score of 0.8235, reflecting its morphological overlap with versicolor. Setosa achieves perfect precision and recall because its petal measurements do not overlap with those of the other two species. The precision/recall trade-off shows that the classifier over-predicts versicolor at the expense of virginica: versicolor has perfect recall but a precision of 0.7692, while virginica has perfect precision but a recall of 0.7.
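The per-class figures follow from the usual definitions of precision, recall, and F1. The sketch below recomputes them from a confusion matrix reconstructed from this report's numbers (support of 10 per species, 3 virginica predicted as versicolor); the function name `per_class_metrics` is an assumption, and this is not the report's own code.

```python
# Rows are actual species, columns are predicted species (reconstructed counts).
confusion = {
    "setosa":     {"setosa": 10, "versicolor": 0,  "virginica": 0},
    "versicolor": {"setosa": 0,  "versicolor": 10, "virginica": 0},
    "virginica":  {"setosa": 0,  "versicolor": 3,  "virginica": 7},
}


def per_class_metrics(matrix, label):
    """Return (precision, recall, f1) for one species."""
    tp = matrix[label][label]
    # False positives: other species predicted as `label` (column, off-diagonal).
    fp = sum(matrix[actual][label] for actual in matrix if actual != label)
    # False negatives: `label` predicted as something else (row, off-diagonal).
    fn = sum(n for predicted, n in matrix[label].items() if predicted != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


for species in confusion:
    p, r, f1 = per_class_metrics(confusion, species)
    print(f"{species}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

From these counts, versicolor's F1 is 20/23, which rounds to 0.8696; the table's 0.8695 differs in the last digit, which may reflect F1 being computed from the already-rounded precision.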

Data Table

Descriptive Statistics by Species

Mean, SD, min, and max of all four measurements broken out by species

species     measurement   mean   sd     min  max
setosa      sepal_length  5.006  0.352  4.3  5.8
setosa      sepal_width   3.428  0.379  2.3  4.4
setosa      petal_length  1.462  0.174  1    1.9
setosa      petal_width   0.246  0.105  0.1  0.6
versicolor  sepal_length  5.936  0.516  4.9  7
versicolor  sepal_width   2.77   0.314  2    3.4
versicolor  petal_length  4.26   0.47   3    5.1
versicolor  petal_width   1.326  0.198  1    1.8
virginica   sepal_length  6.588  0.636  4.9  7.9
virginica   sepal_width   2.974  0.322  2.2  3.8
virginica   petal_length  5.552  0.552  4.5  6.9
virginica   petal_width   2.026  0.275  1.4  2.5

Interpretation

petal_length shows the largest absolute difference in means across species (4.09 cm, between setosa and virginica), making it the most visually distinctive measurement in a field identification context. Standard deviations within each species are small relative to that difference, confirming that the between-species differences are not artefacts of high within-species variance. Sepal width, by contrast, shows the smallest inter-species mean differences (0.66 cm at most) against within-species standard deviations of 0.31 to 0.38 cm, the largest spread relative to its mean differences of any measurement, which is consistent with its low feature importance.
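The comparison of mean differences against within-species spread can be reproduced from the table. The sketch below copies the means and standard deviations from this report; the ratio of mean range to largest standard deviation is an illustrative measure chosen here, not one the report computes.

```python
# (mean, sd) per species for each measurement, copied from the descriptive statistics.
STATS = {
    "sepal_length": {"setosa": (5.006, 0.352), "versicolor": (5.936, 0.516), "virginica": (6.588, 0.636)},
    "sepal_width":  {"setosa": (3.428, 0.379), "versicolor": (2.770, 0.314), "virginica": (2.974, 0.322)},
    "petal_length": {"setosa": (1.462, 0.174), "versicolor": (4.260, 0.470), "virginica": (5.552, 0.552)},
    "petal_width":  {"setosa": (0.246, 0.105), "versicolor": (1.326, 0.198), "virginica": (2.026, 0.275)},
}


def mean_range(per_species):
    """Largest minus smallest species mean for one measurement."""
    means = [mean for mean, _ in per_species.values()]
    return max(means) - min(means)


def largest_sd(per_species):
    """Largest within-species standard deviation for one measurement."""
    return max(sd for _, sd in per_species.values())


# For each measurement: (range of means in cm, range divided by the largest SD).
summary = {
    name: (round(mean_range(v), 3), round(mean_range(v) / largest_sd(v), 2))
    for name, v in STATS.items()
}

for name, (gap, ratio) in summary.items():
    print(f"{name}: mean range {gap} cm, range / largest SD {ratio}")
```

On these figures petal_length has the widest range of means and sepal_width the narrowest, and sepal_width also has the lowest ratio of mean range to spread.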
