Analysis overview and configuration
| Parameter | Value |
|---|---|
| confidence_level | 0.95 |
| test_size | 0.3 |
| classification_threshold | 0.5 |
| positive_class | completed |
This logistic regression analysis identifies which student characteristics predict test preparation completion at an Educational Research Institute. The model evaluates five predictors (math score, reading score, writing score, gender, and lunch plan) against a binary outcome (completed vs. none) using 1,000 student records with no missing data.
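The configured `test_size` of 0.3 implies a 70/30 division of the 1,000 records. The report does not say how the split was drawn, so the following is only a minimal sketch of one common approach, a seeded random shuffle. The function name `split_indices` and the seed value are hypothetical.

```python
import random

def split_indices(n_rows, test_size, seed=0):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(n_rows * test_size)    # number of held-out rows
    return indices[n_test:], indices[:n_test]

# 1,000 rows and test_size 0.3 are taken from the report; the seed is an assumption.
train_idx, test_idx = split_indices(n_rows=1000, test_size=0.3)
print(len(train_idx), len(test_idx))
```

A stratified split, which preserves the share of completers in both parts, may be preferable when the classes are imbalanced.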
The model successfully identifies non-completers but struggles with true positive detection. Male students show
Data preprocessing and column mapping
| Metric | Value |
|---|---|
| Initial Rows | 1,000 |
| Final Rows | 1,000 |
| Rows Removed | 0 |
| Retention Rate | 100% |
This section documents the data cleaning and preparation phase for the logistic regression model predicting test preparation completion. Perfect data retention indicates no observations were excluded during preprocessing, which is critical for maintaining statistical power and representativeness when identifying student characteristics that predict completion behavior.
With every record retained, the logistic regression is estimated on the complete 1,000-student sample, which maximizes statistical power for detecting significant predictors of test preparation completion. However, the discrepancy between "missing values removed" and zero rows removed suggests that preprocessing occurred at the column level rather than the row level, possibly through imputation or feature engineering that is not documented here.
The lack of train/test split documentation limits visibility into how model performance metrics (AUC=0.8, Accuracy=
| Metric | Value | Interpretation |
|---|---|---|
| AUC | 0.8 | Good |
| Accuracy | 74.6% | Moderate |
| Sensitivity | 52.3% | Low |
| Specificity | 87% | High |
| F1 Score | 0.596 | Low |
| McFadden R² | 0.172 | Weak |
| Significant Predictors | 4 | Many factors |
This section synthesizes the logistic regression model's performance in predicting test preparation completion across 1,000 students. Understanding whether the model achieves sufficient predictive accuracy and identifies actionable student characteristics is critical for determining deployment viability and intervention strategy effectiveness.
The model achieves the business objective of identifying predictive characteristics with acceptable discrimination (AUC 0.8). However, the low sensitivity reveals a critical trade-off: while the model excels at identifying non-completers
ROC curve showing model discrimination ability
The ROC curve evaluates the logistic regression model's ability to discriminate between students who completed test preparation and those who did not across all possible classification thresholds. This section directly addresses the model's predictive quality for the stated objective of identifying student characteristics that predict test preparation completion.
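One way to read the area under the ROC curve is as the probability that a randomly chosen completer receives a higher predicted probability than a randomly chosen non-completer. The sketch below computes AUC directly from that definition. The scores are hypothetical toy values, not predictions from this analysis, and `pairwise_auc` is an illustrative helper name.

```python
def pairwise_auc(positive_scores, negative_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical predicted probabilities, NOT values from this analysis.
completed_scores = [0.9, 0.6, 0.4]
none_scores = [0.7, 0.5, 0.3, 0.2]

auc = pairwise_auc(completed_scores, none_scores)
print(auc)  # 0.75 (9 of the 12 pairs are ranked correctly)
```

Because the calculation compares ranks only, the result does not depend on any classification threshold.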
The AUC of 0.8 demonstrates the model has meaningful predictive power for test preparation completion. The model performs substantially better than chance, validating that the selected student characteristics (gender, lunch plan, and test scores) contain discriminative information. However, the moderate AUC reflects inherent complexity in predicting behavioral outcomes and suggests room for improvement through additional
Confusion matrix showing classification accuracy by class
This confusion matrix evaluates how well the logistic regression model predicts test preparation completion across the two outcome classes. It reveals the model's ability to correctly identify students who completed preparation versus those who did not, which directly addresses the core objective of identifying predictive student characteristics.
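The confusion matrix cells are not listed here, but approximate counts can be recovered from the reported sensitivity (52.3%), specificity (87%), and the 35.8% positive rate. The sketch below does this under the assumption that the metrics were computed on all 1,000 records; the counts are reconstructed from rounded figures and may differ slightly from the true cells.

```python
n_students = 1000
positive_rate = 0.358  # share of students who completed, as reported
sensitivity = 0.523    # reported
specificity = 0.87     # reported

# Rebuild the four cells from the reported rates (rounded to whole students).
positives = round(n_students * positive_rate)
negatives = n_students - positives
tp = round(sensitivity * positives)  # completers predicted as completers
fn = positives - tp                  # completers missed
tn = round(specificity * negatives)  # non-completers predicted correctly
fp = negatives - tn                  # non-completers flagged as completers

accuracy = (tp + tn) / n_students
f1 = 2 * tp / (2 * tp + fp + fn)

print(tp, fn, fp, tn)
print(round(accuracy, 3), round(f1, 3))
```

Under these assumptions the reconstructed accuracy and F1 agree with the reported 74.6% and 0.596.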
The model demonstrates asymmetric performance: it excels at identifying non-completion but struggles with completion detection. High specificity (87%) combined with low sensitivity (52.3%) indicates the model predicts completion conservatively, so many actual completers are missed (false negatives). This imbalance reflects the class distribution (35.8% positive cases) and suggests the model's decision boundary favors the majority class. The F1 score of 0.596 indicates limited predictive utility for
Odds ratios with 95% confidence intervals for all predictors
This section quantifies the individual effect of each predictor on the odds of test preparation completion. The odds ratios and confidence intervals reveal which student characteristics are statistically reliable predictors and the magnitude of their influence on completion likelihood. This directly supports the analysis objective to identify which characteristics predict test preparation completion.
Four of five predictors significantly influence completion odds.
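An odds ratio is the exponential of the corresponding log-odds coefficient, and a Wald interval exponentiates the coefficient plus or minus a normal critical value times its standard error. The sketch below applies this to the `student_gendermale` row of the coefficient table (log-odds 2.009, standard error 0.287). The helper name is hypothetical, and the Wald method is an assumption: the table's interval (4.295 to 13.25) differs slightly, which would be consistent with a different interval method, though the report does not say which was used.

```python
import math
from statistics import NormalDist

def odds_ratio_with_wald_ci(log_odds, std_error, confidence_level=0.95):
    """Return (odds_ratio, ci_lower, ci_upper) using a Wald interval."""
    # Two-sided critical value, about 1.96 for a 95% interval.
    z_crit = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    lower = math.exp(log_odds - z_crit * std_error)
    upper = math.exp(log_odds + z_crit * std_error)
    return math.exp(log_odds), lower, upper

# Coefficient and standard error for student_gendermale, as reported.
gender_or, gender_lo, gender_hi = odds_ratio_with_wald_ci(2.009, 0.287)
print(round(gender_or, 3), round(gender_lo, 3), round(gender_hi, 3))
```

An interval that excludes 1 corresponds to a predictor marked significant at the chosen confidence level.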
Distribution of predicted probabilities by actual class
This section visualizes how well the logistic regression model separates students who completed test preparation from those who did not. The distribution of predicted probabilities reveals the model's confidence in its classifications and identifies overlap regions where the model struggles to distinguish between classes. This directly supports the objective of identifying student characteristics that predict test preparation completion.
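Each predicted probability comes from passing the linear predictor through the logistic function. The sketch below uses the coefficients from the report's coefficient table and two hypothetical students who differ only in gender. The student profiles are invented for illustration, and the helper name `predicted_probability` is an assumption.

```python
import math

# Log-odds coefficients as reported in the coefficient table.
COEFFICIENTS = {
    "(Intercept)": -5.032,
    "math score": -0.0882,
    "reading score": -0.0946,
    "writing score": 0.2322,
    "student_gendermale": 2.009,
    "lunch_planstandard": -0.2038,
}

def predicted_probability(features):
    """Logistic function applied to the intercept plus the weighted features."""
    logit = COEFFICIENTS["(Intercept)"] + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical student profile, NOT a record from the data set.
female_student = {
    "math score": 70, "reading score": 70, "writing score": 70,
    "student_gendermale": 0, "lunch_planstandard": 1,
}
male_student = dict(female_student, student_gendermale=1)

p_female = predicted_probability(female_student)
p_male = predicted_probability(male_student)
print(round(p_female, 3), round(p_male, 3))
```

With identical scores, the two hypothetical students fall on opposite sides of a 0.5 threshold, which illustrates how strongly the gender coefficient shifts the predicted probability.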
The predicted probabilities show meaningful separation between the two classes, consistent with the model's AUC of 0.80. The threshold of 0.367 is optimized below 0.50 because only 35.8% of students completed preparation, allowing the model to balance false positives and false negatives. The observed overlap explains why sensitivity (0
Full coefficient table with log-odds, odds ratios, and significance
| variable | log_odds | std_error | z_stat | p_value | odds_ratio | ci_lower | ci_upper | significant |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | -5.032 | 0.5788 | -8.694 | < 0.001 | 0.007 | 0.002 | 0.02 | Yes |
| math score | -0.0882 | 0.0158 | -5.595 | < 0.001 | 0.916 | 0.887 | 0.944 | Yes |
| reading score | -0.0946 | 0.0218 | -4.334 | < 0.001 | 0.91 | 0.871 | 0.949 | Yes |
| writing score | 0.2322 | 0.0249 | 9.309 | < 0.001 | 1.261 | 1.203 | 1.326 | Yes |
| student_gendermale | 2.009 | 0.287 | 7.001 | < 0.001 | 7.457 | 4.295 | 13.25 | Yes |
| lunch_planstandard | -0.2038 | 0.1986 | -1.026 | 0.3048 | 0.816 | 0.552 | 1.204 | No |
This section quantifies the relationship between each student characteristic and test preparation completion. The coefficient table reveals which factors statistically predict completion and the magnitude of their effects, directly addressing the research objective to identify predictive student characteristics through logistic regression.
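The `z_stat` and `p_value` columns follow from the first two: the z statistic is the log-odds divided by its standard error, and the two-sided p-value is the normal tail probability beyond its absolute value. A minimal sketch using the reported `lunch_planstandard` row follows; the helper name `wald_test` is hypothetical, and results computed from the rounded table values can differ from the table in the last digits.

```python
from statistics import NormalDist

def wald_test(log_odds, std_error):
    """Return (z statistic, two-sided p-value) for a single coefficient."""
    z = log_odds / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# lunch_planstandard row: log-odds -0.2038, standard error 0.1986 (as reported).
lunch_z, lunch_p = wald_test(-0.2038, 0.1986)
print(round(lunch_z, 3), round(lunch_p, 4))

# student_gendermale row: a large |z| gives a p-value far below 0.001.
gender_z, gender_p = wald_test(2.009, 0.287)
print(round(gender_z, 3), gender_p < 0.001)
```

The lunch plan p-value is well above the 0.05 level implied by the 0.95 confidence level, which is why that predictor is the only one not marked significant.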
The model identifies gender as the dominant predictor of completion, with writing ability as a secondary factor. The inverse relationship