Overview

Logistic Regression

Binary Classification Analysis

Analysis overview and configuration

Configuration

Analysis Type: Logistic
Company: Educational Research Institute
Objective: Identify student characteristics that predict test preparation completion using logistic regression
Analysis Date: 2026-03-14
Processing Id: test_1773549132
Total Observations: 1000

Module Parameters

Parameter                 Value
confidence_level          0.95
test_size                 0.3
classification_threshold  0.5
positive_class            completed
Logistic analysis for Educational Research Institute

Interpretation

Purpose

This logistic regression analysis identifies which student characteristics predict test preparation completion at an Educational Research Institute. The model evaluates five predictors (math score, reading score, writing score, gender, and lunch plan) against a binary outcome (completed vs. none) using 1,000 complete student records with no missing data.

Key Findings

  • AUC (0.80): Model demonstrates good discrimination ability between students who complete and don't complete test preparation, exceeding the 0.70 quality threshold.
  • Accuracy (0.75): Overall correct classification rate. With 35.8% positive cases, always predicting "none" would already score about 0.64, so accuracy should be read against that baseline.
  • Specificity (0.87): Strong at identifying non-completers, but Sensitivity (0.52) is notably weaker at identifying completers.
  • Significant Predictors (4 of 5): Writing score (OR=1.26), student gender male (OR=7.46), math score (OR=0.92), and reading score (OR=0.91) are statistically significant; lunch plan is not.
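  • McFadden R² (0.17): A pseudo-R² that measures the improvement in log-likelihood over an intercept-only model, not a share of variance explained. The report's summary table rates this value as weak.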

Interpretation

The model successfully identifies non-completers but struggles with true positive detection. Male students show

Data Preparation

Data Preprocessing

Data Quality & Class Balance

Data preprocessing and column mapping

Data Quality

Initial Rows: 1,000
Final Rows: 1,000
Rows Removed: 0
Retention Rate: 100%

Processed 1,000 observations, retained 1,000 (100.0%) after cleaning

Interpretation

Purpose

This section documents the data cleaning and preparation phase for the logistic regression model predicting test preparation completion. Perfect data retention indicates no observations were excluded during preprocessing, which is critical for maintaining statistical power and representativeness when identifying student characteristics that predict completion behavior.

Key Findings

  • Retention Rate: 100% (1,000 of 1,000 rows retained) - No observations were removed during cleaning, suggesting either minimal data quality issues or that missing values were handled through imputation rather than deletion
  • Rows Removed: 0 - The dataset required no exclusions. The metadata note says "missing values removed", yet the row count is unchanged, so no rows actually contained missing values or they were handled without deletion
  • Train/Test Split: Not documented in this section - The module parameters list test_size = 0.3, which points to a 70/30 split, but no split counts are reported here

Interpretation

The perfect retention rate supports the model's ability to leverage the complete 1,000-student sample for logistic regression estimation. This maximizes statistical power for detecting significant predictors of test preparation completion. However, the discrepancy between "missing values removed" and zero rows removed suggests preprocessing may have occurred at the column level rather than row level, potentially through imputation or feature engineering that isn't explicitly documented here.

Context

The lack of train/test split documentation limits visibility into how model performance metrics (AUC=0.8, Accuracy=

Executive Summary

Key Findings & Model Performance

Key Metrics

AUC: 0.8
Accuracy: 74.6%
Observations Used: 1000
Predictors: 5

Key Findings

Metric                  Value   Interpretation
AUC                     0.8     Good
Accuracy                74.6%   Moderate
Sensitivity             52.3%   Low
Specificity             87%     High
F1 Score                0.596   Low
McFadden R²             0.172   Weak
Significant Predictors  4       Many factors

Summary

Bottom Line: Logistic regression on 1000 observations predicts 'completed' vs 'none' with AUC = 0.8 (good). Overall accuracy: 74.6%. 4 of 5 predictors are statistically significant.

Key Findings:
• Model discrimination: AUC = 0.8 (good — model reliably separates the two classes)
• Sensitivity: 52.3% — proportion of 'completed' cases correctly identified
• Specificity: 87% — proportion of 'none' cases correctly identified
• McFadden R² = 0.172 (weak model fit — consider adding predictors)

Recommendation: The model provides useful discrimination. Use an optimal threshold of 0.367 to classify new cases. Focus interventions on predictors with large, significant odds ratios.

Interpretation

Purpose

This section synthesizes the logistic regression model's performance in predicting test preparation completion across 1,000 students. Understanding whether the model achieves sufficient predictive accuracy and identifies actionable student characteristics is critical for determining deployment viability and intervention strategy effectiveness.

Key Findings

  • AUC (0.8): Model demonstrates good discrimination ability—reliably separates students who completed test prep from those who did not
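  • Accuracy (74.6%): Overall correct classification rate; with only 35.8% positive cases, a model that always predicts "none" would reach about 64.2%, so the figure overstates the model's usefulness if read in isolation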
  • Sensitivity (52.3%): Captures only about half of students who actually completed prep; high false-negative risk
  • Specificity (87%): Excellent at identifying non-completers, reducing false-positive interventions
  • Significant Predictors (4 of 5): Student gender, writing score, math score, and reading score drive predictions; lunch plan is non-significant
  • McFadden R² (0.172): Weak explanatory power suggests unmeasured factors influence completion behavior
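
To put the accuracy figure in context, it can be compared with the accuracy of always predicting the majority class. The following is a minimal sketch that uses only the 35.8% positive rate and the 74.6% accuracy quoted above. The function name is illustrative, and the sketch assumes the class balance in the evaluation data matches the overall 35.8% figure, which the report does not confirm.

```python
def majority_baseline(positive_rate: float) -> float:
    """Accuracy of a classifier that always predicts the more common class."""
    return max(positive_rate, 1.0 - positive_rate)


positive_rate = 0.358  # share of 'completed' cases, as reported
accuracy = 0.746       # model accuracy, as reported

baseline = majority_baseline(positive_rate)
lift = accuracy - baseline  # gain over the trivial classifier

print(f"baseline accuracy: {baseline:.3f}")
print(f"lift over baseline: {lift:.3f}")
```

Under that assumption the model improves on the trivial classifier by roughly ten percentage points, which is consistent with the "Moderate" label in the summary table.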

Interpretation

The model achieves the business objective of identifying predictive characteristics with acceptable discrimination (AUC 0.8). However, the low sensitivity reveals a critical trade-off: while the model excels at identifying non-completers

Figure 4

ROC Curve

Sensitivity vs. Specificity Tradeoff

ROC curve showing model discrimination ability

Interpretation

Purpose

The ROC curve evaluates the logistic regression model's ability to discriminate between students who completed test preparation and those who did not across all possible classification thresholds. This section directly addresses the model's predictive quality for the stated objective of identifying student characteristics that predict test preparation completion.

Key Findings

  • AUC (0.8): Indicates good discrimination ability. The model ranks a randomly selected completer above a randomly selected non-completer about 80% of the time, substantially better than random guessing (0.5).
  • Optimal Threshold (0.367): Balances sensitivity and specificity by maximizing Youden's J statistic, suggesting predictions below this probability should be classified as "none" and above as "completed."
  • Sensitivity-Specificity Trade-off: The curve shows the model achieves ~52% sensitivity (true positive rate) at ~13% false positive rate, reflecting the class imbalance (35.8% positive cases).
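
How a threshold that maximizes Youden's J can be located is sketched below in plain Python. This is an illustration of the general technique, not the report's actual implementation. The labels and scores are hypothetical toy values, not the report's data, and the function name is invented for the example.

```python
def youden_threshold(labels, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted probabilities of the positive class.
    A case is classified positive when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_j = None, -1.0
    # Every distinct score is a candidate cut-off.
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical toy data (NOT taken from the report).
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.65, 0.90]

threshold, j = youden_threshold(labels, scores)
print(threshold, round(j, 3))
```

On the toy data the selected cut-off falls below 0.5, the same direction as the report's 0.367, because lowering the threshold recovers positives at a small cost in specificity.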

Interpretation

The AUC of 0.8 demonstrates the model has meaningful predictive power for test preparation completion. The model performs substantially better than chance, validating that the selected student characteristics (gender, lunch plan, and test scores) contain discriminative information. However, the moderate AUC reflects inherent complexity in predicting behavioral outcomes and suggests room for improvement through additional

Figure 5

Confusion Matrix

Classification Accuracy by Class

Confusion matrix showing classification accuracy by class

Interpretation

Purpose

This confusion matrix evaluates how well the logistic regression model predicts test preparation completion across the two outcome classes. It reveals the model's ability to correctly identify students who completed preparation versus those who did not, which directly addresses the core objective of identifying predictive student characteristics.

Key Findings

  • Accuracy (74.6%): Overall correctness across both classes; the model correctly classifies nearly 3 in 4 students
  • Sensitivity (52.3%): The model identifies only about half of students who actually completed preparation, missing 48% of true completers (51 false negatives)
  • Specificity (87%): Strong performance identifying non-completers; correctly classifies 87% of students who did not complete
  • F1 Score (0.596): Moderate balance between precision and recall, reflecting the trade-off between false positives (25) and false negatives (51)
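
The metrics above all derive from the four confusion-matrix cells. The sketch below shows the arithmetic. Only the false positives (25) and false negatives (51) are stated in the report. The true-positive and true-negative counts used here (56 and 167) are back-calculated assumptions chosen to reproduce the reported rates, and should not be read as reported values.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall / true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# fp and fn come from the report; tp and tn are ASSUMED (back-calculated).
metrics = classification_metrics(tp=56, fp=25, tn=167, fn=51)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the evaluation set contains about 300 cases, which would be consistent with the test_size of 0.3 listed in the module parameters.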

Interpretation

The model demonstrates asymmetric performance: it excels at identifying non-completion but struggles with completion detection. The high specificity (87%) indicates the model conservatively predicts completion, resulting in many false negatives. This imbalance reflects the class distribution (35.8% positive cases) and suggests the model's decision boundary favors the majority class. The moderate F1 score indicates reasonable but imperfect predictive utility for

Figure 6

Odds Ratios

Predictor Effects with 95% Confidence Intervals

Odds ratios with 95% confidence intervals for all predictors

Interpretation

Purpose

This section quantifies the individual effect of each predictor on the odds of test preparation completion. The odds ratios and confidence intervals reveal which student characteristics are statistically reliable predictors and the magnitude of their influence on completion likelihood. This directly supports the analysis objective to identify which characteristics predict test preparation completion.

Key Findings

  • Student Gender (Male): OR = 7.46 (95% CI: 4.3–13.25) - Male students have 7.5 times higher odds of completing test preparation; highly significant (p<0.001) with a confidence interval far above 1.0
  • Writing Score: OR = 1.26 (95% CI: 1.2–1.33) - Each unit increase in writing score increases completion odds by 26%; statistically significant (p<0.001)
  • Math & Reading Scores: OR ≈ 0.91–0.92 - Each lowers completion odds by ~8–9% per unit; statistically significant negative associations (p<0.001)
  • Lunch Plan: OR = 0.82 (95% CI: 0.55–1.2) - Not statistically significant (p=0.305); confidence interval crosses 1.0, indicating no reliable effect
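
An odds ratio is the exponential of the fitted log-odds coefficient, and a simple (Wald) 95% interval exponentiates the coefficient plus or minus 1.96 standard errors. The sketch below applies this to the log-odds and standard errors listed in the report's coefficient table. The Wald method is an assumption: the report does not say how its intervals were computed, and its published bounds differ slightly from the Wald values, so another method (for example profile likelihood) may have been used.

```python
import math


def odds_ratio_with_wald_ci(log_odds: float, std_error: float, z: float = 1.96):
    """Return (odds_ratio, ci_lower, ci_upper) using a Wald interval."""
    return (
        math.exp(log_odds),
        math.exp(log_odds - z * std_error),
        math.exp(log_odds + z * std_error),
    )


def percent_change_in_odds(log_odds: float) -> float:
    """Percent change in odds for a one-unit increase in the predictor."""
    return (math.exp(log_odds) - 1.0) * 100.0


# Coefficients as listed in the report's coefficient table.
gender_or, gender_lo, gender_hi = odds_ratio_with_wald_ci(2.009, 0.287)
writing_pct = percent_change_in_odds(0.2322)

print(f"gender (male): OR={gender_or:.2f}, Wald CI=({gender_lo:.2f}, {gender_hi:.2f})")
print(f"writing score: {writing_pct:.1f}% change in odds per point")
```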

Interpretation

Four of five predictors significantly influence completion odds.

Figure 7

Predicted Probability Distribution

Class Separation Quality

Distribution of predicted probabilities by actual class

Interpretation

Purpose

This section visualizes how well the logistic regression model separates students who completed test preparation from those who did not. The distribution of predicted probabilities reveals the model's confidence in its classifications and identifies overlap regions where the model struggles to distinguish between classes. This directly supports the objective of identifying student characteristics that predict test preparation completion.

Key Findings

  • Positive Class Percentage: 35.8% of students completed test preparation, creating moderate class imbalance that affects model calibration
  • Optimal Threshold: 0.367, below the 0.5 default listed as classification_threshold in the module parameters, reflecting the class imbalance and optimizing for balanced sensitivity/specificity
  • Predicted Probability Range: Mean of 0.36 with standard deviation of 0.22 indicates moderate spread; skewness of 0.45 suggests slight right-skew toward higher probabilities
  • Class Separation: Moderate overlap between distributions suggests the model achieves reasonable but imperfect discrimination between completed and non-completed cases

Interpretation

The predicted probabilities show meaningful separation between the two classes, consistent with the model's AUC of 0.80. The threshold of 0.367 is optimized below 0.50 because only 35.8% of students completed preparation, allowing the model to balance false positives and false negatives. The observed overlap explains why sensitivity (0

Table 8

Model Coefficients

Log-Odds, Odds Ratios & Significance

Full coefficient table with log-odds, odds ratios, and significance

variable            log_odds  std_error  z_stat  p_value  odds_ratio  ci_lower  ci_upper  significant
(Intercept)         -5.032    0.5788     -8.694  <0.001   0.007       0.002     0.02      Yes
math score          -0.0882   0.0158     -5.595  <0.001   0.916       0.887     0.944     Yes
reading score       -0.0946   0.0218     -4.334  <0.001   0.91        0.871     0.949     Yes
writing score       0.2322    0.0249     9.309   <0.001   1.261       1.203     1.326     Yes
student_gendermale  2.009     0.287      7.001   <0.001   7.457       4.295     13.25     Yes
lunch_planstandard  -0.2038   0.1986     -1.026  0.3048   0.816       0.552     1.204     No
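
To show how the coefficients combine into a prediction, the sketch below sums the log-odds terms and applies the logistic function. Several things here are assumptions rather than facts from the report: that scores enter the model on their raw scale, that student_gendermale and lunch_planstandard are 0/1 indicator variables, and the example student's values, which are hypothetical.

```python
import math

# Log-odds coefficients as listed in the coefficient table.
COEFFICIENTS = {
    "(Intercept)": -5.032,
    "math score": -0.0882,
    "reading score": -0.0946,
    "writing score": 0.2322,
    "student_gendermale": 2.009,   # assumed 1 = male, 0 = female
    "lunch_planstandard": -0.2038, # assumed 1 = standard lunch plan
}


def predict_probability(features: dict) -> float:
    """Predicted probability of 'completed' for one student."""
    log_odds = COEFFICIENTS["(Intercept)"] + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical student (values invented for illustration).
student = {
    "math score": 65,
    "reading score": 70,
    "writing score": 72,
    "student_gendermale": 0,
    "lunch_planstandard": 1,
}

p_female = predict_probability(student)
p_male = predict_probability({**student, "student_gendermale": 1})

print(f"female: {p_female:.3f}, male: {p_male:.3f}")
```

Holding the other inputs fixed, the ratio of the two students' odds equals exp(2.009), the odds ratio reported for student_gendermale.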

Interpretation

Purpose

This section quantifies the relationship between each student characteristic and test preparation completion. The coefficient table reveals which factors statistically predict completion and the magnitude of their effects, directly addressing the research objective to identify predictive student characteristics through logistic regression.

Key Findings

  • Student Gender (Male): Odds ratio of 7.46 (p<0.001) — male students have 7.5× higher odds of completing test preparation than females, the strongest predictor in the model
  • Writing Score: Odds ratio of 1.26 (p<0.001) — each additional point increases completion odds by 26%, the only positive academic predictor
  • Math & Reading Scores: Odds ratios of 0.92 and 0.91 respectively (p<0.001). Counterintuitively, higher scores decrease completion odds by about 8% and 9% per point, suggesting high-performing students may skip preparation
  • Lunch Plan: Odds ratio of 0.82 (p=0.305). Not statistically significant; the confidence interval (0.55–1.2) includes 1.0, so no reliable effect is detected
  • Model Fit: McFadden's R² = 0.172 indicates modest explanatory power. As a pseudo-R², it measures improvement in log-likelihood over an intercept-only model and is not a percentage of variance explained
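
For reference, McFadden's pseudo-R² is conventionally defined from the log-likelihoods of the fitted model and of an intercept-only (null) model. The report does not list either log-likelihood, so the symbols below are generic notation rather than reported quantities.

```latex
R^2_{\text{McFadden}} = 1 - \frac{\ln \hat{L}_{\text{model}}}{\ln \hat{L}_{\text{null}}}
```

A value of 0.172 therefore means the fitted model's log-likelihood is about 17% closer to zero than that of the null model.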

Interpretation

The model identifies gender as the dominant predictor of completion, with writing ability as a secondary factor. The inverse relationship