Executive Summary
Key findings from the hotel cancellation prediction model
Across 10,000 hotel bookings, 63.3% were ultimately canceled. The strongest single predictor of cancellation is Deposit Type: Non Refund (odds ratio 15.786). A logistic regression model trained on booking characteristics achieved 69.4% accuracy and an AUC-ROC of 0.745 on held-out test data. Because 63.3% of bookings cancel, labeling every booking as canceled would already be correct roughly 63% of the time, so the model is a moderate improvement over that baseline: booking-time signals carry real but limited information about cancellation risk.
Monthly Cancellation Rate Trend
Month-over-month cancellation rate — Resort Hotel
Cancellation rates fluctuate noticeably from month to month, suggesting seasonal demand patterns for Resort Hotel. Peaks in summer and late-year months may reflect periods when travelers book speculatively and then revise plans. Long lead-time bookings placed in high-demand months show elevated cancellation rates compared to near-arrival reservations.
Cancellation Rate by Deposit Type
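Which deposit policies are associated with the highest cancellation rates
Deposit policy is one of the most powerful signals for cancellation risk. Non Refund bookings have the highest cancellation rate at 95.8%, while Refundable bookings cancel at only 12.7%. This runs against the usual expectation that a non-refundable deposit creates a financial commitment that deters cancellation, so the pattern may reflect which bookings are placed under non-refundable terms in this dataset and not an effect of the policy itself. No-deposit bookings carry minimal financial risk for the guest. The data therefore do not support promoting non-refundable rates as a way to reduce cancellations; the Non Refund segment should be investigated before any policy change.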
Cancellation Rate by Market Segment
Cancellation rates by booking channel
Different booking channels carry very different cancellation risks. Online TA bookings show the highest cancellation rate at 75.1%. Online travel agencies (OTAs) often have more permissive cancellation windows, which may explain their higher cancellation rates compared with direct bookings. Corporate and group segments are typically expected to be more committed because of contractual obligations and coordinated travel planning.
Lead Time Distribution by Cancellation Status
Lead time (days before arrival) split by whether booking was canceled
Bookings that are eventually canceled are placed slightly further in advance than those that are honored. The median lead time for canceled bookings is 92 days, compared to 88 days for kept bookings. This 4-day gap in medians is small, so on its own it gives only weak support for the idea that early bookers are more likely to change plans; lead time is a modest cancellation signal, not one of the strongest. Hotels considering tiered non-refundable policies for bookings beyond a lead-time threshold should examine the full distributions, not only the medians.
Cancellation Rate by Customer Type
Cancellation rates by customer segment
Customer type is a meaningful risk stratifier for cancellations. Group customers have the highest cancellation rate at 71.4%. Contract bookings generally show lower cancellation rates because they involve negotiated terms and coordinated travel, reducing ad-hoc changes. Transient customers, who book on their own without contractual commitments, are also exposed to last-minute changes, but in this dataset the Group segment cancels most often.
Logistic Regression: Odds Ratios for Cancellation
Model coefficients expressed as odds ratios (above 1 = increases cancellation risk)
Odds ratios above 1.0 indicate higher odds of cancellation, while values below 1.0 indicate lower odds. Non-refundable deposit policy is the dominant predictor (odds ratio 15.786); Group customer type also substantially elevates risk. Longer lead times increase cancellation probability to a lesser degree, while more special requests (a sign of engaged guests) meaningfully reduce the odds. These coefficients give hotel managers a ranked list of factors to examine when designing cancellation mitigation strategies, with the caveat that they describe associations and not proven causal effects.
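To make the odds-ratio scale concrete, the sketch below converts the reported Non Refund odds ratio of 15.786 back to a logistic regression coefficient and applies it to a baseline cancellation probability. The 30% baseline and the function name `apply_odds_ratio` are illustrative assumptions, not values or code from the model.

```python
import math

def apply_odds_ratio(baseline_prob, odds_ratio):
    """Return the probability obtained by multiplying the baseline odds by odds_ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Odds ratio reported in this document for Deposit Type: Non Refund.
non_refund_or = 15.786

# An odds ratio is exp(coefficient), so the coefficient is its natural log.
non_refund_coef = math.log(non_refund_or)

# Hypothetical baseline: a booking that would otherwise cancel 30% of the time.
assumed_baseline = 0.30
shifted_prob = apply_odds_ratio(assumed_baseline, non_refund_or)

print(f"coefficient: {non_refund_coef:.3f}")
print(f"probability after applying the odds ratio: {shifted_prob:.3f}")
```

An odds ratio multiplies odds, not probabilities, so the change in probability depends on the baseline chosen: the same ratio moves a 30% baseline much further than a 90% one.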
Reserved vs Assigned Room Type Matrix
Percentage of bookings assigned each room type given what was reserved
The heatmap shows, for each reserved room type, the fraction of guests who were assigned that type or a different one; diagonal cells represent exact matches. Overall, 11.6% of bookings (1,159 of 10,000) received a different room than reserved. Room reassignment, particularly a downgrade, can reduce guest satisfaction and may increase cancellation risk.
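A row-normalized matrix of this kind could be computed along the following lines. This is a minimal sketch, not the report's actual pipeline: the function names and the toy (reserved, assigned) pairs are hypothetical, and only the 1,159 of 10,000 figure comes from the text above.

```python
from collections import Counter

def assignment_matrix(pairs):
    """Map reserved type -> {assigned type: percent of that reserved type's bookings}."""
    pair_counts = Counter(pairs)
    row_totals = Counter(reserved for reserved, _ in pairs)
    matrix = {}
    for (reserved, assigned), n in pair_counts.items():
        matrix.setdefault(reserved, {})[assigned] = 100.0 * n / row_totals[reserved]
    return matrix

def mismatch_rate(pairs):
    """Percent of bookings whose assigned room type differs from the reserved one."""
    mismatched = sum(1 for reserved, assigned in pairs if reserved != assigned)
    return 100.0 * mismatched / len(pairs)

# Hypothetical toy data: (reserved_room_type, assigned_room_type) per booking.
toy_pairs = [("A", "A"), ("A", "A"), ("A", "D"), ("D", "D"), ("D", "E")]

toy_matrix = assignment_matrix(toy_pairs)
toy_mismatch = mismatch_rate(toy_pairs)

# Overall mismatch share from the counts reported in this document.
reported_mismatch = round(100.0 * 1159 / 10000, 1)

print(toy_matrix)
print(toy_mismatch, reported_mismatch)
```

Each row sums to 100%, so the diagonal entry for a room type is the share of its reservations that were honored as booked.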
Model Performance Metrics
Classification accuracy and discrimination statistics on held-out test data
| Metric | Value |
|---|---|
| Accuracy | 69.4% |
| AUC-ROC | 0.745 |
| Sensitivity | 75.4% |
| Specificity | 58.7% |
| Training Set Size | 8,000 |
| Test Set Size | 2,000 |
The logistic regression model achieves 69.4% accuracy on the held-out test set. Given the 63.3% overall cancellation rate, this is a moderate gain over the roughly 63% that would result from labeling every booking as canceled. The AUC-ROC of 0.745 indicates acceptable discrimination: the model is clearly better than random guessing (0.5) but well short of perfect separation (1.0). Sensitivity (75.4%) and specificity (58.7%) show the trade-off between catching true cancellations and avoiding false alarms, which revenue managers can adjust by changing the decision threshold.
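The reported metrics can be cross-checked against each other with a few lines of arithmetic. The sketch below uses only figures stated in this document; the one assumption, flagged in the code, is that the test set cancels at about the overall 63.3% rate when computing the majority-class baseline.

```python
# Figures reported in this document.
accuracy = 0.694
sensitivity = 0.754          # share of canceled bookings correctly flagged
specificity = 0.587          # share of kept bookings correctly left unflagged
overall_cancel_rate = 0.633  # across all 10,000 bookings

# accuracy = p * sensitivity + (1 - p) * specificity, solved for p,
# gives the share of canceled bookings implied for the test set.
implied_cancel_share = (accuracy - specificity) / (sensitivity - specificity)

# Assumption: the test set cancels at roughly the overall rate, so a model
# that always predicts the majority class ("canceled") scores about this.
majority_baseline = max(overall_cancel_rate, 1.0 - overall_cancel_rate)
lift_over_baseline = accuracy - majority_baseline

# Summaries that do not depend on class balance.
balanced_accuracy = (sensitivity + specificity) / 2.0
youden_j = sensitivity + specificity - 1.0

print(f"implied cancel share in test set: {implied_cancel_share:.3f}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
print(f"accuracy lift over baseline:      {lift_over_baseline:.3f}")
print(f"balanced accuracy:                {balanced_accuracy:.4f}")
print(f"Youden's J:                       {youden_j:.3f}")
```

The implied cancel share of about 64% is close to the overall 63.3% rate, so the reported metrics are internally consistent, and the lift of about 6 percentage points is the model's gain over always predicting a cancellation.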