Expense Anomaly Detection

Seventy-nine percent of organizations experienced attempted or actual payments fraud in 2024, according to the AFP Payments Fraud and Control Survey (Rillion, 2025). Duplicate vendor payments, expense report padding, miscoded GL entries, and outright fraud hide in transaction data that no human has time to review line by line. Enterprise companies use dedicated fraud detection software like AppZen or Oversight at $50,000+ per year. Mid-market companies — the ones processing 500 to 5,000 transactions per month — rely on periodic spot checks and hope for the best. This analysis applies machine learning anomaly detection to your expense data, flagging the transactions that deviate most from normal patterns and giving you a ranked shortlist to investigate instead of an ocean of line items.

The Problem With Threshold-Based Rules

Most companies that try to catch unusual expenses use simple rules: flag anything over $10,000, flag any vendor payment within 5 days of a previous payment to the same vendor, flag expense reports that exceed the monthly average by 2x. These rules catch the obvious cases — a $50,000 payment to an unknown vendor will trigger every time. But they miss the subtle ones and generate too many false positives on legitimate transactions.

A duplicate payment of $2,300 to a vendor who normally receives $2,300 monthly looks perfectly normal to a threshold rule. An expense report that bills $180 for dinner every Tuesday night — slightly above the $150 average but never exceeding the $200 threshold — accumulates roughly $400 per quarter in padding ($30 of excess across 13 weeks) that no rule catches. A vendor who gradually increases invoice amounts by 3% per month is invisible until someone audits the full history.

Machine learning anomaly detection works differently. Instead of looking at one dimension at a time (amount, frequency, vendor), it evaluates transactions across all dimensions simultaneously. A payment might look normal on amount alone and normal on frequency alone, but the combination of that amount, to that vendor, on that day of the week, in that department, is unusual relative to the full pattern of normal transactions. The algorithm — called Isolation Forest — finds these multi-dimensional outliers that single-dimension rules miss.

How Isolation Forest Works (Without the Math)

The core idea is intuitive: unusual things are easier to describe than normal things. If you played a game of 20 questions about a transaction, a normal payment ("$2,300, to Office Depot, on the 15th, from the admin department") would take many questions to isolate from the crowd of similar transactions. But an unusual payment ("$2,300, to a vendor we have never paid before, on a Sunday, from the engineering department") can be isolated in just a few questions.

The algorithm builds a forest of random decision trees that split transactions along random dimensions at random thresholds. Transactions that end up in short branches — isolated quickly — get high anomaly scores. Transactions deep in the tree — hard to isolate because they look like everything else — get low scores. The algorithm builds hundreds of these trees and averages the results for stability.

You do not need to tell the model what fraud looks like. You do not need labeled examples of "good" and "bad" transactions. You do not need to configure rules or set thresholds (beyond specifying roughly what percentage of transactions you expect to be anomalous). The algorithm learns what "normal" looks like from your data and flags everything that deviates.
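In scikit-learn, that workflow fits in a few lines. The sketch below is illustrative: the transaction values are synthetic, and the contamination level (the rough share of rows you expect to be anomalous) is an assumption you would tune to your own data.

```python
# Minimal sketch of unsupervised anomaly scoring with scikit-learn's
# IsolationForest. Feature values and contamination are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Simulate 1,000 "normal" transactions: amount and day-of-week.
normal = np.column_stack([
    rng.normal(2300, 150, size=1000),   # amounts clustered near $2,300
    rng.integers(0, 5, size=1000),      # weekdays only (Mon-Fri)
])
# One suspicious transaction: typical amount, but on a Sunday.
suspect = np.array([[2300.0, 6.0]])
X = np.vstack([normal, suspect])

# contamination is the only knob: the expected share of anomalies (~1%).
model = IsolationForest(n_estimators=200, contamination=0.01,
                        random_state=0).fit(X)

# decision_function: lower (more negative) = more anomalous.
scores = model.decision_function(X)
print("suspect score:", scores[-1])
print("median normal score:", np.median(scores[:-1]))
```

Note that no labels are passed to `fit` — the model learns "normal" purely from the shape of the data, which is why the Sunday transaction stands out despite its ordinary amount.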

What Kinds of Anomalies It Catches

The key advantage over rules is that the model catches combinations that no single rule would flag. A $900 payment is not unusual. A payment to Vendor X is not unusual. But a $900 payment to Vendor X from Department Y on a Saturday — that specific combination might score as highly anomalous because it has never occurred before.
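A small synthetic demonstration of that point, with made-up department and amount values: each value in the flagged row is individually common, and only the pairing is rare.

```python
# Sketch of a multi-dimensional outlier: every value is common on its
# own, only the combination is rare. All data here is synthetic.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)

# Dept 0 normally makes small purchases, dept 1 makes large ones.
small = np.column_stack([rng.normal(100, 10, 500), np.zeros(500)])
large = np.column_stack([rng.normal(900, 50, 500), np.ones(500)])
# Anomaly: a $900 payment (common) from dept 0 (common) -- rare combo.
combo = np.array([[900.0, 0.0]])
X = np.vstack([small, large, combo])

model = IsolationForest(n_estimators=300, random_state=0).fit(X)
scores = model.decision_function(X)
# The combination row scores lower (more anomalous) than typical rows,
# even though $900 and dept 0 are each individually common.
```

A threshold rule on amount alone would never flag this row, because $900 payments happen five hundred times in this data; only the joint view isolates it.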

Who This Is For

Controllers, accounts payable managers, and internal auditors at companies with $5M to $100M in revenue — large enough to process meaningful transaction volume (500+ per month) but too small for enterprise fraud detection software. Industries with significant operational spending benefit the most: construction, healthcare, professional services, manufacturing, and hospitality. Companies with distributed purchasing authority — multiple cost centers, field offices, or project-based billing — are particularly vulnerable because no single person sees all transactions.

The current alternative for most of these companies is periodic manual sampling: the controller pulls a random sample of 50 transactions per month and reviews them. Against a 1,000-transaction month, that sample covers 5% of the ledger, so it catches roughly 5% of anomalies. Quarterly audit spot-checks are even less effective. The analysis replaces random sampling with targeted investigation — review the 20 most anomalous transactions instead of 50 random ones, and you are far more likely to find actual problems.

What Data You Need

A CSV export from your general ledger, accounts payable system, or expense management platform (QuickBooks, Xero, Expensify, Concur, Brex). You need at least two numeric columns that characterize each transaction: at minimum, the transaction amount plus a timing feature such as day of week or day of month.

Additional numeric features significantly improve detection: days since the previous payment to the same vendor, payment counts per vendor over a trailing window, and transaction counts or spend totals per department or cost center.

The model accepts 2-20 numeric features. More dimensions give it more ways to distinguish anomalies from normal transactions. Categorical columns like vendor name or GL code should be excluded from the feature set — the model requires numeric inputs only. Use those columns for investigation after the anomalies are flagged.
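A pandas sketch of that split, with hypothetical column names: numeric columns become model features, while categorical ones are set aside for the follow-up investigation.

```python
# Hedged sketch: keep numeric columns as model features, set aside
# categorical ones (vendor, GL code) for post-hoc investigation.
# Column names here are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "transaction_amount": [2300.0, 180.0, 45000.0],
    "day_of_week": [1, 2, 6],
    "vendor_name": ["Office Depot", "Cafe Roma", "Acme LLC"],
    "gl_code": ["6100", "6220", "7300"],
})

features = df.select_dtypes(include="number")   # model inputs
lookup = df.select_dtypes(exclude="number")     # kept for investigation

print(list(features.columns))  # ['transaction_amount', 'day_of_week']
```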

Minimum: 200 transactions. The sweet spot is 1,000 to 5,000 rows — enough to establish normal patterns without overwhelming the model. A typical month-end GL export falls squarely in this range.

How to Read the Report

Anomaly Score Distribution — a histogram showing scores across all transactions. Normal transactions cluster at low scores; anomalies sit in the right tail. A clean separation between normal and anomalous scores means the model is confident. A gradual tail with no clear gap means the boundary is fuzzy and you should interpret borderline cases with caution.

Top Anomalies Table — this is your action list. The 20-50 most anomalous transactions ranked by score, with all feature values shown. For each transaction, you can see exactly why it was flagged — unusual amount, unusual timing, unusual vendor frequency, or some combination. Start your investigation here.
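Building that ranked list is straightforward once scores exist. This sketch uses synthetic data and hypothetical column names; the sign flip converts scikit-learn's decision_function, where lower means more anomalous, into a high-is-suspicious score.

```python
# Sketch of the ranked action list: score every row, then take the
# N most anomalous with all feature values attached. Data is synthetic.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "transaction_amount": rng.normal(2300, 200, 1000),
    "days_since_last_vendor_payment": rng.normal(30, 3, 1000),
})
# Plant one obvious outlier for the demo: huge amount, paid next day.
df.loc[999] = [45000.0, 1.0]

model = IsolationForest(n_estimators=200, random_state=0).fit(df)
scores = -model.decision_function(df)   # flip sign: high = suspicious
df["anomaly_score"] = scores

# The action list: 20 most anomalous rows, all features attached.
top = df.nlargest(20, "anomaly_score")
```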

Feature Importance — which dimensions drove the anomaly scores most. If "transaction_amount" dominates, your anomalies are primarily unusually large or small payments. If multiple features contribute roughly equally, the anomalies are multi-dimensional — they look unusual across several characteristics simultaneously, which often indicates more sophisticated patterns.
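scikit-learn's IsolationForest does not expose feature importances directly, so a common workaround, sketched here on synthetic data, is permutation importance: shuffle one feature at a time and measure how much the anomaly scores move.

```python
# Permutation-style importance for IsolationForest: shuffling an
# informative feature shifts the scores more than shuffling noise.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
X = np.column_stack([
    rng.normal(2300, 200, 800),   # transaction amount (drives anomalies)
    rng.integers(0, 5, 800),      # day of week (mostly uninformative)
])
X[:8, 0] = 50000                  # a handful of huge payments

model = IsolationForest(n_estimators=200, random_state=0).fit(X)
base = model.decision_function(X)

importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])   # break this feature's signal
    shifted = model.decision_function(Xp)
    importance.append(float(np.mean(np.abs(shifted - base))))
# Here feature 0 (amount) should dominate.
```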

Normal vs. Anomaly Comparison — side-by-side statistics showing how anomalous transactions differ from normal ones. Maybe anomalous transactions are 10x the average amount. Maybe they have normal amounts but come from vendors with unusually low payment frequency. This comparison table makes the differences concrete.
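The comparison table can be produced with a groupby once each row is labeled. This sketch assumes synthetic data and hypothetical feature names; predict() returns -1 for anomalies and +1 for normal rows.

```python
# Sketch of the normal-vs-anomaly comparison: label each row with
# predict(), then compare mean feature values across the two groups.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(5)
# 980 normal rows plus 20 planted anomalies: huge amounts, rare vendors.
df = pd.DataFrame({
    "transaction_amount": np.r_[rng.normal(2300, 200, 980),
                                rng.normal(30000, 2000, 20)],
    "payments_to_vendor_last_90d": np.r_[rng.normal(3.0, 0.5, 980),
                                         rng.normal(1.0, 0.2, 20)],
})

model = IsolationForest(contamination=0.02, random_state=0).fit(df)
labels = model.predict(df)              # -1 = anomaly, +1 = normal
df["label"] = np.where(labels == -1, "anomaly", "normal")

# Side-by-side means: how do flagged rows differ from the rest?
comparison = df.groupby("label")[["transaction_amount",
                                  "payments_to_vendor_last_90d"]].mean()
```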

Building a Review Workflow

Monthly Cadence

Run the analysis at month-end on the full month's GL export. Review the top 20 anomalies. Most will have innocent explanations — a legitimate one-time purchase, a vendor who changed billing frequency, a correctly coded but unusual project expense. Flag the ones that do not have obvious explanations for deeper investigation.

Prioritize by Score

Anomaly scores are continuous, not binary. A transaction scoring 0.85 is far more suspicious than one scoring 0.55. Start with the highest scores and work down. You will quickly develop a sense for what score level separates "worth investigating" from "probably fine" in your specific data.

Track False Positives

Keep a log of which flagged transactions turned out to be legitimate. Over time, this helps you tune the contamination parameter — if the model flags 5% of transactions but only 1% are actually problematic, reduce the contamination setting from 5% to 2%. This narrows the investigation list and reduces reviewer fatigue.
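In scikit-learn terms the contamination parameter is a fraction of the training data, so that adjustment means moving from 0.05 to 0.02. A quick sketch on synthetic stand-in data:

```python
# Contamination sets the score threshold as a quantile of the training
# scores, so it directly controls how many rows get flagged.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(11)
X = rng.normal(size=(1000, 3))   # stand-in for real transaction features

wide = IsolationForest(contamination=0.05, random_state=0).fit(X)
narrow = IsolationForest(contamination=0.02, random_state=0).fit(X)

n_wide = int((wide.predict(X) == -1).sum())      # roughly 50 of 1,000
n_narrow = int((narrow.predict(X) == -1).sum())  # roughly 20 of 1,000
```

Lowering the fraction does not change which rows score worst; it only shortens the list by raising the bar for what counts as a flag.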

When to Use Something Else

References