Saturday, May 9, 2026

How to Build an AI Business Case Without Making Numbers Up

Somewhere right now, a leadership team is staring at a vendor slide that says "40% efficiency gains," with no footnote explaining where that number came from. They will build a business case around it. They will present it to their board. Six months later they will wonder why reality looks nothing like the projection.

Forrester warned in its 2026 predictions that "the gap between inflated vendor promises and value delivered is widening." Gartner predicts that over 40% of agentic AI projects will be cancelled by end of 2027, citing escalating costs and unclear business value.

The technology is rarely the bottleneck. What's broken is the way most organisations justify the spend. Business cases get assembled from the outside in, starting with a vendor's claimed efficiency percentage, multiplying by headcount or revenue, and landing on a large number that explains the budget request. The numbers look rigorous on a slide. They have very little to do with how the work actually gets done.

This post describes a methodology that works the other way round, starting with how the work gets done and building the case from there. Every number in the projection ties back to a named person and a named process.

Interview Everyone Affected

Start with conversations. The credible business cases we've helped build have all been anchored in what we learned from talking to the people whose work would change.

Interview every person whose work the AI implementation would touch, not just their managers and not a sample of the team. In a 30-person operations group, that means 30 interviews. In a 7-person leadership team evaluating a strategic tool, it means 7. The cost of asking everyone is calendar time. The cost of asking a sample is that you miss the workarounds, the manual reconciliations, and the "I spend Tuesday mornings fixing what the system got wrong" tasks that never show up in process documentation.

Two questions matter more than any others.

"What do you actually do with your time?" The job description and the process documentation will tell you what should happen. This question is about what actually happens, hour by hour. The reconciliation work, the data re-keying, the manual bridges between systems that exist because the official integration never quite worked, the categorisation tasks that are theoretically automated but in practice need human review.

"If you got those hours back, what would you do with them?" This turns abstract time savings into concrete business value. People don't answer with "nothing." They answer with specific, useful work they currently cannot get to: training junior staff, developing new products, building client relationships, tackling projects that have been on the back burner for months.

Document what people say in their own words rather than your paraphrase. The transcripts become your evidence base for everything downstream.

Get Ranges and Assign Confidence

When someone tells you a task takes "about 10 hours a week," that is useful but not enough. Push for the range.

"Is it always 10? What's a light week? What about month-end, or when the seasonal spike hits?"

A range like "one to two days a week on manual data bridging" or "50 to 80 hours a year on invoice classification" is more honest than a single point estimate, and it gives you the raw material for both conservative and optimistic projections without inventing anything.
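
When you later need these ranges on an annual basis, the conversion is simple arithmetic. A minimal sketch; the working-day and working-week figures are assumptions to replace with your own:

```python
HOURS_PER_DAY = 8      # assumed working day
WEEKS_PER_YEAR = 46    # assumed working weeks after leave and public holidays

# "One to two days a week on manual data bridging" as an annual range:
low_hours = 1 * HOURS_PER_DAY * WEEKS_PER_YEAR    # 368 hours/year
high_hours = 2 * HOURS_PER_DAY * WEEKS_PER_YEAR   # 736 hours/year
```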

Collect ranges for:

  • Time spent on the task today. Hours per week or per year, with the floor and the ceiling.
  • Error rates or rework. How often does it go wrong, and what does fixing it cost?
  • Downstream costs. Late invoices, missed logistics windows, compliance penalties, customer churn from slow responses.

Every number should have a name attached to it. "Finance Manager estimates 50-80 hours/year on invoice classification" is a defensible data point. "Invoice processing takes approximately 65 hours/year" is a number with nobody behind it.

Then mark each line item with a confidence level:

  • High: multiple sources, consistent specifics, ideally cross-referenced against system data. Example: three team members independently estimate 2-3 hours/day on the same manual process, and ticket-system logs corroborate the volume.
  • Medium: one credible source, a mechanism that makes sense, but only a single data point. Example: an ops manager estimates better route planning would save $15,000-$30,000 per year, based on their experience of current inefficiencies.
  • Low: the mechanism is clear, but the magnitude depends on data you don't have. Example: reducing customer response times should improve retention, but you have no churn data correlated to response speed.

CFOs we've shown this format to rarely push back on the explicit uncertainty. If anything, it makes the conversation easier, because they're being treated as someone who can hold a range rather than someone being sold a single number.
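
If the evidence base lives in a spreadsheet or a small script rather than in prose, one way to keep every number tied to a person and a confidence level is to record each estimate as a structured line item. A minimal sketch in Python; the field names are illustrative, and the third entry's figures are placeholders rather than data from any interview:

```python
from dataclasses import dataclass

@dataclass
class LineItem:
    """One interview-derived estimate, always tied to a named source."""
    source: str       # who gave the estimate, e.g. "Finance Manager"
    task: str         # what the time or cost relates to
    low: float        # bottom of the stated range
    high: float       # top of the stated range
    unit: str         # "hours/year", "$/year", ...
    confidence: str   # "high", "medium", or "low"

evidence = [
    LineItem("Finance Manager", "invoice classification", 50, 80, "hours/year", "high"),
    LineItem("Ops Manager", "route planning inefficiency", 15_000, 30_000, "$/year", "medium"),
    # Low-confidence items are recorded too, with placeholder bounds until the data exists.
    LineItem("Support Lead", "retention lift from faster responses", 0, 25_000, "$/year", "low"),
]

for item in evidence:
    print(f"[{item.confidence}] {item.source}: {item.task}, {item.low}-{item.high} {item.unit}")
```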

Build Three Bands

With ranges and confidence levels in hand, construct three projection scenarios.

Conservative. Include only high-confidence items at the low end of their ranges. This is the floor: the minimum defensible value you can expect if everything goes modestly. Numbers in this band should be the ones the team can stand behind in a board meeting without caveats.

Moderate. Include high and medium-confidence items, using the midpoints of the ranges. This is the realistic planning scenario: what you should expect if implementation goes well and adoption is reasonable.

Optimistic. Include all items, including low-confidence ones, at the high end of their ranges. This is the ceiling: what becomes possible if everything aligns, if the data confirms your hypotheses, and if adoption is strong. Revenue protection, market expansion, and strategic advantages live in this band.
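
With the line items in a structured form, the three bands can be computed mechanically instead of assembled by hand. A minimal sketch; the items below are illustrative, already converted to a common unit ($/year), with time estimates priced at an assumed loaded rate:

```python
# (description, low, high, confidence) -- all values already converted to $/year.
items = [
    ("Invoice classification time, at an assumed loaded rate", 3_000, 4_800, "high"),
    ("Route planning savings (Ops Manager estimate)", 15_000, 30_000, "medium"),
    ("Retention lift from faster responses (placeholder)", 0, 25_000, "low"),
]

def band(allowed_confidence, pick):
    """Sum the items whose confidence is allowed, using `pick` to choose a point in each range."""
    return sum(pick(low, high)
               for _, low, high, confidence in items
               if confidence in allowed_confidence)

conservative = band({"high"}, lambda low, high: low)                         # the floor
moderate     = band({"high", "medium"}, lambda low, high: (low + high) / 2)  # the planning case
optimistic   = band({"high", "medium", "low"}, lambda low, high: high)       # the ceiling

print(f"Conservative: ${conservative:,.0f}/yr")
print(f"Moderate:     ${moderate:,.0f}/yr")
print(f"Optimistic:   ${optimistic:,.0f}/yr")
```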

Present all three to the decision-maker. Do not show only the optimistic number, and do not average the three into a single figure. The spread between conservative and optimistic is the most important signal in the case, because it tells the reader how much real uncertainty exists.

Name Your Assumptions and Gaps

Every projection rests on assumptions. Most business cases bury them; a credible one lists them where the reader can see them.

Common assumptions worth naming explicitly:

  • Adoption rate. What percentage of the team will actually use the tool effectively, and how quickly? Discount your year-one projections heavily; adoption takes longer than the procurement case wants it to. A case that assumes 100% adoption from day one is fiction. One way to model the discount is sketched below.
  • Data readiness. Does the AI solution need clean, structured data you don't currently have? If so, the cost of preparing that data belongs in the investment, not as an afterthought.
  • Process stability. Are the processes you're automating likely to change in the next 12 months? If a reorganisation is coming, your time-savings estimates may not survive it.
  • Integration scope. How many existing systems does the tool need to connect to? Every integration point is a cost and a risk.

For each assumption, note what would need to be true for the projection to hold, and what would invalidate it.
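
The adoption-rate assumption in particular is worth modelling explicitly rather than burying in a single discount factor. A minimal sketch, assuming a linear ramp up to a steady-state adoption level; the 70% steady state and six-month ramp are placeholder assumptions, not benchmarks:

```python
def year_one_value(full_adoption_annual_value: float,
                   steady_state_adoption: float = 0.70,
                   ramp_months: int = 6) -> float:
    """Discount a steady-state annual value for gradual, partial adoption in year one.

    Months 1..ramp_months climb linearly towards steady_state_adoption;
    the remaining months run at steady_state_adoption.
    """
    monthly_value = full_adoption_annual_value / 12
    ramp = sum(steady_state_adoption * month / ramp_months
               for month in range(1, ramp_months + 1))
    steady = steady_state_adoption * (12 - ramp_months)
    return monthly_value * (ramp + steady)

# A $100,000/yr moderate-band saving comes out at roughly $55,000 in year one
# under these placeholder assumptions.
print(f"${year_one_value(100_000):,.0f}")
```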

Then list what you don't know. You will have gaps after the interviews are done, and the more honest the documentation, the longer the list will be. Own them:

"We could not obtain detailed churn data correlated to response times. The revenue protection estimate in the optimistic band depends on this data."

"Transport cost savings are based on one person's estimate. Validating against actual route data and fuel costs would sharpen this range."

A listed gap is useful information for the decision-maker. It tells them exactly what additional data would tighten the projection, and it gives them the option to invest in finding that data before committing to full implementation. It also protects you: if a projection misses, you flagged the uncertainty in advance.

Frame the Value as Reallocation

More business cases die over framing than over numbers. The specific framing problem is presenting AI value as headcount reduction.

If your case reads as "we can cut 3 FTEs," you will face resistance from every direction. Operators see a threat to their jobs. Middle managers sandbag adoption because their team size is on the line. The CFO hesitates too, because headcount reduction carries reorganisation costs, morale impact, and PR exposure that a simple cost saving doesn't capture.

Frame the value as time reallocation. That is the most accurate description of what happens in practice anyway.

When you asked what people would do with recovered hours, they told you: train new staff, develop products, build client relationships, tackle strategic projects that have sat on the back burner. Those answers are the real business case. The manual hours weren't producing value in the first place; redirecting that capacity is what changes the operation.

"Nobody is being replaced. The value is in what people do with recovered time." That sentence should appear somewhere in your executive summary.

When the Vendor's Case Study Comes Up

The CFO will reasonably ask: if the vendor's case study shows a 40% improvement at a company like ours, why isn't that the number we're using?

Vendor case studies measure the vendor's best result, not the median, and they almost never disclose adoption rate, integration cost, or the headcount-to-AI ratio that produced the figure. The 40% in the case study is what's possible under conditions the vendor curated. It isn't your planning number.

Use vendor case studies as upper-bound evidence. Cite them in the optimistic band, with the caveat that the case study represents conditions you haven't yet verified in your own environment. If your moderate band lands at 18% and the vendor's case study claims 40%, the gap is interesting information. It points at what the case study customer had that you don't, and at what it would cost to close that distance.

That's a more useful conversation than averaging the two numbers.

When This Method Is Overkill

Thirty interviews is two to three weeks of someone's calendar. That cost is worth paying for board-level decisions, irreversible commitments, or any AI investment above roughly $50,000 in implementation plus first-year run cost.

For low-stakes pilots, off-the-shelf tools under about $10,000, or anything you can revert within a sprint, the full method is overkill. A scaled-down version covers it: four or five interviews with the most-affected users, no confidence-banding, and a single estimate rather than three projections.

The methodology scales with the consequence of being wrong. The reason to interview everyone in a 30-person team is that you are about to change how 30 people work, and the cost of being wrong about that lands on every one of them.

What This Produces

This is slower than copying vendor claims into a slide deck. It requires interviews, careful documentation, and the discipline to write "we don't know" next to the things you don't know. What it produces is a case a CFO can sign off without quietly hedging, that operators recognise as a true description of their work, and that survives contact with the implementation.
