AI · Product · Build in public

The AI feature in our AI product doesn't use AI

BurnCap's Insights feature is the smart part of an AI cost tool — and it calls zero models. Why the most AI-native decision we made was choosing not to use AI.

Jun 17, 2026 · Daniel · 4 min read · BurnCap

BurnCap's dashboard insights — what-if model swaps, top movers, and prompt-cache efficiency — each showing its deterministic "nothing worth flagging" state rather than a guess.

When you ship something with "AI" in the description, there's an unspoken assumption that somewhere inside, a model is doing the clever part. BurnCap is an AI cost tool. Its Insights feature is the part that's supposed to be smart — it tells you where your money is going and what to do about it.

It does not call a single LLM to do that. Not one token. On purpose.

If you've read the rest of this series, you'll recognize the move: most of BurnCap is decided by what it refuses to do. This is one more refusal, and the reasoning is the point.

The lazy version writes itself

The obvious way to build "AI insights" is three steps: dump the user's spend data into a prompt, ask a model "what should this person do to cut costs?", and render the answer in a nice card. It demos beautifully. You can ship it in an afternoon.

It's also exactly the wrong tool for this particular job, and it took about five minutes of thinking about what the feature needs to be true to see why.

A cost report's one job is to be trusted

BurnCap's entire positioning is honest numbers. Every figure is labeled — estimated, provider-billed, or forecasted. When we don't have a price for a model, the product says "unpriced" instead of quietly guessing. Cost bases never get silently mixed. That discipline is the product; it's the reason someone would trust it over eyeballing their provider dashboard.
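To make that discipline concrete, here is a minimal Python sketch of what "every figure is labeled" could look like in code. It is an illustration, not BurnCap's actual implementation: the names `Basis`, `Cost`, `estimate_cost`, and `render`, and the per-million-token rate unit, are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Basis(Enum):
    """How a dollar figure was obtained (hypothetical labels for this sketch)."""
    ESTIMATED = "estimated"
    PROVIDER_BILLED = "provider-billed"
    FORECASTED = "forecasted"


@dataclass(frozen=True)
class Cost:
    usd: float
    basis: Basis

    def __add__(self, other: "Cost") -> "Cost":
        # Refuse to mix cost bases silently: adding an estimate to a
        # provider-billed figure is an error, not a quiet blend.
        if self.basis is not other.basis:
            raise ValueError(
                f"cannot add {self.basis.value} to {other.basis.value}"
            )
        return Cost(self.usd + other.usd, self.basis)


def estimate_cost(tokens: int, rate_per_mtok: Optional[float]) -> Optional[Cost]:
    """Return an estimated cost, or None when no price is known.

    rate_per_mtok is USD per million tokens (an assumed unit).
    """
    if rate_per_mtok is None:
        return None  # unpriced: do not guess
    return Cost(tokens * rate_per_mtok / 1_000_000, Basis.ESTIMATED)


def render(cost: Optional[Cost]) -> str:
    """Format a figure with its label, or say 'unpriced'."""
    if cost is None:
        return "unpriced"
    return f"${cost.usd:.2f} ({cost.basis.value})"
```

Under these assumptions, `render(estimate_cost(2_000_000, 3.0))` gives `$6.00 (estimated)`, while a model with no known rate renders as `unpriced`. Carrying the label on the value itself, rather than adding it at display time, is what lets the addition operator reject mixed bases.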

Now put an LLM in the middle of that. An LLM's core competency is producing a plausible answer whether or not it actually knows. That's wonderful for brainstorming and a disaster for a cost report. The first time an "insight" confidently tells you you'll save $400 by doing something that saves nothing, we've lost the only thing the product sells. There's no label honest enough to survive a hallucinated dollar figure.

There are smaller reasons too — latency, reproducibility, and the genuine irony of a cost tool burning tokens to generate advice about burning tokens. But the big one is trust. A number you can't reproduce is a number you can't act on.

So what is the "intelligence," then?

Three deterministic calculations. None is clever. All are things a careful analyst would do in a spreadsheet — just done automatically, and labeled honestly.

Cache efficiency. For each model: what fraction of your input tokens hit the prompt cache, and how many real dollars did that save versus paying full input price? It's arithmetic — cached_tokens × (input_rate − cached_rate). No model needs to tell you to "consider prompt caching"; it shows you the money you already saved and the money you're leaving on the table.
What-if model swaps. Take this month's actual token mix, re-price it on a cheaper model from the same provider at today's rates, and if it would save more than 20% and more than a dollar, surface it — labeled for exactly what it is: same token mix, same provider, quality not considered. A candidate to evaluate, not an instruction.
Top movers. Last seven complete days versus the seven before, by model, ranked by what moved most. That's how you catch "huh, the agent feature quietly tripled last week" before the invoice tells you.
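
The cache-efficiency and top-movers calculations above are small enough to sketch in a few lines of Python. This is a hedged illustration, not the product's code: the function names, the per-token rate unit, the assumption that `input_tokens` already includes the cached tokens, and the shape of `daily_spend` are all invented for the example.

```python
from typing import Dict, List, Tuple


def cache_savings(cached_tokens: int, input_tokens: int,
                  input_rate: float, cached_rate: float) -> Tuple[float, float]:
    """Return (cache hit fraction, dollars saved) for one model.

    Rates are USD per token. input_tokens is assumed to be the total
    input token count, cached tokens included.
    """
    hit_fraction = cached_tokens / input_tokens if input_tokens else 0.0
    # The arithmetic from the text: cached_tokens * (input_rate - cached_rate)
    saved = cached_tokens * (input_rate - cached_rate)
    return hit_fraction, saved


def top_movers(daily_spend: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank models by how much their spend moved week over week.

    daily_spend maps a model name to its per-day spend in USD, oldest
    first, containing only complete days. At least 14 days per model
    are assumed; shorter histories would compare partial windows.
    """
    changes = []
    for model, days in daily_spend.items():
        last_seven = sum(days[-7:])
        previous_seven = sum(days[-14:-7])
        changes.append((model, last_seven - previous_seven))
    # Biggest absolute move first, whether up or down.
    return sorted(changes, key=lambda item: abs(item[1]), reverse=True)
```

Ranking by absolute change is one reading of "what moved most"; it surfaces large drops as well as large increases. Ranking by percentage change would be an equally defensible choice, with different behavior for small-spend models.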

The part where we got it wrong

Here's a real one from this week. The first version of the swap suggester picked the cheapest eligible model — maximum savings, technically optimal. In practice it told you to run your flagship workload on the tiniest, oldest model in the catalog, because that's where the biggest number lives. "Save 87%!" by switching your reasoning agent to something that can't do the job. That's not advice; it's a math trick wearing a suit.

So we changed it to suggest the next tier down — the closest cheaper model in that provider's current lineup. Flagship to the step below it, not flagship to the floor. Smaller headline number, but an actual decision a human might make.
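
Here is a minimal sketch of that revised rule in Python, combined with the 20% and one-dollar thresholds described earlier. Everything named here (`Rates`, `Suggestion`, `suggest_swap`, the per-million-token rate unit, and a `lineup` list ordered from top tier to bottom) is hypothetical. How the thresholds interact with tier selection is also an assumption: in this sketch, if the next cheaper model does not clear both thresholds, nothing is suggested rather than falling further down the lineup.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rates:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


@dataclass(frozen=True)
class Suggestion:
    from_model: str
    to_model: str
    savings_usd: float
    savings_pct: float
    caveat: str = "same token mix, same provider, quality not considered"


def price(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    """Re-price a fixed token mix at the given rates."""
    return (input_tokens * rates.input_per_mtok
            + output_tokens * rates.output_per_mtok) / 1_000_000


def suggest_swap(model: str, input_tokens: int, output_tokens: int,
                 lineup: List[str], rates: Dict[str, Rates],
                 min_pct: float = 0.20, min_usd: float = 1.00) -> Optional[Suggestion]:
    """Suggest the closest cheaper model in one provider's lineup, or None.

    lineup lists the provider's current models from top tier to bottom.
    """
    if model not in lineup:
        return None
    current = price(input_tokens, output_tokens, rates[model])
    # Walk down from the current tier and stop at the FIRST cheaper model,
    # instead of jumping to the cheapest one in the catalog.
    for candidate in lineup[lineup.index(model) + 1:]:
        alternative = price(input_tokens, output_tokens, rates[candidate])
        if alternative < current:
            savings = current - alternative
            pct = savings / current
            if pct > min_pct and savings > min_usd:
                return Suggestion(model, candidate, savings, pct)
            return None  # next tier down is not worth flagging
    return None
```

With a three-model lineup, a flagship workload gets pointed at the middle model even when the bottom model would show a larger headline saving. The `caveat` field travels with every suggestion so the card can never be rendered without its label.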

Notice what the "intelligence" upgrade actually was: we looked at the output, saw it was useless, and encoded better judgment. No model involved. That's what most of these features are — human judgment, made repeatable. An LLM would have been a way to avoid deciding what good advice looks like, by outsourcing the decision to something that doesn't know our users.

The principle

"AI product" has quietly come to mean "product with an LLM call in it somewhere." But that's never the useful question. The useful question is: what does this feature need to be true? This one needed to be correct, reproducible, and honest about its own uncertainty — and those are not an LLM's strengths.

So the model stays out of the hot path — the same refusal behind never proxying your traffic and never storing your prompts. The whole product is built from a handful of these refusals: out of your traffic, out of your prompts, out of guessing, and, here, out of the math.

“The most AI-native decision we made on this product was choosing not to use AI for the part everyone assumes is the AI part.”

See it, or argue with us

You can watch all three insights run on demo data in about thirty seconds — no account wiring, no API key. If you're building something where a confident-but-wrong answer is worse than no answer, we'd genuinely like to compare notes on where you drew the line.