A delivered engagement
The fund was running about $24M a year through large language models and could not say what the spend returned. In just over a week we removed $8.7M from the annual bill, with no research workload dropping below the quality floor the fund's own researchers set, and the desk ended up running more than twice the effective research per dollar. Here it is lever by lever, measured against the fund's own evals and signed off.
The diagnosis
The fund put roughly $24M a year through large language models with no clean answer to what it returned. Twenty-two percent of that spend could not be attributed to any team or research process: a desk that ties every basis point of P&L to a strategy could not attribute a fifth of its AI spend to a research purpose. Fifty-eight percent of token volume ran on the top reasoning-grade model regardless of task. The same context was re-sent uncached on nearly every call, a 6% cache-hit rate. An agentic research loop had been stuck retrying a failed data fetch for weeks, producing nothing at a roughly seven-figure annualized rate. That puts the fund at 26 out of 100, the Tokenmaxxer band: spend running with no one accountable for what it returns. Because waste like that buys no alpha, removing it cost the fund nothing it valued.
Where the $8.7M came from
Why this matters more than the bill
For an alpha shop the figure that matters is not the bill but the cost per validated signal: the all-in inference cost to carry one hypothesis from idea, through backtest and evaluation, to a signal cleared for a live book. It fell about 57%. Two forces drove it. Every stage of the pipeline got cheaper through routing, caching, and the distilled pre-screen. And the compute that used to die in looping agents and abandoned pilots left the denominator, because we switched it off. Before the engagement only about 22% of AI spend was tied to research that reached a live signal; afterward it was roughly 80%. Combined with a cost base a third lower, effective research per AI dollar more than doubled, and on the flagship workload the team backtested far more ideas for the same money.
How a fund lets you book it
Co-defined in writing on the fund's own usage data before any work begins, then frozen, with one-off anomalies normalized out. We never touch the ledger.
Unit-cost reduction by default; frozen run-rate for the one-time eliminations; A/B holdout for the quality-sensitive levers like routing and distillation, so the counterfactual is measured directly rather than argued. A fund understands a holdout better than anyone.
The eval pass rates and signal-quality metrics, defined and owned by the fund's researchers, must hold or improve, or the affected saving does not count. Cutting cost by letting research quality slip would count for nothing.
A fixed audit fee plus a minority share of the savings the fund's finance and research leads sign off each month. Provider price cuts and organic volume are carved out, so we are paid only for the savings we actually caused.
Run the free Token Audit, or talk to us about your estate.