The open, versioned protocol behind every Maximand engagement. It defines what a conformant diagnosis measures, how a saving is verified, and the claims that would prove the method wrong. It is falsifiable, normative, and certifiable, and we publish it in full so a prospective client, a researcher, or a competitor can read it, check it, and cite it.
The Token-Efficiency Standard (TES) is the normative specification for diagnosing and reducing an enterprise's generative-AI cost while holding output quality. It governs the delivery method: the taxonomy of waste, the scoring, the verification, and the conformance levels a result must meet before it can be called verified. Conformance keywords (MUST, MUST NOT, SHOULD, MAY) are used in the usual sense.
In scope: metered LLM inference and the per-seat AI subscriptions adjacent to it, for an enterprise buyer. Non-goals: the TES does not rank model vendors, does not promise a fixed savings percentage, and does not claim to improve model quality beyond holding a pre-agreed floor. It is a cost-and-yield protocol, not a model-evaluation one.
A standard that cannot be wrong is not a standard. The TES rests on four claims, each stated so it can be disproven. The fourth is a self-correcting loop: the method is wired to be revised by its own accumulating evidence.
A conformant diagnosis MUST assess all twelve levers and MUST NOT rename or omit them. They are worked in tier order, because a saving you cannot attribute or contain cannot be trusted.
Each lever's definition, applicable-share default, savings range, and weight are fixed parameters. The full scoring model, with the benchmark behind every range and a worked example, is published in the scoring model.
Adoption per lever is scored from evidence (no = 0, partial = 0.5, yes = 1.0). A claimed "yes" that the data contradicts MUST be scored lower.
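As a minimal sketch of how the per-lever adoption values could roll up into a single 0 to 100 score: the adoption mapping is taken from the text, but the weight-normalised mean, the function name `adoption_score`, and the three example levers with their weights are assumptions made for illustration. The published scoring model defines the actual aggregation and weights.

```python
from typing import Dict

# Adoption values stated in the Standard: no = 0, partial = 0.5, yes = 1.0.
ADOPTION = {"no": 0.0, "partial": 0.5, "yes": 1.0}


def adoption_score(evidence: Dict[str, str], weights: Dict[str, float]) -> float:
    """Return a 0-100 score as the weight-normalised mean of per-lever adoption.

    ASSUMPTION: the weighted-mean aggregation is illustrative only; the real
    formula and weights live in the published scoring model.
    """
    # Every weighted lever must be assessed: none omitted, none renamed.
    if set(evidence) != set(weights):
        raise ValueError("evidence and weights must cover the same levers")
    total_weight = sum(weights.values())
    weighted = sum(weights[lever] * ADOPTION[evidence[lever]] for lever in weights)
    return 100.0 * weighted / total_weight


# Hypothetical three-lever example; a conformant diagnosis assesses all twelve.
example_weights = {"lever_a": 2.0, "lever_b": 1.0, "lever_c": 1.0}
example_evidence = {"lever_a": "yes", "lever_b": "partial", "lever_c": "no"}
example_score = adoption_score(example_evidence, example_weights)
print(example_score)  # 62.5
```

Scoring a contradicted "yes" lower then amounts to recording it as "partial" or "no" in the evidence before the score is computed.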
Bands: under 30 Tokenmaxxer (unmanaged); 30 to 49 Reactive; 50 to 69 Managed; 70 to 84 Disciplined; 85 to 100 Yield-optimised. Recoverable spend MUST use the overlap-adjusted compounded fraction, capped at 0.65, so overlapping levers are never double-counted and the model cannot claim an implausible total.
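The band lookup and the capped, compounded recoverable fraction can be sketched as follows. The band thresholds and the 0.65 cap come from the text. The exact compounding formula is not given here, so this is one plausible reading: each lever removes `share * rate * (1 - adoption)` of whatever spend remains after the levers before it. The function names, the tuple layout, and the treatment of fractional scores are assumptions.

```python
from typing import Iterable, Tuple

CAP = 0.65  # ceiling on the recoverable fraction, as stated in the Standard

# (exclusive upper bound, band name); 85 and above is Yield-optimised.
BANDS = [(30, "Tokenmaxxer"), (50, "Reactive"), (70, "Managed"), (85, "Disciplined")]


def band(score: float) -> str:
    """Map a 0-100 score to its band.

    ASSUMPTION: a fractional score such as 49.5 stays in the lower band.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for upper, name in BANDS:
        if score < upper:
            return name
    return "Yield-optimised"


def recoverable_fraction(levers: Iterable[Tuple[float, float, float]]) -> float:
    """Compound per-lever savings and cap the total.

    Each lever is (applicable_share, savings_rate, adoption). Savings apply
    to the spend left over by earlier levers, so overlap is not double-counted.
    """
    remaining = 1.0
    for share, rate, adoption in levers:
        remaining *= 1.0 - share * rate * (1.0 - adoption)
    return min(1.0 - remaining, CAP)


# Two hypothetical levers: adding them naively gives 0.20 + 0.15 = 0.35,
# while compounding gives 1 - (0.80 * 0.85) = 0.32.
print(round(recoverable_fraction([(0.5, 0.4, 0.0), (1.0, 0.3, 0.5)]), 2))  # 0.32
print(band(62.5))  # Managed
```

Compounding multiplies the remaining spend rather than summing savings, which is why the total can never exceed 1.0 even before the cap is applied.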
This is the part that lets a client book the number, and it is where most cost-savings claims fail.
The baseline MUST be co-defined in writing from the client's own invoices and observability data before any optimization, then frozen. Anomalies are normalized out.
Savings MUST be measured as a reduction in cost per unit of work by default, so volume growth neither creates nor destroys a fee. Permitted alternatives: frozen run-rate for one-time eliminations, and A/B holdout (the strongest) for quality-sensitive levers.
A per-workload quality floor MUST be agreed at baseline and held. A period's savings are creditable only while in-scope quality stays at or above it.
Provider price cuts, organic volume changes, post-baseline use cases, and client-led changes MUST be carved out of the savings calculation.
Savings MUST be computed reproducibly on the client's own ledger and signed off by the client each period before any fee. Each verified saving is credited for twelve months, then rolls into the client's baseline at no further fee.
Marketing or case studies MUST distinguish a diagnostic estimate from a verified result.
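The verification rules above can be sketched as a single per-period calculation. This is an illustration under stated assumptions, not the protocol's reference implementation: the `Period` fields, the function names, flooring a negative saving at zero, and counting the twelve-month credit window in whole calendar months are all hypothetical choices. The carve-outs are assumed to have been removed from `cost` and `units` before this step.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Period:
    cost: float        # in-scope spend from the client's ledger, after carve-outs
    units: float       # units of work delivered in the period
    quality: float     # measured in-scope quality for the period
    signed_off: bool   # whether the client has signed off this period


def creditable_saving(baseline_unit_cost: float, quality_floor: float, period: Period) -> float:
    """Saving from a lower cost per unit of work, gated by quality and sign-off."""
    if period.units <= 0:
        raise ValueError("a period needs a positive number of work units")
    if not period.signed_off or period.quality < quality_floor:
        return 0.0  # nothing is creditable below the floor or without sign-off
    unit_cost = period.cost / period.units
    # ASSUMPTION: a cost increase yields zero rather than a negative saving.
    return max(0.0, (baseline_unit_cost - unit_cost) * period.units)


def in_credit_window(verified_on: date, as_of: date) -> bool:
    """True while a verified saving is inside its twelve-month credit window."""
    months = (as_of.year - verified_on.year) * 12 + (as_of.month - verified_on.month)
    return 0 <= months < 12


# Volume halves with no optimization: total spend falls, unit cost does not.
idle = Period(cost=500.0, units=5000.0, quality=0.95, signed_off=True)
print(creditable_saving(0.10, 0.90, idle))  # 0.0

# Unit cost drops from 0.10 to 0.08 at unchanged quality.
tuned = Period(cost=800.0, units=10000.0, quality=0.95, signed_off=True)
print(round(creditable_saving(0.10, 0.90, tuned), 2))  # 200.0
```

The first example shows why cost per unit of work is the default: a frozen run-rate comparison would report a 50% saving from the drop in volume alone, while the unit-cost measure reports none.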
Competence is defined in levels so a trained practitioner, not only the original author, can deliver an engagement to the same standard. This is what removes key-person risk and lets the method scale beyond any one person.
The TES is a living standard, versioned with semver. Engagements record the version they were delivered under. The calibration loop in C4 is the formal amendment mechanism: when the accumulated evidence shows a lever's measured rates persistently diverging from its benchmark, a new version MUST update that benchmark, with the change and its evidence recorded. We intend the Standard to be open and citable, on the model of the FinOps Foundation, where being in the open is part of why it can be trusted. The method is published; the moat is the execution under a client's constraints, the verification their finance and risk teams will sign, and the benchmark that grows with every engagement.
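One way the amendment trigger could be checked is sketched below. The Standard does not define "persistently" in this section, so the rule used here (the most recent `window` measurements all falling on the same side of the benchmark range) and the default of three are assumptions, as are the function name and the example figures.

```python
from typing import Sequence


def needs_recalibration(
    measured_rates: Sequence[float],
    benchmark_low: float,
    benchmark_high: float,
    window: int = 3,  # ASSUMPTION: how many consecutive results count as persistent
) -> bool:
    """Flag a lever whose recent measured savings rates all miss its benchmark range.

    Only same-direction misses count, so results that scatter around the
    range are not treated as a persistent divergence.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(measured_rates) < window:
        return False  # not enough accumulated evidence yet
    recent = measured_rates[-window:]
    all_below = all(rate < benchmark_low for rate in recent)
    all_above = all(rate > benchmark_high for rate in recent)
    return all_below or all_above


# Hypothetical lever with a benchmark savings range of 20% to 40%.
print(needs_recalibration([0.25, 0.12, 0.15, 0.10], 0.20, 0.40))  # True
print(needs_recalibration([0.25, 0.12, 0.30, 0.10], 0.20, 0.40))  # False
```

A flag from a check like this would only start the amendment: the new version still has to record the changed benchmark and the evidence behind it.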