Budget Tracking

Per-tenant, per-session, or per-agent ceiling on token consumption, cost, or call volume, enforced fail-closed by the gateway. Budget tracking is the defence against unbounded consumption (OWASP LLM10): runaway agent loops or token-bomb upstreams that exhaust LLM credits, downstream system capacity, or the operator's wallet. It is the fourth check in the IntentGate authorization pipeline.

Why budget tracking matters

Agentic systems differ from traditional applications in their resource-consumption profile. A traditional API call costs a fixed compute slice; an agent tool call may trigger an LLM round trip costing anywhere from a fraction of a cent to several dollars depending on context size and model choice. An agent stuck in a reasoning loop, or one that has been weaponised by an attacker to consume tokens, can burn through a month's LLM budget in hours. The Anthropic, OpenAI, and Google bills land at the end of the month; the budget exhaustion lands now.

The OWASP LLM10 (Unbounded Consumption) risk class captures this surface. Mitigations include client-side rate limits, provider-side quotas, and prompt-length caps — all of which are necessary but none of which are sufficient because they operate at the wrong granularity (the call, not the agent or the tenant). Budget tracking operates at the granularity that matters for cost: per-tenant, per-session, per-agent.

How IntentGate implements budget tracking

The gateway maintains a budget counter for each tracked granularity. On every tool call, the gateway looks up the relevant counters, evaluates whether the projected post-call values would exceed any configured ceiling, and refuses the call if so. The decision is made before the call is forwarded to the LLM provider or the downstream tool — so a budget-exceeded call has no LLM cost, no downstream side effect, and no half-completed transaction.
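The pre-call check described above can be sketched roughly as follows. This is a minimal in-memory illustration, not IntentGate's actual API: the `BudgetGate` class, the `check_and_reserve` method, and the `(granularity, identifier)` scope keys are all hypothetical, and treating a scope with no configured ceiling as unlimited is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetGate:
    """Hypothetical in-memory budget gate (illustrative names only)."""
    ceilings: dict                                  # (granularity, id) -> ceiling
    counters: dict = field(default_factory=dict)    # (granularity, id) -> usage

    def check_and_reserve(self, scopes, projected_cost):
        """Refuse if any scope would exceed its ceiling, else record the usage.

        Returns (allowed, violated_scope).
        """
        # First pass: evaluate every scope without mutating anything, so a
        # refused call leaves all counters untouched.
        for scope in scopes:
            ceiling = self.ceilings.get(scope)
            if ceiling is None:
                continue  # assumption: no ceiling configured means no limit
            if self.counters.get(scope, 0) + projected_cost > ceiling:
                return False, scope
        # Second pass: the call is allowed, so commit the projected usage.
        for scope in scopes:
            self.counters[scope] = self.counters.get(scope, 0) + projected_cost
        return True, None


gate = BudgetGate(ceilings={("tenant", "acme"): 100, ("session", "s-1"): 10})
scopes = [("tenant", "acme"), ("session", "s-1"), ("agent", "a-7")]

first = gate.check_and_reserve(scopes, 8)    # fits under both ceilings
second = gate.check_and_reserve(scopes, 5)   # 8 + 5 would exceed the session ceiling
```

Checking all scopes before updating any of them is what keeps a refused call free of side effects: the second call above is rejected and every counter still reads 8.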

IntentGate tracks three meters concurrently:

Token consumption. Input and output token counts across LLM provider calls, accumulated per tenant, per session, per agent. The gateway captures token counts from the provider's response metadata and adds them to the relevant counters. Token-based budgets are the right granularity for cost-sensitive deployments and for cost attribution.

Call volume. Raw count of tool calls made, per tenant, per session, per agent. Useful for capacity-sensitive deployments where the bottleneck is downstream system throughput rather than LLM spend.

Direct cost. The gateway maintains a price catalogue for each LLM provider and each tool, and accumulates the per-call cost into a currency-denominated budget. This is the most generic meter: a single ceiling in euros applies regardless of which LLM provider or tool is involved.
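As a rough illustration of how a price catalogue could turn heterogeneous calls into one currency-denominated counter, consider the sketch below. The catalogue layout, the model and tool names, and every price are placeholders invented for the example; they are not real provider prices or IntentGate's actual schema.

```python
from decimal import Decimal

# Hypothetical catalogue in euros: per 1,000 tokens for LLMs, per call for tools.
# All figures are placeholders, not real prices.
PRICE_CATALOGUE = {
    ("llm", "example-model"): {
        "input_per_1k": Decimal("0.002"),
        "output_per_1k": Decimal("0.006"),
    },
    ("tool", "crm.lookup"): {"per_call": Decimal("0.01")},
}


def llm_call_cost(model, input_tokens, output_tokens):
    """Cost of one LLM round trip, priced separately for input and output."""
    price = PRICE_CATALOGUE[("llm", model)]
    return (Decimal(input_tokens) / 1000 * price["input_per_1k"]
            + Decimal(output_tokens) / 1000 * price["output_per_1k"])


def tool_call_cost(tool):
    """Flat per-call cost of a tool invocation."""
    return PRICE_CATALOGUE[("tool", tool)]["per_call"]


# Both kinds of call accumulate into the same euro-denominated counter.
spent = Decimal("0")
spent += llm_call_cost("example-model", input_tokens=1500, output_tokens=500)
spent += tool_call_cost("crm.lookup")
```

`Decimal` is used instead of floats so that many small per-call amounts accumulate without rounding drift.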

Fail-closed semantics

IntentGate budgets are fail-closed by design: when a call would take a counter past its ceiling, the call is refused outright rather than throttled. Throttling slows requests down but still lets them complete eventually. That protects downstream capacity, but it does not help with budget exhaustion, where every completed call is still billed. Refusing the call ensures the agent cannot continue to consume resources and gives the operator a clear signal that the ceiling was hit.

For deployments where a graduated response is preferable to outright refusal, operators can configure a soft ceiling that triggers a warning event but still allows the call to proceed, paired with a hard ceiling that fails closed. The two-tier configuration gives operators time to react before the hard refusal.
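A two-tier ceiling of this kind might be evaluated along these lines. The `Verdict` enum and the `evaluate` function are hypothetical names chosen for this sketch, and the thresholds are arbitrary.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    WARN = "warn"      # call proceeds, a warning event is emitted
    REFUSE = "refuse"  # fail closed: the call is not forwarded


def evaluate(current, projected_cost, soft_ceiling, hard_ceiling):
    """Classify a call against a soft ceiling and a hard ceiling."""
    projected = current + projected_cost
    # The hard ceiling is checked first so it always wins over the soft one.
    if projected > hard_ceiling:
        return Verdict.REFUSE
    if projected > soft_ceiling:
        return Verdict.WARN
    return Verdict.ALLOW


under = evaluate(current=70, projected_cost=5, soft_ceiling=80, hard_ceiling=100)
warned = evaluate(current=78, projected_cost=5, soft_ceiling=80, hard_ceiling=100)
refused = evaluate(current=98, projected_cost=5, soft_ceiling=80, hard_ceiling=100)
```

With a soft ceiling of 80 and a hard ceiling of 100, the three calls come out as allowed, allowed with a warning, and refused respectively.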

Per-tenant isolation

Multi-tenant deployments require budget isolation across tenants: one tenant's runaway agent must not exhaust another tenant's budget. IntentGate's per-tenant counters are isolated by design: tenants share no counter, no ceiling, and no refusal decision. A budget refusal triggered by one tenant's agent does not degrade service for any other tenant. Combined with the capability token's tenant scope, this means the financial blast radius of a hijacked agent is bounded by the tenant ceiling, regardless of how many tools the agent attempts.
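The isolation property can be illustrated with a small sketch in which each tenant has its own counter and ceiling. The `TenantBudgets` class and the tenant identifiers are hypothetical, and refusing tenants with no ceiling on record is an assumption made here, not documented IntentGate behaviour.

```python
class TenantBudgets:
    """Hypothetical per-tenant budget store with fully separate counters."""

    def __init__(self, ceilings):
        self._ceilings = dict(ceilings)             # tenant_id -> ceiling
        self._spent = {t: 0 for t in ceilings}      # tenant_id -> counter

    def try_spend(self, tenant_id, amount):
        # Assumption: a tenant with no ceiling on record is refused.
        if tenant_id not in self._ceilings:
            return False
        # Only this tenant's counter and ceiling take part in the decision.
        if self._spent[tenant_id] + amount > self._ceilings[tenant_id]:
            return False
        self._spent[tenant_id] += amount
        return True


budgets = TenantBudgets({"tenant-a": 10, "tenant-b": 10})

# tenant-a's agent loops until it is refused...
runaway = [budgets.try_spend("tenant-a", 3) for _ in range(5)]
# ...while tenant-b's budget is unaffected.
neighbour = budgets.try_spend("tenant-b", 3)
```

Here tenant-a is cut off after its third call, and tenant-b's first call still succeeds because no state is shared between the two.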

Error code and observability

Budget failures return JSON-RPC error code -32013. The error payload identifies which meter was exceeded (tokens, calls, currency), at which granularity (tenant, session, agent), and the current counter value relative to the ceiling. SIEM adapters route on the error code; the gateway also exposes a Prometheus-compatible metrics endpoint with per-tenant counter values for dashboards.
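A budget refusal might look something like the response built below. The envelope (`jsonrpc`, `id`, `error` with `code`, `message`, and `data`) follows the JSON-RPC 2.0 error object, and the code is the one stated above. The field names inside `data` and the message text are assumptions for illustration; the actual payload schema is not specified here.

```python
import json

BUDGET_EXCEEDED = -32013  # error code stated in the text


def budget_error(request_id, meter, granularity, current, ceiling):
    """Build a JSON-RPC 2.0 error response for a budget refusal."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {
            "code": BUDGET_EXCEEDED,
            "message": "Budget exceeded",   # illustrative wording
            "data": {                       # hypothetical field names
                "meter": meter,             # tokens | calls | currency
                "granularity": granularity, # tenant | session | agent
                "current": current,
                "ceiling": ceiling,
            },
        },
    }


response = budget_error(42, "tokens", "session", current=9800, ceiling=10000)
print(json.dumps(response, indent=2))
```

A client or SIEM adapter can then branch on `error.code` alone and use the `data` fields only for reporting.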

Related controls

Budget tracking runs after the policy engine. It complements intent enforcement (which stops out-of-intent calls before they consume any budget) and capability tokens (which bound which agents can spend against which tenant's budget). See the Agent Runtime Authorization category page for the full pipeline.

Frequently asked questions

Why fail-closed instead of throttling?

Throttling slows down requests but still allows them to complete eventually — useful for protecting downstream capacity but not useful when the problem is cost exhaustion. Fail-closed refuses the call outright once the budget is exceeded, ensuring the customer is not billed for unexpected volume and that the agent cannot continue consuming resources. For security-relevant budgets, fail-closed is the right default. A two-tier soft-ceiling-then-hard-ceiling configuration is available for deployments where graceful degradation is preferred.

How granular are IntentGate budgets?

IntentGate tracks budgets at three granularities simultaneously: per-tenant (the customer organization), per-session (one user interaction), and per-agent (one agent identity within a tenant). All three are independently enforced; a call that exceeds any of the three is refused. Operators set ceilings for each granularity based on the role and risk profile of the agent.

What counts toward the budget?

Three meters: token consumption (input + output tokens passed through the LLM provider), call volume (number of tool calls made), and direct cost (the gateway maintains a price catalogue for each LLM provider and each tool, and the budget is denominated in currency). Operators can enforce on any combination.