Architecture

Where the gateway sits

IntentGate is an inline gateway. Agent runtimes (the process running the LLM and orchestrating tool calls) connect to IntentGate over MCP or JSON-RPC instead of connecting directly to internal tools. IntentGate evaluates each tool call against the four-check authorization pipeline and either forwards the call to the upstream tool server or returns a typed error.

Nothing in the agent runtime, the LLM, or the application code needs to know that IntentGate is in the path; the only change is the URL the runtime connects to. The protocol on both sides is unchanged. There is no SDK in the LLM context, no extension to the model, and no change to how the agent is built. The control point is purely a network hop.

     ┌──────────────────┐        ┌──────────────────┐        ┌──────────────────┐
     │  Agent runtime   │  MCP   │   IntentGate     │  MCP   │  Upstream tool   │
     │  (LangChain,     │ ─────▶ │     gateway      │ ─────▶ │  servers         │
     │   OpenAI Assist, │        │   (this binary)  │        │  (Salesforce,    │
     │   custom)        │ ◀───── │                  │ ◀───── │   DB, S3, ...)   │
     └──────────────────┘        └────────┬─────────┘        └──────────────────┘
                                          │
                                          │ writes / reads
                                          ▼
                                  ┌───────────────┐
                                  │   Postgres    │
                                  │  (audit,      │
                                  │   drafts,     │
                                  │   revocation, │
                                  │   approvals)  │
                                  └───────────────┘
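
As a rough illustration of the "one URL" change, the sketch below builds (but does not send) an MCP tools/call request addressed to the gateway instead of the tool server. The gateway URL, the tool name, and the Bearer scheme in the Authorization header are assumptions made for the example; only the JSON-RPC 2.0 envelope and the tools/call method come from MCP itself.

```python
import json
import urllib.request

# Hypothetical endpoint: the agent points here instead of at the tool server.
GATEWAY_URL = "https://intentgate.example.internal/mcp"


def build_tool_call(token: str, tool: str, arguments: dict,
                    request_id: int = 1) -> urllib.request.Request:
    """Build (without sending) a JSON-RPC 2.0 tools/call request for the gateway."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed scheme; the source only says the token travels in
            # the Authorization header.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

The request body is identical to what the agent would have sent to the upstream tool server directly; only the destination and the capability token differ.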

Components

Gateway binary (intentgate-gateway). Go binary, single static executable, no runtime dependencies. Runs as a Kubernetes Deployment, a systemd unit, a container, or a bare process. Stateless on its own; all persistent state lives in Postgres. Multiple replicas behind a load balancer are the default deployment.

Policy bundle. A directory of Rego files plus a manifest. Loaded into the embedded OPA evaluator at start, hot-reloaded on policy promotion. Customers ship their own bundle; the open-source repo includes a baseline that demonstrates the four canonical patterns (destructive-verb deny-list, bulk-row ceiling, value threshold, approved destination).
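
The real bundle is written in Rego and evaluated by OPA. Purely to illustrate the decision logic of the four canonical patterns, here is a Python sketch; it is not the baseline bundle, and every name and threshold in it is a made-up assumption.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    verb: str
    row_count: int = 0
    value: float = 0.0
    destination: str = ""


# Illustrative values only; a real bundle would define its own.
DESTRUCTIVE_VERBS = {"delete", "drop", "truncate"}
MAX_ROWS = 1000
MAX_VALUE = 10_000.0
APPROVED_DESTINATIONS = {"s3://reports-internal"}


def evaluate(call: ToolCall) -> list[str]:
    """Return the names of violated patterns; an empty list means allow."""
    violations = []
    if call.verb.lower() in DESTRUCTIVE_VERBS:
        violations.append("destructive_verb")
    if call.row_count > MAX_ROWS:
        violations.append("bulk_row_ceiling")
    if call.value > MAX_VALUE:
        violations.append("value_threshold")
    # Calls without a destination are not subject to the destination rule.
    if call.destination and call.destination not in APPROVED_DESTINATIONS:
        violations.append("unapproved_destination")
    return violations
```

For example, evaluate(ToolCall(verb="DELETE", row_count=5000)) reports both the destructive-verb and the bulk-row violations, while a small read returns an empty list.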

Postgres. The single backing store. Holds the audit chain, draft policies, the active-policy pointer, capability revocations, approval queue, JIT elevation records, and tenant configuration. Schema migrations run at gateway start. No other database is required.

Console-Pro. Separate Astro+React web application talking to the gateway's admin API. Provides tenant switching, audit query and export, approval review, a policy promotion UI, and a JIT elevation request flow. Optional: every action it exposes is reachable via REST. Commercial.

SDKs. Pure-Python and TypeScript libraries that build capability tokens, attenuate them for sub-agent delegation, sign requests, and parse typed errors. Used by the agent-runtime side, not by the gateway. Apache 2.0.

Extractor. A standalone CLI for redacting and exporting audit slices for offline review or auditor handoff. Apache 2.0.

Request lifecycle

A single agent tool call traverses the gateway in five phases: the four checks, then forward and audit. Each phase has a deterministic outcome (allow, deny, or escalate), and every outcome is appended to the audit chain before the next phase runs. If any check denies or escalates, the call is not forwarded to the upstream tool server.

1. Capability check. The agent presents a capability token in the Authorization header. The gateway validates the signature, expiry, JTI freshness against the revocation table, tenant claim, and scope against the requested tool. Failure returns -32010.

2. Intent check. The token's intent payload (what the user authorized) is matched against the resolved tool call (what the agent is attempting). Mismatches on verb, target, or scale return -32011. This is the check that blocks an agent from doing something the user did not ask for, even when the agent holds a valid token for some other action.

3. Policy check. The resolved call is evaluated against the loaded Rego bundle. Destructive-verb deny-list, bulk-row ceiling, value threshold, approved-destination list, and any custom rules run here. Failure returns -32012; if a rule sets escalate=true, the call is parked in the approval queue and returns -32014.

4. Budget check. The call is debited against the agent's per-window and per-tenant ceilings (cost, row count, tool-call count). Exceeded ceilings return -32013. Budgets defeat slow-drip exfil and runaway-loop scenarios that individually pass policy.

5. Forward + audit. If all four checks pass, the gateway forwards the call to the upstream tool server, captures the response, and appends an audit event with the resolved call, the decision, the policy hash, the budget remaining, and a hash linking to the previous audit event in the chain.
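
The five phases can be sketched as a short-circuiting pipeline. This is a simplified illustration, not the gateway's implementation: signature verification is omitted, the policy rules are stand-ins, only the final decision is audited rather than each phase, and all class and field names are hypothetical. The error codes are the ones listed above.

```python
from dataclasses import dataclass, field

# JSON-RPC error codes named in the phases above.
ERR_CAPABILITY, ERR_INTENT, ERR_POLICY, ERR_BUDGET, ERR_ESCALATED = (
    -32010, -32011, -32012, -32013, -32014)


class GateError(Exception):
    def __init__(self, code: int, reason: str):
        super().__init__(reason)
        self.code = code


@dataclass
class Token:
    jti: str
    scope: set          # tool names this token may call
    intent_verb: str    # what the user authorized
    expires_at: float


@dataclass
class Call:
    tool: str
    verb: str
    rows: int = 0


@dataclass
class Gateway:
    revoked: set = field(default_factory=set)
    calls_left: int = 100                      # hypothetical per-window budget
    audit: list = field(default_factory=list)

    def handle(self, token: Token, call: Call, now: float, upstream):
        try:
            # 1. Capability: expiry, revocation, scope.
            if (now >= token.expires_at or token.jti in self.revoked
                    or call.tool not in token.scope):
                raise GateError(ERR_CAPABILITY, "capability check failed")
            # 2. Intent: the attempted verb must match the authorized one.
            if call.verb != token.intent_verb:
                raise GateError(ERR_INTENT, "call does not match intent")
            # 3. Policy: stand-in rules for deny and escalate.
            if call.verb == "delete":
                raise GateError(ERR_POLICY, "destructive verb")
            if call.rows > 1000:
                raise GateError(ERR_ESCALATED, "parked for approval")
            # 4. Budget: debit one tool call from the window.
            if self.calls_left <= 0:
                raise GateError(ERR_BUDGET, "tool-call budget exhausted")
            self.calls_left -= 1
        except GateError as err:
            decision = "escalate" if err.code == ERR_ESCALATED else "deny"
            self.audit.append((call.tool, decision, err.code))
            raise
        # 5. Forward + audit: upstream is reached only if every check passed.
        result = upstream(call)
        self.audit.append((call.tool, "allow", None))
        return result
```

Because the checks raise before the upstream callable is invoked, a denied or escalated call never reaches the tool server, which is the property the lifecycle depends on.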

The audit chain

Every decision (allow, deny, escalate, elevation grant, policy promotion, approval review) is written to the audit_events table with a SHA-256 hash that incorporates the previous event's hash. Per tenant, the events form a tamper-evident chain: if an attacker with database write access modifies or removes a past event, the chain head no longer reconciles when the gateway recomputes it.
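
A minimal sketch of hash chaining and verification follows. The event layout, the canonical JSON serialization, and the all-zero genesis value are assumptions chosen for the example; the actual audit_events schema may differ.

```python
import hashlib
import json
from typing import Optional

# Assumed placeholder "previous hash" for the first event of a tenant's chain.
GENESIS = "0" * 64


def event_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical form of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_event(chain: list, payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev_hash": prev,
        "hash": event_hash(prev, payload),
    })


def verify_chain(chain: list) -> Optional[int]:
    """Return the index of the first event that fails to reconcile, else None."""
    prev = GENESIS
    for i, event in enumerate(chain):
        recomputed = event_hash(prev, event["payload"])
        if event["prev_hash"] != prev or event["hash"] != recomputed:
            return i
        prev = event["hash"]
    return None
```

Editing or deleting an event in the middle breaks the link at that position. Truncating the tail is only detectable if the expected chain head is also recorded somewhere the attacker cannot rewrite, which is why verification reports the current head.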

Operators verify the chain by calling GET /v1/admin/audit/verify, which walks every event for the tenant and reports the current chain head, the chain-head timestamp, and any reconciliation failures. The verify endpoint runs against a read replica without impacting hot-path latency, and is the basis of the daily compliance attestation that BIO, ISO 27001, and EU AI Act auditors ask for.

Export uses GET /v1/admin/audit/export and streams CSV or NDJSON for auditor handoff. Argument-value redaction (gateway v1.3+) is opt-in per policy and preserves the structural shape of the event so dry-run replay can re-evaluate redacted history against new policy without leaking the raw values.
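
One way to picture shape-preserving redaction is to replace leaf values with type placeholders while keeping keys and nesting intact. The placeholder strings and the event layout below are assumptions for illustration; the gateway's actual redaction format is not specified here.

```python
import json


def redact(value):
    """Replace leaf values with type placeholders, keeping keys and nesting."""
    if isinstance(value, dict):
        return {k: redact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, bool):      # check bool before int: bool is an int subtype
        return "<bool>"
    if isinstance(value, (int, float)):
        return "<number>"
    if value is None:
        return None
    return "<string>"


def to_ndjson(events: list) -> str:
    """Serialize events one JSON object per line, with arguments redacted."""
    lines = []
    for event in events:
        safe = dict(event)
        safe["arguments"] = redact(event.get("arguments", {}))
        lines.append(json.dumps(safe, sort_keys=True))
    return "\n".join(lines)
```

With this scheme a rule that inspects which arguments are present can still be replayed, whereas a rule that compares raw values would need whatever additional metadata the real redaction format retains.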

Multi-tenant model

From gateway v1.0 every /v1/admin/* endpoint is tenant-scoped. An admin token carries a tenant claim, and the gateway refuses any cross-tenant operation. Policies, approvals, elevations, revocations, audit events, and budgets all partition by tenant. A single gateway deployment serves any number of tenants without code or configuration changes per tenant.
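
The tenant-scoping rule amounts to comparing the tenant in the verified admin token against the tenant that owns the resource, and filtering every listing by the same claim. A hedged sketch, with a hypothetical claim name ("tenant") and record layout:

```python
class CrossTenantError(Exception):
    """Raised when an admin token is used against another tenant's data."""


def authorize_admin(claims: dict, resource_tenant: str) -> None:
    # The tenant comes from the verified token payload, never from a
    # caller-supplied header or query parameter.
    tenant = claims.get("tenant")
    if not tenant or tenant != resource_tenant:
        raise CrossTenantError(
            f"token tenant {tenant!r} cannot act on {resource_tenant!r}")


def list_approvals(claims: dict, approvals: list) -> list:
    """Return only the approvals that belong to the caller's tenant."""
    return [a for a in approvals if a["tenant"] == claims.get("tenant")]
```

A token without a tenant claim is rejected outright rather than treated as a wildcard, so a malformed token cannot widen access.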

Per-tenant active policies (gateway v1.5+) mean each tenant independently promotes their own baseline. Tenant A can run a strict baseline while Tenant B is in dry-run mode evaluating a new rule. Policy promotion on one replica fans out to every replica via Postgres LISTEN/NOTIFY, so a deploy from the console takes effect everywhere within milliseconds.

Deployment topology

Single-tenant pilot. One gateway replica, one Postgres, the console on a sibling pod. Sufficient for the first 30 days of evaluation. The full install fits in a single Kubernetes namespace and is the topology the Helm chart's defaults produce.

Production single-region. Three gateway replicas behind a Layer-4 load balancer, one Postgres primary with one read replica, the console on its own deployment. Capability validation, intent matching, and policy evaluation are local to each replica; audit writes go to the primary; chain verification reads from the replica. Sustains thousands of tool calls per second per gateway pod on standard cloud instances.

Multi-region. Each region runs its own gateway and Postgres. Capability tokens are signed by a regional key referenced in the token header; the gateway accepts tokens signed by any configured region's key so an agent can be served by the closest replica without a token re-issue. Audit is regional; export consolidates across regions at query time.

Trust boundaries

Three boundaries matter to operators.

Agent runtime → gateway. The capability token is the only credential the agent presents. The token is signed by the gateway's signing key; the agent cannot forge or escalate. The gateway never trusts a header set by the agent claiming a role, tenant, or scope — those values are read from the verified token payload only.

Gateway → upstream tool server. Service-to-service credentials live in the gateway's secret store (Kubernetes secrets, AWS Secrets Manager, HashiCorp Vault). The agent never sees an upstream credential and cannot ask for one. Credential rotation is a gateway operation, transparent to the agent.

Gateway → Postgres. Standard Postgres TLS plus row-level isolation by tenant ID. The gateway pod's database role has CRUD on its own tables and no access to anything else. Audit hash chaining means even a database compromise that bypasses application controls is detectable on the next chain verification.

What the gateway does not do

The gateway is not an identity provider. Users authenticate to whatever IdP the organization already uses (Okta, Entra, Keycloak, Auth0); the gateway accepts a capability token issued downstream of that authentication.

The gateway does not run inference. It never sees model weights, never makes an LLM call, and never charges for tokens. It runs on commodity CPU instances and scales horizontally without a GPU footprint.

The gateway does not store agent conversations or model context. Audit events record what tool calls happened and what the gateway decided, not the prompts that produced the calls. Conversational data stays in the agent runtime and the LLM provider.

The gateway does not replace network egress controls, secrets management, IGA, PAM, or perimeter authentication. It sits beside them and supplies the layer none of them cover: per-tool-call authorization for AI agents.

See the Agent Runtime Authorization category page for the conceptual model, the six deep-dive control pages under Resources for each check, the deployment runbook for the day-1 install, the API reference for the HTTP surface, and the AWS + Microsoft Sentinel integration recipe for landing audit in a common enterprise SIEM topology.