Bidirectional PII Filtering

Bidirectional PII filtering is a content-inspection control that detects and strips personally identifiable information (PII) from prompts sent outbound to LLM providers and from responses returned inbound. The IntentGate filter detects eighteen built-in PII classes (email, phone, IBAN, BSN, named-entity person names, payment card numbers, and others) in both directions of traffic. It writes a counts-only audit log, so the gateway itself never persists matched values. It is the sixth and final control in the IntentGate authorization pipeline.

Why bidirectional filtering matters

When an agent processes user-supplied content and forwards it to an LLM provider, any PII in that content travels with it. The provider's logs retain it and training pipelines may sample it. If the content includes regulated personal data, a later exposure on the provider's side can trigger breach-notification obligations. The OWASP LLM02 (Sensitive Information Disclosure) risk class covers this attack surface, and the EU's GDPR and NIS2 frameworks impose specific obligations on data that leaves the controller's perimeter for a processor.

Outbound filtering is the obvious half of the defence: detect and strip PII before the prompt leaves the gateway. But inbound filtering is equally important and frequently overlooked. A compromised LLM provider, a cross-tenant cache leak, or simply a context-mixing bug can cause a response to contain PII from another tenant's context. Without inbound filtering, that PII reaches your agent, gets logged by your application, and ends up in your downstream systems. You're now responsible for someone else's data. Inbound filtering catches and strips it at the gateway before any of that happens.

How IntentGate implements PII filtering

The filter runs as a streaming content inspector on the gateway's request and response paths. For outbound traffic (agent → LLM), the prompt is scanned and each match is handled according to its class-specific policy. Under strip-and-replace, the matched value is replaced with a class tag. Under refuse, which is reserved for the most sensitive classes, the entire call is rejected. If the call is not refused, the modified prompt is forwarded. For inbound traffic (LLM → agent), the response is scanned the same way before it is returned to the agent.
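The per-class policy step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not IntentGate's implementation: the two-class catalogue, its regex patterns, and the names `Policy`, `CallRefused`, and `filter_content` are all hypothetical. Refuse-policy classes are checked before any stripping so that a refused call never gets partially rewritten, and the same function serves both directions.

```python
import re
from enum import Enum


class Policy(Enum):
    STRIP_AND_REPLACE = "strip"  # replace each match with a class tag
    REFUSE = "refuse"            # reject the whole call


# Hypothetical two-class catalogue; real detectors are far more thorough.
CATALOGUE = {
    "EMAIL": (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), Policy.STRIP_AND_REPLACE),
    "US_SSN": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), Policy.REFUSE),
}


class CallRefused(Exception):
    """Raised when a refuse-policy class matches (hypothetical name)."""

    def __init__(self, pii_class: str):
        super().__init__(f"call refused: {pii_class} detected")
        self.pii_class = pii_class


def filter_content(text: str) -> tuple[str, dict[str, int]]:
    """Apply each class policy to one prompt (outbound) or response (inbound).

    Returns the possibly modified text plus per-class match counts.
    Raises CallRefused if any refuse-policy class matches.
    """
    # Check refuse-policy classes first so nothing is partially rewritten.
    for name, (pattern, policy) in CATALOGUE.items():
        if policy is Policy.REFUSE and pattern.search(text):
            raise CallRefused(name)
    counts: dict[str, int] = {}
    for name, (pattern, policy) in CATALOGUE.items():
        if policy is Policy.STRIP_AND_REPLACE:
            text, n = pattern.subn(f"[{name}]", text)
            if n:
                counts[name] = n
    return text, counts


if __name__ == "__main__":
    prompt, counts = filter_content("Mail jan@example.nl or piet@example.com")
    print(prompt, counts)
```

Using one function for both directions keeps outbound and inbound behaviour symmetric; a real gateway would also need to handle matches that span streaming chunk boundaries, which this sketch ignores.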

Detection is multi-method per class. For structured patterns (IBAN, BSN, payment cards, dates), the filter combines regular expressions with validators (Luhn checksums, the mod-97 check, and format checks) to suppress false positives. For unstructured patterns (named-entity person names, street addresses), it uses a lightweight classifier model that runs inline. The class catalogue is configurable per deployment; the eighteen built-in classes cover the most common regulated PII in the Netherlands, the wider EU, the US, and the UK.
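The idea of pairing a loose regex with a validator can be illustrated with two standard checks: the Luhn algorithm for payment card numbers and the IBAN mod-97 check (remainder 1 after moving the first four characters to the end and mapping letters to 10 to 35). The candidate patterns and the `find_structured` helper are assumptions for this example, not IntentGate's actual detectors. The regexes deliberately over-match, and the validators discard digit strings that only look like cards or IBANs.

```python
import re

# Deliberately loose candidate patterns; validators below do the filtering.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
IBAN_CANDIDATE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Double every second digit, counting from the rightmost (check) digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """IBAN mod-97 check: the rearranged numeric form must leave remainder 1."""
    s = iban.replace(" ", "").upper()
    rearranged = s[4:] + s[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def find_structured(text: str) -> dict[str, list[str]]:
    """Return validated matches per class (hypothetical helper)."""
    found: dict[str, list[str]] = {"PAYMENT_CARD": [], "IBAN": []}
    for m in CARD_CANDIDATE.finditer(text):
        if luhn_valid(m.group()):
            found["PAYMENT_CARD"].append(m.group())
    for m in IBAN_CANDIDATE.finditer(text):
        if iban_valid(m.group()):
            found["IBAN"].append(m.group())
    return found


if __name__ == "__main__":
    print(find_structured("card 4111 1111 1111 1111, iban GB82 WEST 1234 5698 7654 32."))
```

In this example the digit run inside the IBAN also matches the card pattern, but it fails the Luhn check, which is exactly the kind of false positive the validators exist to suppress.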

Counts-only audit

Every detection is logged with the class detected, the number of matches in the call, the direction (outbound or inbound), and the tool that triggered the call. The matched values themselves are not logged. The IntentGate audit chain records "three email addresses were stripped from this tool call's outbound prompt", never the addresses.

This is the counts-only audit pattern, and it is structurally important. Without it, the audit log itself becomes a repository of PII — a high-value target for breach, a compliance reporting obligation, and a regulatory liability. With counts-only audit, the protection layer cannot itself become the failure mode. Operators get the visibility they need (which classes are appearing, at what volumes, on which tools) without inheriting custody of the values.
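One way to make the counts-only pattern structural rather than a matter of discipline is to give the audit record no field that could hold a value. The sketch below assumes a hypothetical `PiiAuditRecord` shape and `make_audit_record` helper; IntentGate's actual audit chain format is not specified here. The helper accepts only class labels, so matched values are discarded before the record is built.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PiiAuditRecord:
    """One entry per filtered call. There is deliberately no field for values."""
    timestamp: str
    tool: str
    direction: str          # "outbound" or "inbound"
    counts: dict[str, int]  # PII class -> number of matches


def make_audit_record(tool: str, direction: str, detected_classes: list[str]) -> PiiAuditRecord:
    """Build a counts-only record from the class label of each match.

    The caller passes class labels only; matched values never reach here.
    """
    if direction not in ("outbound", "inbound"):
        raise ValueError(f"unknown direction: {direction!r}")
    return PiiAuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        tool=tool,
        direction=direction,
        counts=dict(Counter(detected_classes)),
    )


if __name__ == "__main__":
    # "crm_lookup" is a hypothetical tool name.
    record = make_audit_record("crm_lookup", "outbound", ["EMAIL", "EMAIL", "EMAIL"])
    print(json.dumps(asdict(record)))
```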

What PII filtering does and does not do

The filter detects classes it has been configured to detect. Adversarial encodings (homoglyph substitution, base64 wrapping, language translation) defeat naive regex but are caught by the classifier-based detection for the most common unstructured classes. Truly novel encodings or PII classes outside the catalogue may pass through; for high-sensitivity deployments, operators should pair IntentGate's filter with specialized DLP at adjacent points in the architecture (Microsoft Purview DLP for files at rest, Zscaler DLP for web egress).

The filter is also not a content-safety filter. Prompt-injection payloads, jailbreak attempts, hate speech, and other harmful content are outside its scope. Those belong to the prompt firewall category (Lakera, Prompt Security, Robust Intelligence), which complements agent runtime authorization but does not replace it. IntentGate handles authorization of agent actions; prompt firewalls handle the content safety of LLM prompts. A complete deployment has a place for both.

Error code and observability

When a call is refused outright (the strict policy for the most sensitive classes), PII filtering returns JSON-RPC error code -32015. When the policy is strip-and-continue, the call proceeds with the modified content and no error is returned, although the audit log still records the detection. SIEM adapters route on the error code, and operators build dashboards of detection volume by class and by tool to spot patterns (for example, a sudden spike in IBAN detections on a tool that should never see them).
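The two behaviours, plus the dashboard idea, can be sketched as follows. Only the -32015 code is taken from this page; it falls in JSON-RPC 2.0's reserved implementation-defined server-error range (-32000 to -32099). The `message` text, the `data` payload, the helper names, and the allow-list approach to flagging unexpected classes are assumptions for illustration.

```python
from collections import Counter
from typing import Any, Optional

# Error code from the text; within JSON-RPC 2.0's server-error range.
PII_REFUSED = -32015


def pii_response(request_id: Any, refused_class: Optional[str], result: Any = None) -> dict:
    """Build a JSON-RPC 2.0 response for one PII filtering decision.

    refused_class is set only when a strict-policy class forced a refusal.
    Strip-and-continue calls get an ordinary result and no error.
    """
    if refused_class is not None:
        return {
            "jsonrpc": "2.0",
            "id": request_id,
            "error": {
                "code": PII_REFUSED,
                "message": "Call refused by PII filter",
                # Hypothetical payload: the class name only, never the value.
                "data": {"pii_class": refused_class},
            },
        }
    return {"jsonrpc": "2.0", "id": request_id, "result": result}


def volume_by_tool_and_class(records: list[dict]) -> Counter:
    """Sum counts-only audit records into (tool, class) -> total matches."""
    totals: Counter = Counter()
    for rec in records:
        for pii_class, n in rec["counts"].items():
            totals[(rec["tool"], pii_class)] += n
    return totals


def flag_unexpected(totals: Counter, expected: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (tool, class) pairs seen on a tool not expected to carry that class."""
    return sorted(key for key in totals if key[1] not in expected.get(key[0], set()))
```

With this shape, a SIEM rule can key on `error.code == -32015` for refusals, while strip-and-continue detections surface only through the aggregated audit counts.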

Related controls

Bidirectional PII filtering works alongside the other five controls. Rather than adding another authorization decision to the sequence, it inspects the content of calls that the other checks have already authorized. See the Agent Runtime Authorization category page for the full pipeline and the Standards Alignment page for how PII filtering maps to GDPR, EU AI Act, and ISO 42001 requirements.

Frequently asked questions

Why bidirectional and not just outbound?

Outbound filtering protects the LLM provider boundary — PII does not end up in the provider's logs or training data. Inbound filtering defeats a different attack: a poisoned upstream that returns PII from another tenant's context into the response stream. Without inbound filtering, a compromised LLM provider or a cross-tenant cache leak would expose other customers' PII to your agent. Both directions need filtering for the protection to be complete.

What does "counts-only audit" mean?

When the filter detects and strips PII, the audit log records that detection happened — the class of PII, the count of matches, the direction — but never the matched values themselves. Without counts-only audit, the audit log would become a secondary repository of PII, itself a breach and compliance target. Counts-only ensures the protection itself does not create a new risk.

What PII classes does IntentGate detect by default?

Eighteen built-in classes cover the most common regulated PII: email address, phone number (international and country-specific patterns), IBAN, SWIFT/BIC, Dutch BSN (formerly the sofinummer), German Steuer-ID, US SSN, payment card numbers (Luhn-validated), passport numbers, date of birth in multiple formats, IP addresses, named-entity person names (classifier-based), street addresses, and several others. Custom classes can be added per deployment for organization-specific regulated data.
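A custom class could plausibly be described by a name, a detection pattern, and an optional validator, registered alongside the built-in ones. The `PiiClass` and `Catalogue` types and the `EMP-` employee ID format below are hypothetical; IntentGate's actual configuration mechanism may look quite different.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PiiClass:
    name: str
    pattern: re.Pattern
    # Optional checksum or format validator to suppress false positives.
    validator: Optional[Callable[[str], bool]] = None


@dataclass
class Catalogue:
    classes: dict[str, PiiClass] = field(default_factory=dict)

    def register(self, pii_class: PiiClass) -> None:
        """Add a class; duplicate names are rejected."""
        if pii_class.name in self.classes:
            raise ValueError(f"class already registered: {pii_class.name}")
        self.classes[pii_class.name] = pii_class

    def count(self, text: str) -> dict[str, int]:
        """Count validated matches per class (counts only, no values)."""
        counts: dict[str, int] = {}
        for c in self.classes.values():
            n = sum(
                1
                for m in c.pattern.finditer(text)
                if c.validator is None or c.validator(m.group())
            )
            if n:
                counts[c.name] = n
        return counts


# Hypothetical organization-specific class: employee IDs like "EMP-004217".
catalogue = Catalogue()
catalogue.register(PiiClass("EMPLOYEE_ID", re.compile(r"\bEMP-\d{6}\b")))
```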