
TL;DR
OpenAI shipped an open-weight PII redactor. Here is how to wire it into a real ingestion pipeline locally, fast, with zero leaks, and how it benchmarks against Presidio and a regex baseline.
OpenAI dropped Privacy Filter as an open-weight PII redactor a few weeks back. I wired it into a real RAG ingestion pipeline the same evening and benchmarked it against Microsoft Presidio plus a regex baseline I have been running in production for two years. The short version is that Privacy Filter caught roughly 12 percent more PII than Presidio with comparable latency once I tuned the runtime, and it caught nearly 40 percent more than the regex baseline. The longer version, including where it failed, is below.
The privacy story for LLM pipelines has been broken for a long time. The two production options have been hosted PII APIs, which mean shipping your raw documents to a third party, or rules-based tools like Presidio, which work but miss anything contextual. The hosted APIs add egress and break the audit story. The rules-based tools miss entity types that humans recognize easily, like a street address split across three lines or a name embedded in a meeting transcript.
An open-weight model that runs locally splits the difference. You get model-class recall without the hosted-API exposure. You can run it in the same VPC as your vector store, log every redaction decision for audit, and deterministically version the model the same way you version your other dependencies. For regulated industries that means GDPR-compliant ingestion stops being a flag-waving exercise and becomes a tractable engineering problem.
The catch is throughput. A model that runs locally only matters if it runs fast enough to fit in your ingestion budget. That is what I set out to measure.
Privacy Filter ships on Hugging Face. The base build is small enough to run on a single consumer GPU, which is the relevant constraint for most teams. I ran it on an L40S in our staging environment for the benchmarks, then moved the production deployment to a CPU-only instance to test the worst case.
Loading the model is straightforward.
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("openai/privacy-filter")
model = AutoModelForTokenClassification.from_pretrained(
    "openai/privacy-filter",
    torch_dtype=torch.float16,
).to("cuda")
model.eval()
For production, do not call the model directly. Wrap it in a redactor class that batches inputs, applies a confidence threshold, and emits a structured redaction record for audit. Every redaction event needs to be logged with the original span, the predicted entity type, the confidence, and the replacement token. That log is the audit trail your compliance team will ask for the first time someone files a data-subject request.
from dataclasses import dataclass
from typing import List

@dataclass
class RedactionEvent:
    original: str
    entity_type: str
    confidence: float
    replacement: str
    offset: int

class PrivacyFilter:
    def __init__(self, model, tokenizer, threshold: float = 0.85):
        self.model = model
        self.tokenizer = tokenizer
        self.threshold = threshold

    def redact(self, text: str) -> tuple[str, List[RedactionEvent]]:
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True).to("cuda")
        with torch.no_grad():
            logits = self.model(**inputs).logits
        # Decode spans, apply the confidence threshold, build
        # RedactionEvents, and return the redacted text alongside them.
        return self._apply(text, logits, inputs)
The full implementation is a few hundred lines once you handle batching, sliding windows for long documents, and the entity-type taxonomy. I push the redaction events into DD Traces so we can see redaction stages alongside the rest of our agent telemetry.
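The sliding-window part is worth sketching, because it is where most naive implementations silently drop PII on window boundaries. This is a minimal sketch under my own assumptions: the window and stride sizes, the `split_windows` and `redact_long` helpers, and the de-duplication rule are all illustrative, not part of the Privacy Filter release.

```python
# Sketch: sliding-window redaction for documents longer than the model's
# context. Overlapping windows ensure a span that straddles one boundary
# is seen whole by at least one window.

def split_windows(text: str, window: int = 2000, stride: int = 1600):
    """Yield (offset, chunk) pairs with overlap between windows."""
    start = 0
    while start < len(text):
        yield start, text[start:start + window]
        if start + window >= len(text):
            break
        start += stride

def redact_long(redactor, text: str):
    """Run the redactor per window and rebase events to document offsets."""
    events = []
    for offset, chunk in split_windows(text):
        _, chunk_events = redactor.redact(chunk)
        for ev in chunk_events:
            ev.offset += offset  # rebase to document coordinates
            events.append(ev)
    # The overlap region produces duplicate events; keep one per span.
    unique = {(ev.offset, ev.original): ev for ev in events}
    return sorted(unique.values(), key=lambda ev: ev.offset)
```

With a 2000-character window and a 1600-character stride, any span shorter than 400 characters is guaranteed to land fully inside at least one window; tune the overlap to the longest entity you expect.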
The pattern that makes this work in a real pipeline is pre-embed redaction. Redact before chunking, before embedding, before anything that would fan the raw text out to other systems. If a piece of PII makes it into your vector store, you will spend the next month trying to delete it cleanly. If it never makes it past ingestion, you have one place to audit and one place to fix.
Here is the ingestion shape I use.
async def ingest_document(doc_id: str, raw_text: str) -> None:
    redacted, events = privacy_filter.redact(raw_text)
    await audit_log.write(doc_id=doc_id, events=events)

    chunks = chunker.split(redacted)
    embeddings = await embedder.embed_batch([c.text for c in chunks])
    await vector_store.upsert([
        {
            "id": f"{doc_id}::{i}",
            "vector": emb,
            "metadata": {"doc_id": doc_id, "redaction_count": len(events)},
            "text": chunk.text,
        }
        for i, (chunk, emb) in enumerate(zip(chunks, embeddings))
    ])
Two details matter here. First, the audit log writes before the embeddings, so if the embedding step fails you still have a record of what was redacted. Second, the redaction count rides on the chunk metadata, which makes downstream debugging dramatically easier. When a retrieval surfaces a chunk and a user complains it looks weird, you can tell at a glance whether the weirdness is from redaction or from something upstream.
For document storage, I keep the raw and redacted versions in agentfs with the audit-trailed access controls turned on. The raw version stays in a quarantine bucket that only the redactor can read. The redacted version is what flows into the rest of the pipeline. If a regulator asks what was deleted and when, the answer is in one place.
I ran all three on a 5,000-document synthetic corpus that I built from a mix of public datasets plus generated examples for the entity types I care about most. Names, addresses, phone numbers, emails, government IDs, financial accounts, and dates of birth.
Recall on names: regex 31 percent, Presidio 76 percent, Privacy Filter 88 percent. The Privacy Filter advantage concentrates on names that appear without title or honorific, which is the case where pattern-matching tools have to fall back to dictionaries. The model gets context.
Recall on addresses: regex 42 percent, Presidio 71 percent, Privacy Filter 84 percent. The biggest gap is on multi-line addresses where the line breaks confuse rules-based tools. The model handles those fine.
Recall on government IDs: regex 91 percent, Presidio 93 percent, Privacy Filter 89 percent. This is the one place the regex baseline still wins. Government IDs have well-defined formats, and pattern matching is just better at high-precision extraction of fixed formats. I now run the Privacy Filter and a regex pass in series and union the results for ID-type entities.
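The series-then-union pass can be sketched in a few lines. The ID patterns and span format below are illustrative assumptions, not the patterns I run in production; real deployments need the formats for the jurisdictions they actually handle.

```python
import re

# Illustrative ID patterns only; expand per jurisdiction.
ID_PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def regex_id_spans(text: str):
    """High-precision pass for fixed-format ID entities."""
    spans = []
    for entity_type, pattern in ID_PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), entity_type))
    return spans

def union_spans(model_spans, regex_spans):
    """Union the model and regex passes, dropping exact duplicates.

    Spans are (start, end, entity_type) tuples.
    """
    return sorted(set(model_spans) | set(regex_spans))
```

The union direction matters: for ID-type entities you want the redaction whenever either pass fires, because a missed ID is far more expensive than an over-redacted one.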
Latency on the L40S, batched at 32 documents: regex 8ms per doc, Presidio 22ms, Privacy Filter 41ms. On CPU only, batched at 8: regex 11ms, Presidio 38ms, Privacy Filter 280ms. CPU-only is workable for low-volume ingestion but not for anything real-time.
Precision is high across the board. False-positive redactions ran at roughly 2 percent for Privacy Filter, 4 percent for Presidio, and 0.5 percent for regex. The high false-positive rate on Presidio is mostly common nouns being flagged as proper names, which is the long-standing weakness of dictionary-driven systems.
Three failure modes worth flagging.
First, context-aware misses. Privacy Filter occasionally misses PII that is technically present but heavily abbreviated or obfuscated. A name like "J. M." with no surrounding context gets through about 30 percent of the time. The fix is a cheap regex pass for initials patterns layered on top of the model output.
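The initials pass is a one-liner plus a stoplist. This pattern and stoplist are a starting point I am assuming, not a complete rule; expect false positives on abbreviations you have not stoplisted yet.

```python
import re

# Catch bare initials like "J. M." or "J.M." that the model misses.
INITIALS = re.compile(r"\b[A-Z]\.(?:\s?[A-Z]\.)+")

# The pattern also matches common abbreviations; filter the known ones.
STOPLIST = {"U.S.", "U.K.", "E.U."}

def initials_spans(text: str):
    """Return (start, end) spans for likely-initials matches."""
    return [
        (m.start(), m.end())
        for m in INITIALS.finditer(text)
        if m.group() not in STOPLIST
    ]
```

In practice I only apply this pass near spans where the model already found a person entity, which keeps the abbreviation false positives from dominating.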
Second, multilingual edges. The model was trained primarily on English data and the recall drops noticeably on Spanish and Mandarin documents in my corpus. If you have multilingual content, run separate evals per language before relying on the redactor for compliance. I caught this only because we have a chunk of Spanish-language support tickets in our corpus, and an early version of the pipeline let several names through that human reviewers flagged.
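Until the per-language evals pass, one workable stopgap is to route by detected language and apply a stricter policy outside English. A minimal sketch, assuming a `detect_language` hook (langdetect, a fastText language ID model, or similar) and a policy table that is entirely my own invention:

```python
# Sketch: per-language redaction policy. The threshold values and the
# human_review flag are assumptions to illustrate the routing shape.
REDACTION_POLICY = {
    "en": {"threshold": 0.85, "human_review": False},
    # Recall is weaker outside English, so redact more aggressively
    # and sample for human review until per-language evals pass.
    "default": {"threshold": 0.70, "human_review": True},
}

def policy_for(language: str) -> dict:
    """Look up the redaction policy for a detected language code."""
    return REDACTION_POLICY.get(language, REDACTION_POLICY["default"])
```

Lowering the threshold trades precision for recall, which is usually the right trade when the model's recall for that language is unverified.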
Third, structured PII. The model handles natural language well and structured data badly. CSV files, JSON dumps, log lines with semi-structured fields. For those, I parse the structure first, redact each field that looks free-form, and pass the structured fields through a regex layer. Treating a CSV row as a single string and shoving it through the model gives unreliable results.
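The parse-first approach for CSVs looks roughly like this. The column allowlist and the redactor interface are assumptions for the sketch; the point is that only free-form columns ever reach the model.

```python
import csv
import io

# Assumption: these columns hold free-form prose; everything else is
# structured and goes to the regex layer instead.
FREE_TEXT_COLUMNS = {"notes", "description", "comment"}

def redact_csv(redactor, raw: str) -> str:
    """Parse the CSV, run only free-form columns through the model,
    and write the redacted rows back out."""
    reader = csv.DictReader(io.StringIO(raw))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col in row:
            if col in FREE_TEXT_COLUMNS:
                row[col], _ = redactor.redact(row[col])
            # Structured columns pass through to the regex layer (not shown).
        writer.writerow(row)
    return out.getvalue()
```

The same shape applies to JSON: walk the tree, redact string leaves that look free-form, and regex the rest.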
Before you flip the switch, make sure you have all of these in place.
Logging. Every redaction event with original span, entity type, confidence, replacement, and document ID. This is non-negotiable for audit.
Versioning. The model checksum lives in your deploy artifact. When the model updates, the checksum changes, and your re-ingest pipeline knows to redo old documents.
Confidence threshold. Tunable per entity type, not global. Government IDs at 0.95, names at 0.80, addresses at 0.75 in my deployment. Tune against your own corpus.
Regression eval. A golden set of 200 real-or-realistic documents with hand-labeled redactions. CI runs the redactor against this set on every model bump and fails the build if recall drops more than 1 percent on any entity type.
Downstream verification. Periodically sample chunks out of the vector store and human-review them for missed PII. The model will miss things. The question is whether you find out from a human reviewer or from a regulator.
Quarantine. Raw documents go to a separate, access-restricted bucket. Only the redactor service has read access. The rest of the pipeline reads only redacted output.
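The regression-eval gate is small enough to sketch in full. This is a minimal version under my own assumptions: spans are (start, end, entity_type) tuples, and the baseline recalls come from the last accepted run.

```python
# Sketch of the CI recall gate: compute per-entity recall on the golden
# set and fail the build on a drop of more than one point.

def recall_by_type(golden: list, predicted: list) -> dict:
    """Recall per entity type; golden/predicted are
    (start, end, entity_type) tuples."""
    pred = set(predicted)
    out = {}
    for etype in {g[2] for g in golden}:
        relevant = [g for g in golden if g[2] == etype]
        hit = sum(1 for g in relevant if g in pred)
        out[etype] = hit / len(relevant)
    return out

def gate(current: dict, baseline: dict, max_drop: float = 0.01) -> list:
    """Return the entity types whose recall regressed past the allowed drop."""
    return [
        etype for etype, base in baseline.items()
        if current.get(etype, 0.0) < base - max_drop
    ]
```

CI fails if `gate` returns a non-empty list, and the list tells you exactly which entity type the new model checkpoint regressed on.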
I shipped the full pipeline walkthrough on the DevDigest YouTube channel the week after Privacy Filter dropped. The benchmark notebook is in the same repo as my eval harness. If you are running RAG against any document corpus that touches user data, this is the cheapest compliance upgrade I have shipped in the last year.