Security Agents Need Repro Harnesses, Not More Scan Prompts

Official Sources#

Source	Description
Anthropic - Using LLMs to secure source code	Anthropic's May 27, 2026 guide to threat modeling, sandboxing, discovery, verification, triage, and patching with Claude
Anthropic defending-code-reference-harness	Open-source reference implementation with Claude Code skills and an autonomous vulnerability-discovery pipeline
Claude vulnerability detection agent cookbook	Claude Agent SDK walkthrough for a lighter recon, scan, triage, report, and patch loop
Harness security notes	Project documentation for sandbox assumptions and safety boundaries
HN discussion	Hacker News discussion that pushed on false positives, reproducibility, and operational risk

Anthropic's defending-code-reference-harness hit the Hacker News front page today, and the interesting part is not that Claude can look for bugs. We already crossed that line.

The interesting part is the shape of the workflow.

The repo turns AI security work into a loop: build a threat model, run discovery agents, verify findings in a fresh environment, dedupe them, write exploitability reports, generate patches, and then test whether the original proof of concept still works. Anthropic's accompanying post says the bottleneck has moved: discovery is now straightforward to parallelize, while verification, triage, and patching are where teams get stuck.

That is the useful developer takeaway.

If your AI security process is still "ask a model to review the repo for vulnerabilities," you are building a better checklist. The next step is a reproducible harness.

The Scan Prompt Is the Wrong Unit#

Most AI security demos start with a prompt:

Review this codebase for security issues.

That can work for a first pass. It can also produce confident noise. The model lacks the system's threat model, deployment assumptions, dependency boundaries, reachable entry points, and historical bug shapes. It may flag a scary-looking path that is not attacker controlled. It may miss a boring path that is internet-facing in production.

Anthropic's guide makes a sharper point: the false positive is often not a model reasoning failure. It is a threat-model failure.

That matches what developers see in normal code review too. A reviewer who does not know which inputs are trusted, which services are internal, which queues are adversarial, and which legacy components are intentionally isolated will give shallow advice. A model does the same thing at higher speed.

The better unit is not a prompt. It is a repo-local security harness:

THREAT_MODEL.md that names assets, entry points, trust boundaries, and out-of-scope cases.
A repeatable build of the target that matches the code actually deployed.
A sandbox where the agent can run proof-of-concept inputs without touching real credentials or production systems.
A structured finding format that separates suspicion from proof.
A verification step that starts from a clean environment.
A patch receipt that proves the fix changes the exploit path without breaking the test suite.
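One way to make "separates suspicion from proof" concrete is to give every finding an explicit status. The sketch below is illustrative only: the `Finding` class, the `Status` values, and the field names are assumptions made for this example, not the format used by Anthropic's harness.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    SUSPECTED = "suspected"    # the agent has a hypothesis but no working input
    REPRODUCED = "reproduced"  # a proof of concept ran in the discovery sandbox
    VERIFIED = "verified"      # reproduced again from a clean environment


@dataclass
class Finding:
    title: str
    entry_point: str                     # threat-model entry point this reaches
    status: Status = Status.SUSPECTED
    poc_path: Optional[str] = None       # path to the proof-of-concept input
    missing_evidence: List[str] = field(default_factory=list)

    def can_escalate(self) -> bool:
        # Only findings whose proof survived a clean rebuild go to humans.
        return self.status is Status.VERIFIED and self.poc_path is not None
```

The point of the design is that a finding cannot reach a maintainer's queue on confidence alone; it needs both a verified status and an attached proof.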

That is why the harness belongs next to agent containment, AI security triage, and the permissions, logs, and rollback work for coding agents. The core question is not "can the model find something?" It is "can the system prove what happened?"

Discovery Should Be High Recall#

One subtle design choice in Anthropic's loop is that discovery and verification are separate jobs.

That matters.

If you ask one agent to both find and dismiss issues, it can self-censor. It may drop weird leads too early because they look unlikely. It may overfit to the obvious vulnerability classes in your prompt. It may spend too much of the context budget justifying why something is safe instead of exploring attack paths.

Discovery should optimize for recall. Let it fan out. Let it partition the codebase by attack surface. Let it produce candidate findings with proof attempts, confidence, and missing evidence. Let it be creative within a sandbox.

Verification should optimize for precision. It should take the candidate, rebuild a fresh environment, reproduce the proof, confirm reachability, check whether a compensating control exists, and label the finding accordingly.
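A minimal sketch of that precision-oriented step, assuming the harness supplies three hooks: `reproduce` (rebuilds a fresh environment and runs the proof), `is_reachable`, and `has_compensating_control`. The hook names and the label strings are hypothetical, chosen only to illustrate the order of checks.

```python
from typing import Callable


def verify(
    reproduce: Callable[[], bool],
    is_reachable: Callable[[], bool],
    has_compensating_control: Callable[[], bool],
    attempts: int = 3,
) -> str:
    """Label a candidate finding using the checks described above."""
    # Each call to `reproduce` is expected to start from a clean environment.
    successes = sum(1 for _ in range(attempts) if reproduce())
    if successes == 0:
        return "not_reproduced"
    if successes < attempts:
        return "flaky"
    if not is_reachable():
        return "reproduced_unreachable"
    if has_compensating_control():
        return "reproduced_mitigated"
    return "verified"
```

Returning a label rather than a boolean keeps the intermediate outcomes visible, so a flaky or mitigated finding is recorded instead of silently dropped.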

This is the same engineering pattern behind good agent swarms. The fastest agent is not always the one that merges code. The useful system has specialized roles:

scout agents that explore broad surface area;
verifier agents that reproduce results from scratch;
judge agents that dedupe and rank;
patch agents that make minimal changes;
regression agents that search for bypasses after the fix.

Security work makes that separation non-optional. A false positive can waste maintainer time. A false negative can leave a real bug alive. A sloppy patch can make the system worse.
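The dedupe-and-rank role is the easiest one to sketch in plain code. The example below assumes each candidate is a dictionary with `bug_class`, `file`, `function`, and `confidence` keys; that shape is an assumption for illustration, and a real judge step would likely need fuzzier matching than an exact key.

```python
def dedupe(findings: list[dict]) -> list[dict]:
    """Collapse candidates that point at the same bug, keeping the strongest."""
    best: dict[tuple, dict] = {}
    for f in findings:
        # Two scouts reporting the same bug class in the same function
        # are treated as one finding.
        key = (f["bug_class"].lower(), f["file"], f["function"])
        if key not in best or f["confidence"] > best[key]["confidence"]:
            best[key] = f
    # Rank the survivors so verification starts with the strongest leads.
    return sorted(best.values(), key=lambda f: f["confidence"], reverse=True)
```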

The Harness Is a Product Boundary#

The repo's autonomous pipeline runs target code inside gVisor-isolated containers and restricts egress to the model API. The README is explicit that the autonomous reference pipeline refuses to run outside that sandbox unless overridden.

That is not just a safety footnote. It is the product boundary.

A vulnerability-discovery agent is supposed to do adversarial work. It may craft malformed inputs, run binaries, trigger crashes, write exploit scripts, inspect logs, and generate patches. If you run that in the same shell that has your cloud credentials, SSH keys, package registry token, and browser session, you have built a security tool with an insecure runtime.

This is where the conversation connects to the lethal trifecta problem: private data, untrusted content, and external communication should not casually share the same agent session.

For security agents, the safer default is boring:

no production credentials mounted;
no broad network access during scans;
dependencies pinned to the deployed version;
target snapshots reset between runs;
model API access routed through a proxy;
artifacts reviewed before they leave the sandbox;
every patch tied to the proof it claims to fix.
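Part of that list can be checked mechanically before a scan starts. The sketch below covers only the credentials item and a sandbox marker. The variable names in `SENSITIVE_VARS` are common conventions picked for illustration, and `SCAN_SANDBOX` is a hypothetical flag that the sandbox image would set; neither comes from Anthropic's harness.

```python
from typing import Mapping

# Illustrative, not exhaustive: environment variables that commonly hold
# credentials or give access to them.
SENSITIVE_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "GITHUB_TOKEN",
    "NPM_TOKEN",
    "SSH_AUTH_SOCK",
)


def preflight(env: Mapping[str, str]) -> list[str]:
    """Return reasons the scan should refuse to start; empty means proceed."""
    problems = [
        f"credential variable present: {name}"
        for name in SENSITIVE_VARS
        if env.get(name)
    ]
    # SCAN_SANDBOX is a hypothetical marker set only inside the sandbox image.
    if env.get("SCAN_SANDBOX") != "1":
        problems.append("SCAN_SANDBOX is not set to 1; refusing to assume isolation")
    return problems
```

Calling `preflight(os.environ)` at the top of the scan entry point and exiting on any returned problem gives a fail-closed default. It does not replace real isolation; it only catches the most obvious mistake of running in a developer shell.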

The HN skepticism around the harness is healthy because this is exactly where tools tend to oversell. Sandboxes are not magic. Containers can be misconfigured. Build environments can drift from production. Agents can find bugs in the harness instead of the target. A proof of concept can be real and still low severity in the actual deployment.

That does not weaken the harness argument. It strengthens it. If the environment matters this much, then the environment has to be part of the security artifact.

Threat Models Become Agent Context#

The strongest part of Anthropic's writeup is the insistence on threat modeling before scanning.

Security teams already know this. AI tooling makes it easier to skip, because the model can produce a long list of plausible issues without asking enough domain questions. That feels productive until the triage meeting starts.

The better pattern is to treat the threat model as executable agent context.

Not executable as in "run this file." Executable as in: the harness actually consumes it. Discovery agents read it before they search. Triage agents use it to calibrate severity. Patch agents use it to fix the real trust boundary instead of patching non-issues.

A good agent-readable threat model should answer:

What are we building?
Which assets matter?
Which inputs are attacker controlled?
Which users are authenticated but untrusted?
Which internal services are trusted, and why?
Which historical bug classes have hurt us before?
Which findings are out of scope for this repo?
Which mitigations exist outside the codebase?

This is not bureaucracy. It is context engineering for security work.

Teams already write README.md, AGENTS.md, CLAUDE.md, design docs, architecture diagrams, runbooks, and test fixtures so coding agents can operate with less guessing. THREAT_MODEL.md belongs in that family.
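If the harness is going to consume the threat model, it helps to fail early when the file is incomplete. A minimal sketch, assuming the file is Markdown and that each question in this section maps to a heading; the heading names in `REQUIRED_SECTIONS` are hypothetical and should be adapted to whatever template a team uses.

```python
# Hypothetical section names, loosely following the questions in this section.
REQUIRED_SECTIONS = (
    "Assets",
    "Entry points",
    "Trust boundaries",
    "Attacker-controlled inputs",
    "Historical bug classes",
    "Out of scope",
    "External mitigations",
)


def missing_sections(markdown: str) -> list[str]:
    """List required sections with no matching heading in the document."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in markdown.splitlines()
        if line.startswith("#")
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]
```

A check like this only proves the sections exist, not that they are accurate, so it complements human review rather than replacing it.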

The Patch Receipt Matters More Than the Patch#

The patch step is where AI security demos often get too optimistic.

Generating a fix is not enough. A security patch has to prove four things:

The original proof of concept fails after the change.
The normal test suite still passes.
The patch does not widen another trust boundary.
A fresh search cannot find an easy variant.

That proof should travel with the pull request.

Call it a patch receipt:

Finding: heap overflow in parser X
Threat model path: untrusted file import
Proof: crash input repros 3/3 before patch
Verification: fresh container reproduced crash
Patch: bounds check before allocation
Regression: crash input no longer crashes
Variant search: fresh agent found no adjacent parser bypass in one run
Human review: owner approved severity and scope

The exact fields will vary. The habit should not.
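The four proof obligations can be encoded as a small gate on the pull request. The sketch below uses a hypothetical `PatchReceipt` class; its field names mirror the list above rather than any format from Anthropic's harness, and the trust-boundary field is a reviewer's judgment recorded as a flag, not something a script can decide.

```python
from dataclasses import dataclass


@dataclass
class PatchReceipt:
    finding: str
    poc_fails_after_patch: bool
    test_suite_passes: bool
    no_new_trust_boundary: bool   # recorded from human or agent review
    variant_search_clean: bool
    human_approved: bool = False

    def unmet(self) -> list[str]:
        """Names of the obligations that are still open."""
        checks = {
            "poc_fails_after_patch": self.poc_fails_after_patch,
            "test_suite_passes": self.test_suite_passes,
            "no_new_trust_boundary": self.no_new_trust_boundary,
            "variant_search_clean": self.variant_search_clean,
            "human_approved": self.human_approved,
        }
        return [name for name, ok in checks.items() if not ok]
```

A merge check could then block any security PR whose receipt returns a non-empty `unmet()` list, which makes the open questions visible instead of implied.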

This is the same receipt culture needed for parallel coding agents and long-running agent harnesses. When machines can generate more work than humans can inspect line by line, the review packet becomes part of the work product.

What Developers Should Do This Week#

You do not need Anthropic's full reference harness to improve your workflow.

Start smaller:

Add a THREAT_MODEL.md to one service.
Pick one vulnerability class that actually matters for that service.
Create a local repro target with pinned dependencies and no secrets.
Ask an agent to find candidate issues against that narrow target.
Require proof before escalation.
Track false positives, duplicate findings, and time to verified patch.
Save the finding receipt with the PR.
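The tracking step needs very little tooling. A minimal sketch, assuming each finding is logged as a dictionary with an `outcome` of `"false_positive"`, `"duplicate"`, or `"verified"`, plus `found_at` and an optional `patched_at` timestamp; the record shape is an assumption made for this example.

```python
from datetime import datetime
from statistics import median


def summarize(records: list[dict]) -> dict:
    """Compute the three numbers worth watching from a list of finding records."""
    total = len(records)
    false_pos = sum(r["outcome"] == "false_positive" for r in records)
    dupes = sum(r["outcome"] == "duplicate" for r in records)
    # Hours from discovery to a verified patch, for findings that got one.
    hours = [
        (r["patched_at"] - r["found_at"]).total_seconds() / 3600
        for r in records
        if r["outcome"] == "verified" and r.get("patched_at")
    ]
    return {
        "false_positive_rate": false_pos / total if total else 0.0,
        "duplicate_rate": dupes / total if total else 0.0,
        "median_hours_to_patch": median(hours) if hours else None,
    }
```

If the false-positive rate stays high after the threat model is in place, that is a signal to tighten the verification step before widening the scan.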

Then widen the loop.

Add more target components. Add a separate verification pass. Add a dedupe step. Add a regression search after patches. Add periodic scanning when high-risk code changes.

The mistake is trying to jump straight from manual security review to autonomous security operation. The useful path is boring and incremental: one harness, one bug class, one proof format, one owner loop.

The Take#

AI security is entering its CI era.

The winning teams will not be the ones with the longest scan prompt. They will be the ones with the best repro harness: clear threat models, faithful sandboxes, separated discovery and verification, patch receipts, and enough operational discipline to turn findings into shipped fixes.

The model finds candidates. The harness proves them. The team owns the patch.

That is the loop.

FAQ#

What is Anthropic's defending-code-reference-harness?#

It is an open-source reference implementation for AI-assisted vulnerability discovery and remediation with Claude. It includes Claude Code skills for threat modeling, scanning, triage, patching, and customization, plus an autonomous pipeline that runs recon, find, verify, report, and patch stages.

Why is a repro harness better than a security scan prompt?#

A prompt can produce useful leads, but a harness gives the agent a repeatable target, a threat model, a sandbox, a verification path, and a patch receipt. That makes findings easier to reproduce, dedupe, prioritize, and fix.

Should every team run autonomous security agents?#

No. Start with narrow, supervised workflows. Use one service, one vulnerability class, one sandbox, and one receipt format before scaling. Autonomous scanning without verification and ownership can create a larger triage queue instead of reducing risk.

What should be in a security agent patch receipt?#

Include the finding, threat-model path, reproduction steps, verification environment, patch summary, tests run, variant search, and human review point. The receipt should make it clear what was proven and what still depends on judgment.
