
TL;DR
Anthropic's Claude containment writeup points to the next security layer for coding agents: deterministic capability ledgers, not another approval prompt.
Anthropic published a useful engineering post today on how it contains Claude across products, and the Hacker News thread immediately turned into the right argument: sandboxing helps, but it does not magically solve prompt injection, egress, credential scope, or the weird trust boundary between "the agent saw a thing" and "the agent can now act on it."
That is the real story for developers building with Claude Code, Codex, MCP tools, background agents, and automated review loops.
The old security model was:
Ask the model to behave. Ask the user for approval. Log what happened.
The new model needs to be:
Give the agent a deterministic capability ledger. Every file, token, network path, tool, identity, and escalation has to be scoped, recorded, revocable, and reviewable.
The post is worth reading because it moves the conversation away from "Claude is safer because it says no" and toward something closer to operating-system design. A model instruction is a preference. A sandbox boundary is a fact. A scoped credential is a fact. A network egress rule is a fact. A short-lived per-session token is a fact.
AI agent security gets better when more of the safety story becomes factual.
Anthropic's framing is simple: contain the environment first, then steer the model. That sounds obvious until you look at how most developers actually run agents.
They install a terminal agent in a real repo. It runs as their user. It can read the files the user can read. It can often see local environment variables. It can run package installers. It can call GitHub, Slack, Linear, Gmail, or a database through an MCP server. It can ingest untrusted issue text, docs, webpages, test output, dependency readmes, and CI logs. Then the user approves commands one by one until approval fatigue kicks in.
That is not containment. That is vibes with a confirmation dialog.
This is the same operational theme behind prompt injection in open source, agent memory as a context ledger, and long-running agent harnesses. The agent is not dangerous because it can write code. It is dangerous because code execution, private context, and external communication can land in the same session without a durable policy object in the middle.
Simon Willison has called that combination the lethal trifecta for AI agents: private data, untrusted content, and external communication. Anthropic's post is basically a product-engineering answer to that trifecta.
The HN pushback sharpened the point. Commenters raised domain fronting, steganography in commits, timing side channels, malicious artifacts that cross from a low-privilege VM into a high-privilege local workflow, and the fact that Docker is not always the boundary people think it is. That does not make the containment work useless. It means "sandboxed" is not a binary label.
Containment has dimensions.
A capability ledger is the missing product primitive for agent runtimes.
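It is not just a permission screen. It is a structured record of what the agent is allowed to touch and why: filesystem access, network access, credentials, tool calls, identity, memory, and review boundaries.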
That ledger should live alongside the run, not buried in a settings UI. When an agent opens a PR, ships a migration, comments on an issue, or drafts a release, the review should include the ledger.
What did it read? What did it write? Which external systems did it touch? Which private values were present? Which untrusted sources were mixed into the same context? Which approvals were granted? Which approvals were denied? Which policy widened during the run?
This is why agent receipts matter. A diff tells you what changed. A capability receipt tells you what the agent could have done.
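A minimal sketch of what such a ledger could look like as a data structure. Every name here (`CapabilityGrant`, `CapabilityLedger`, `kind`, `scope`, `reason`) is hypothetical and not taken from Anthropic's post or any shipping agent runtime. The point is only that grants, denials, and revocations become records a reviewer can read.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CapabilityGrant:
    kind: str      # hypothetical categories: "file", "network", "tool", "credential"
    scope: str     # hypothetical scope strings, e.g. "repo:src" or "host:example.com"
    reason: str    # why the run asked for it
    granted: bool  # False records a denial or a revocation
    at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


@dataclass
class CapabilityLedger:
    run_id: str
    entries: list[CapabilityGrant] = field(default_factory=list)

    def record(self, kind, scope, reason, granted=True):
        entry = CapabilityGrant(kind, scope, reason, granted)
        self.entries.append(entry)
        return entry

    def allows(self, kind, scope):
        # The latest matching entry wins, so a later denial revokes an earlier grant.
        for entry in reversed(self.entries):
            if entry.kind == kind and entry.scope == scope:
                return entry.granted
        return False  # nothing recorded means default deny

    def denied(self):
        return [entry for entry in self.entries if not entry.granted]
```

This sketch matches scopes by exact string only. A real runtime would need path and host pattern matching, and it would need to enforce the ledger at the sandbox boundary rather than trust the agent to consult it.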
Most local agent tools still lean heavily on interactive approval prompts. That makes sense for early power users. It is also not a long-term security model.
Approval prompts fail in predictable ways.
If the agent asks to run npm install, what is the user actually approving? Package downloads? Lifecycle scripts? Network calls? Native compilation? Access to the current directory? Reading .npmrc? A future test command that imports the new package?
The right answer is not "never run package installs." The right answer is that package installation should be a named capability with a scoped environment, no unnecessary secrets, a dependency diff, and a clear path back to review. This is the same reason the OpenAI Codex cloud security playbook is more useful than a generic "be careful with agents" warning: the product boundary matters.
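One way to picture "no unnecessary secrets" is an environment allowlist applied before the install runs. This is a sketch under assumptions: the allowlist contents and the helper names `scoped_env` and `install_command` are made up for illustration. The `--ignore-scripts` flag is a real npm option that skips lifecycle scripts.

```python
# Hypothetical allowlist: variables a package install plausibly needs.
INSTALL_ENV_ALLOWLIST = ("PATH", "HOME", "LANG", "TMPDIR")


def scoped_env(source, allowlist=INSTALL_ENV_ALLOWLIST):
    """Return a copy of `source` containing only allowlisted variables."""
    return {name: source[name] for name in allowlist if name in source}


def install_command(env):
    """Describe the install as data so it can be logged before it is run."""
    return {"argv": ["npm", "install", "--ignore-scripts"], "env": env}
```

A caller would pass `scoped_env(os.environ)` and hand the result to `subprocess.run(cmd["argv"], env=cmd["env"], check=True)`. Tokens such as a registry credential never reach the child process unless someone adds them to the allowlist on purpose, and that addition is itself a reviewable change.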
The most interesting part of the HN discussion was not whether Anthropic's exact implementation is perfect. It was the repeated point that exfiltration is the hard part.
If an agent can see private data and can also communicate externally, then prompt injection becomes more than a content-quality problem. It becomes a data-flow problem.
You can block obvious bad domains. You can proxy network calls. You can strip secrets from logs. You can require approval before posting to Slack or opening a browser. Those controls help, but the counterarguments are real.
This is where "allowlist this domain" becomes too vague. A domain allowlist is not just a connectivity rule. It is an output capability. If the agent can shape a request to a domain, the agent has some ability to transmit information through that channel.
That does not mean agents are unusable. It means egress should be explicit and boring. A coding agent with repo write access does not automatically need access to your email. A research agent with browser access does not automatically need your filesystem. A local file analysis agent does not automatically need internet access. A deployment agent does not automatically need package-publishing credentials.
Separate the roles. Separate the identities. Separate the network.
The Model Context Protocol made agent tools easier to connect. That is good. It also made it easier to accidentally turn a chat session into a dense graph of real capabilities.
An MCP server can expose a database query, a CRM action, a GitHub mutation, a local filesystem tool, a Slack sender, a browser, or a custom internal workflow. Each one sounds small in isolation. Together, they become the agent's operating surface.
That surface needs a ledger.
For MCP, the ledger should answer the same questions for every connected server and tool.
This is especially important because tool descriptions are part of the model context. A compromised or sloppy tool can lie about what it does. A server can advertise a harmless description and still return content that changes the next step. That is why MCP debugging needs traces, as covered in MCP debugging with MCP Lens, but security needs a policy layer above traces.
Tracing shows what happened. A capability ledger shows what was possible.
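The gap between "happened" and "possible" can be computed once both sides are written down. A small sketch, assuming the granted tools are a list of names and the trace is a list of events with a `tool` key. Both shapes and the tool names in the usage note are hypothetical, not part of the MCP specification.

```python
def unused_capabilities(granted_tools, trace):
    """Tools the session could have called but never did."""
    called = {event["tool"] for event in trace}
    return sorted(set(granted_tools) - called)


def unledgered_calls(granted_tools, trace):
    """Tools that show up in the trace without a matching grant."""
    called = {event["tool"] for event in trace}
    return sorted(called - set(granted_tools))
```

With grants of `github.create_pr`, `slack.send`, and `db.query`, and a trace that contains `db.query` and `fs.read`, the first function returns the two tools that were available but idle, and the second flags `fs.read` as a call nobody granted. The first list is the blast radius a trace alone never shows.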
If you are adopting coding agents inside a real team, you do not need to wait for every vendor to standardize this. You can start with a practical containment baseline.
First, split agent profiles by job.
Create separate profiles for research, local code editing, dependency work, production debugging, and deployment. Give each profile the minimum useful capabilities. The research profile can browse and summarize but cannot see secrets. The local editing profile can read and write the repo but cannot push or access broad cloud credentials. The deployment profile can operate only from CI with protected environment rules.
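As a sketch, the profiles can start as plain data checked into the repo. The field names and the `may_run` helper are hypothetical. The deployment profile's access to secrets is an assumption about what "protected environment rules" would cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    name: str
    can_browse: bool = False
    can_write_repo: bool = False
    can_push: bool = False
    sees_secrets: bool = False
    ci_only: bool = False


PROFILES = {
    # Browses and summarizes, never sees secrets.
    "research": AgentProfile("research", can_browse=True),
    # Reads and writes the repo, cannot push.
    "local-edit": AgentProfile("local-edit", can_write_repo=True),
    # Assumption: deployment needs push and secrets, but only inside CI.
    "deploy": AgentProfile("deploy", can_push=True, sees_secrets=True, ci_only=True),
}


def may_run(profile, in_ci):
    """A CI-only profile refuses to start outside CI."""
    return in_ci or not profile.ci_only
```

Everything defaults to off, so a new profile starts with nothing and each capability has to be switched on by name.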
Second, move credentials out of the default environment.
Do not let every agent inherit the same shell session your human user has. Use short-lived tokens. Use repository-scoped tokens. Use service accounts. Use protected CI environments. Make the credential radius visible in review.
Third, treat network access as a write permission.
Outbound network is not just "internet." It is a channel. For some tasks, no network is the correct default. For other tasks, read-only docs access is enough. For still others, a small allowlist with request logging is the right compromise.
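A rough sketch of a small allowlist with request logging. The policy table, the hosts in it, and the `check_egress` helper are illustrative assumptions. A real deployment would enforce this in a proxy outside the agent's process, not in code the agent can edit.

```python
from urllib.parse import urlsplit

# Hypothetical policy: host -> set of allowed HTTP methods.
EGRESS_POLICY = {
    "docs.python.org": {"GET"},
    "api.github.com": {"GET", "POST"},
}

request_log = []


def check_egress(method, url, policy=EGRESS_POLICY):
    """Allow a request only if host and method are both listed; log every decision."""
    host = urlsplit(url).hostname or ""
    allowed = method.upper() in policy.get(host, set())
    request_log.append({"method": method.upper(), "host": host, "allowed": allowed})
    return allowed
```

Even a GET-only rule leaves the path and query string as an output channel, so the allowlist narrows the problem without closing it. That is why the decision log is part of the sketch and not an afterthought.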
Fourth, gate artifact movement.
The dangerous moment is often not inside the sandbox. It is when the artifact leaves it: a patch, a generated config, a dependency lockfile, a browser-exported file, a migration, a release note, or a copied prompt. Make that movement reviewable.
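One simple way to make that movement reviewable is to tie approval to the exact bytes that were reviewed. A sketch, assuming an in-memory set of approved digests and hypothetical helper names. A real gate would persist approvals and record who granted them.

```python
import hashlib

approved_digests = set()


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def approve(content: bytes) -> str:
    """Called after a reviewer has inspected the artifact."""
    value = digest(content)
    approved_digests.add(value)
    return value


def may_leave_sandbox(content: bytes) -> bool:
    # Any byte changed after review produces a new digest and fails the gate.
    return digest(content) in approved_digests
```

The design choice is that approval attaches to content, not to a filename or a step in the run. An artifact that is modified after review has to be reviewed again.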
Fifth, store receipts with the work.
For every agent run that touches production code, store a small receipt: policy profile, file scope, network scope, tools used, credentials available, tests run, and human review point. This can be a markdown artifact at first. It does not need to be fancy. It does need to survive the chat.
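Here is roughly how small that markdown artifact can be. The function name and the dictionary keys are hypothetical. The keys simply mirror the fields listed above.

```python
RECEIPT_FIELDS = (
    "policy_profile", "file_scope", "network_scope", "tools_used",
    "credentials_available", "tests_run", "human_review_point",
)


def render_receipt(receipt):
    """Render a run receipt as plain markdown, one bullet per field."""
    lines = [f"# Agent run receipt: {receipt['run_id']}", ""]
    for key in RECEIPT_FIELDS:
        value = receipt.get(key, "not recorded")
        if isinstance(value, (list, tuple)):
            value = ", ".join(value) or "none"
        lines.append(f"- **{key.replace('_', ' ')}**: {value}")
    return "\n".join(lines) + "\n"
```

Missing fields render as "not recorded" instead of being dropped, so a reviewer can tell the difference between "no network access" and "nobody wrote it down."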
The next competition between AI coding tools will not just be model quality. It will be runtime trust.
Claude Code, Codex, Cursor, Copilot, Devin-style cloud agents, MCP-heavy workflows, and internal agent platforms are all moving toward the same place: agents that can work for longer, touch more systems, and need less babysitting. That only works if the surrounding runtime gets more deterministic as the model gets more capable.
Prompting the model to be careful is table stakes. Asking the user to approve every shell command is a temporary bridge. The durable layer is a capability ledger that makes every run inspectable.
That is the post-Anthropic-containment lesson for developers: stop treating agent security as a personality trait. Treat it as runtime accounting.
AI agent containment is the practice of limiting what an agent can read, write, execute, and communicate with while it works. Strong containment uses environment boundaries, scoped credentials, network controls, and review gates instead of relying only on model instructions.
Approval prompts show individual actions, but they rarely show the full capability graph. A user may approve a command without seeing which secrets, files, tools, or outbound channels are also available in the same session.
A capability ledger is a durable record of an agent run's permissions: filesystem access, network access, credentials, tool calls, identity, memory, and review boundaries. It helps reviewers understand not just what changed, but what the agent was capable of doing.
Sandboxing alone does not solve prompt injection. It reduces blast radius, but prompt injection can still matter when untrusted content, private data, and external communication share a workflow. Sandboxes need egress controls, scoped credentials, artifact review, and clear identity boundaries.
Start by separating agent profiles by job, removing broad credentials from default shells, treating network access as a write permission, gating artifact movement out of sandboxes, and storing receipts for every meaningful agent run.