TL;DR
Anthropic added three new primitives to Claude Managed Agents in spring 2026 - dreaming, outcomes, and multi-agent orchestration. Here is how each one works and when to use them together.
Anthropic launched Claude Managed Agents as a hosted harness for long-running agent work. On April 8, 2026, the initial public beta arrived on the Claude Platform. Then on May 6, 2026, Anthropic shipped three additions that changed the scope of what the platform can do: dreaming (research preview), outcomes (public beta), and multi-agent orchestration (public beta).
This post covers all three together - what they are, how they connect, and when to reach for them over a local agent loop.
Last updated: June 10, 2026
The April 8 launch gave developers a managed harness with secure sandboxing, credential management, durable session state, and persistent event history - replacing the need to build your own agent loop from scratch.
The May 6 update added three distinct primitives on top of that foundation:
| Primitive | Status | What it solves |
|---|---|---|
| Dreaming | Research preview (request access) | Agents repeating mistakes across sessions |
| Outcomes | Public beta | Vague or implicit definitions of "done" |
| Multi-agent orchestration | Public beta | Tasks too large for one context window |
According to Anthropic's announcement, these features were designed to work as a loop, not independently. Dreaming captures lessons across sessions. Outcomes enforce quality within a session. Multi-agent orchestration distributes the work across parallel sessions. Together they address the three most common failure modes in production agent systems: memory drift, undefined success criteria, and context overload.
Dreaming is the most unusual of the three features. It gives agents a formal mechanism to improve between sessions without retraining the model.
The Dreams API reads an existing memory store and up to 100 prior sessions, then writes a new output memory store. That output store reorganizes memories, merges duplicates, replaces stale entries, and surfaces patterns that span multiple sessions. The input memory store is not modified - dreaming produces a separate store that engineers can inspect, discard, or promote.
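To make the non-destructive behavior concrete, here is a minimal local sketch of the idea, not the Dreams API itself. The `Memory` type, its fields, and the `consolidate` function are hypothetical names invented for illustration; the real service decides what to merge or replace with a model, whereas this toy version simply keeps the newest entry per key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    key: str         # topic the memory is about (hypothetical field)
    text: str        # the lesson itself
    updated_at: int  # logical timestamp; larger means newer


def consolidate(store: list[Memory], session_notes: list[Memory]) -> list[Memory]:
    """Return a NEW store; neither input list is mutated."""
    newest: dict[str, Memory] = {}
    for memory in [*store, *session_notes]:
        current = newest.get(memory.key)
        # Keep one entry per key (merging duplicates) and prefer the
        # most recent one (replacing stale entries).
        if current is None or memory.updated_at > current.updated_at:
            newest[memory.key] = memory
    return sorted(newest.values(), key=lambda m: m.key)


input_store = [
    Memory("docx-export", "Use the legacy converter", 1),
    Memory("retry-policy", "Retry tool calls twice", 2),
]
notes = [
    Memory("docx-export", "Legacy converter is deprecated; use v2", 5),
    Memory("retry-policy", "Retry tool calls twice", 2),
]
output_store = consolidate(input_store, notes)
```

Because `consolidate` builds a fresh list, `input_store` still holds the stale entry afterwards, which mirrors the inspect, discard, or promote workflow: nothing changes until someone chooses to adopt the output.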
Ken Huang's analysis describes this accurately: dreaming is closer to a postmortem and runbook-generation process than any kind of "AI sleep" mechanism. It is non-parametric memory consolidation - the model weights do not change. What changes is the structured context attached to future sessions.
The pattern Anthropic recommends is a three-store layout:
Run dreams over the working store and a curated set of recent verified sessions, then promote reviewed outputs into the project store. Never pipe unvetted session transcripts directly into production memory.
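The promotion step can be pictured as a simple review gate. This is a sketch under stated assumptions: the stores are modeled as plain dictionaries, and `promote` and the `approved` set are hypothetical names, not part of any Anthropic SDK.

```python
def promote(
    project_store: dict[str, str],
    dream_output: dict[str, str],
    approved: set[str],
) -> dict[str, str]:
    """Return a new project store containing only reviewed dream entries."""
    promoted = dict(project_store)  # copy, so the live store is untouched
    for key, text in dream_output.items():
        if key in approved:  # unreviewed entries never reach production
            promoted[key] = text
    return promoted


project = {"retry-policy": "Retry tool calls twice"}
dream = {
    "docx-export": "Use the v2 converter for docx files",
    "unverified-hunch": "Maybe skip validation on small files",
}
new_project = promote(project, dream, approved={"docx-export"})
```

Only the entry a reviewer signed off on lands in the new project store; the unvetted one stays behind in the dream output.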
Harvey, the legal AI company, reported roughly 6x improvement in completion rates after enabling dreaming - their agents stopped rediscovering filetype workarounds and tool-specific patterns every session.
Dreaming access currently requires a separate request form. It is not available to all API accounts by default.
Outcomes solve what Anthropic calls the "looks done" problem. Without them, an agent stops when its output appears plausible. With outcomes, you write an explicit rubric describing what success looks like. A separate grader agent evaluates the artifact against that rubric in its own context window - it never sees the working agent's reasoning, so it cannot be anchored to partial results.
When the grader finds gaps, it returns them to the working agent, which takes another pass. This continues until the criteria are satisfied, the iteration budget is exhausted, or the outcome is marked failed.
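The loop described above can be sketched locally as a worker and a grader that share nothing but the artifact. Everything here is illustrative: `run_with_outcome`, the status strings, and the toy rubric are assumptions made for the example, and the managed service runs this loop for you rather than exposing these functions.

```python
from typing import Callable

RUBRIC = ["summary", "risks", "next steps"]  # toy acceptance criteria


def grade(artifact: str) -> list[str]:
    """Grader: sees only the artifact and returns the unmet criteria."""
    return [item for item in RUBRIC if item not in artifact]


def make_worker() -> Callable[[list[str]], str]:
    sections = ["summary"]

    def work(feedback: list[str]) -> str:
        # Address one gap per pass so the iteration is visible.
        if feedback:
            sections.append(feedback[0])
        return "\n".join(sections)

    return work


def run_with_outcome(
    work: Callable[[list[str]], str],
    grade: Callable[[str], list[str]],
    max_iterations: int,
) -> tuple[str, str]:
    feedback: list[str] = []
    artifact = ""
    for _ in range(max_iterations):
        artifact = work(feedback)
        feedback = grade(artifact)
        if not feedback:
            return "satisfied", artifact
    return "budget_exhausted", artifact


status, artifact = run_with_outcome(make_worker(), grade, max_iterations=5)
short_status, _ = run_with_outcome(make_worker(), grade, max_iterations=2)
```

With a budget of five passes the toy worker closes every gap on the third pass; with a budget of two it runs out first, which is the truncated-run case an undersized iteration budget produces.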
According to Anthropic's internal benchmarks, outcomes improved task success by up to 10 points over a standard prompting loop, with the largest gains on the hardest problems. For structured file generation specifically: +8.4% task success on docx files and +10.1% on pptx files.
The rubric can handle both objective and subjective criteria. Wisedocs, a document verification company, used outcomes to enforce their internal review guidelines and reported reviews running 50% faster while staying aligned with their team's standards.
Outcomes integrate with the new webhooks support: define an outcome, start a session, and receive a webhook notification when the agent satisfies the rubric. This makes it practical to fire-and-forget agent tasks from an API call without polling.
For developers already familiar with evaluator-optimizer patterns from Anthropic's building effective agents guide, outcomes are the managed version of that same loop - but with observable lifecycle events and no custom eval infrastructure to maintain.
Multi-agent orchestration lets a lead agent delegate work to specialist agents, each with its own model, system prompt, tools, and context window. The lead agent does not share its context with subagents - they operate independently, which is the key architectural difference from a single large-context run.
From Anthropic's announcement:
> A lead agent can run an investigation while subagents fan out through deploy history, error logs, metrics, and support tickets.
Specialists work in parallel on a shared filesystem. Events are persistent across all sessions, so the lead agent can check in on subagent progress mid-workflow. Every step is traceable in the Claude Console: which agent did what, in what order, and why.
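As a rough local analogy for the fan-out, the sketch below uses a thread pool in place of managed sessions. The `investigate` function is a stand-in for a specialist subagent and `lead_agent` is a hypothetical name; in the managed platform each specialist would be its own session with its own model, prompt, and tools.

```python
from concurrent.futures import ThreadPoolExecutor

SOURCES = ["deploy history", "error logs", "metrics", "support tickets"]


def investigate(source: str) -> dict[str, str]:
    """Stand-in for a specialist: receives only its own slice of the task."""
    return {"source": source, "finding": f"summary of {source}"}


def lead_agent(sources: list[str]) -> list[dict[str, str]]:
    """Fan out one specialist per source, then gather results in order."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # Executor.map returns results in input order, not completion order.
        return list(pool.map(investigate, sources))


findings = lead_agent(SOURCES)
```

Each specialist sees only the argument it was handed, never the lead's full state, which is the same isolation the separate context windows provide.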
Netflix's platform team used this to analyze logs from hundreds of builds across different sources. With changes that affect thousands of applications, the multi-agent pattern allows batches to be analyzed in parallel and only the recurring patterns to be surfaced.
Spiral (by Every) combined multi-agent orchestration with outcomes: a lead agent on Claude Haiku handles routing and follow-up questions, then delegates drafting to subagents running on Claude Opus. Each draft is scored against a rubric before being returned to the user.
This is covered in more detail in building multi-agent workflows with Claude Code and the broader agent architecture guide.
The official docs frame the choice clearly: Messages API for custom loops and fine-grained control, Managed Agents for long-running tasks and asynchronous work. Here is a more detailed framework:
| Dimension | Local / Messages API | Claude Managed Agents |
|---|---|---|
| Session length | Short to medium | Long-running, hours+ |
| State persistence | You manage it | Built in, server-side |
| Multi-tenancy | Custom implementation | Handled by platform |
| Sandbox security | You provision | Anthropic-managed or self-hosted |
| Debugging | Full access to your logs | Claude Console traces |
| Dreaming / Outcomes | Not available | Available |
| Zero Data Retention | Available | Not currently eligible |
| HIPAA BAA | Available | Not currently eligible |
| Latency (TTFT) | Fast for simple tasks | Anthropic reports ~60% p50 improvement after decoupling brain from sandbox |
The HIPAA and Zero Data Retention gaps are worth noting for teams in regulated industries - Managed Agents is stateful by design, which currently makes it ineligible for both. You can delete sessions and uploaded files through the API, but the session data persists server-side while the session is live.
Anthropic published a reference architecture on GitHub with Cloudflare Workers. The pattern uses a Workers edge layer to handle webhook callbacks, fan out tasks, and integrate Managed Agent sessions into existing application flows without a dedicated backend service.
The core loop looks like this:
This pattern fits naturally with asynchronous background job queue patterns, where the session lifecycle maps directly to task management.
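As a language-neutral illustration of the webhook-to-queue handoff (not the reference architecture's actual code, which targets Cloudflare Workers), here is a minimal Python sketch. The event name `outcome.satisfied`, the `session_id` field, and the job shape are all assumptions; consult the webhook docs for the real payload, and add signature verification before trusting any request.

```python
import json

# Hypothetical event name; the real payload shape may differ.
OUTCOME_SATISFIED = "outcome.satisfied"


def handle_webhook(body: bytes, job_queue: list[dict]) -> int:
    """Parse a webhook body, enqueue follow-up work, return an HTTP status."""
    try:
        event = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400
    if not isinstance(event, dict) or "session_id" not in event:
        return 400  # not an event this handler understands
    if event.get("type") == OUTCOME_SATISFIED:
        # Hand off to a background worker instead of working in the handler.
        job_queue.append(
            {"session_id": event["session_id"], "action": "collect_artifact"}
        )
    return 200  # acknowledge other event types without acting on them


jobs: list[dict] = []
status = handle_webhook(
    b'{"type": "outcome.satisfied", "session_id": "sess_123"}', jobs
)
```

Keeping the handler this thin means the edge layer only acknowledges and enqueues, while the slower work of fetching artifacts happens in whatever job system the application already runs.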
The SDK handles the beta header automatically. All Managed Agents endpoints require the managed-agents-2026-04-01 beta header on raw API calls.
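For raw HTTP calls, the header set might be assembled as in the sketch below. The beta value comes from this post; `x-api-key` and `anthropic-version: 2023-06-01` are the headers the Messages API uses and are assumed to apply here as well. `build_headers` is a hypothetical helper, and no endpoint path is shown because the exact routes should be taken from the official docs.

```python
MANAGED_AGENTS_BETA = "managed-agents-2026-04-01"


def build_headers(api_key: str) -> dict[str, str]:
    """Headers for a raw Managed Agents request (SDKs add the beta one for you)."""
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",  # assumed to match the Messages API
        "anthropic-beta": MANAGED_AGENTS_BETA,
        "content-type": "application/json",
    }


headers = build_headers("sk-ant-placeholder")  # placeholder key, never sent
```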
Both Outcomes and Codex's /goal command define a target state and let an agent iterate toward it. They take different approaches.
| Dimension | Managed Outcomes | Codex /goal |
|---|---|---|
| Evaluation mechanism | Separate grader agent with its own context window | Target-based: did the repo reach the specified state? |
| Rubric format | Free-text acceptance criteria | Typically a shell command or test that passes |
| Iteration control | Iteration budget + lifecycle events | Codex manages its own retry loop |
| Observability | Claude Console traces per iteration | Codex diff and PR output |
| Best for | Quality criteria that require judgment | Deterministic acceptance tests |
For a deeper comparison, see Codex /goal vs Claude Managed Outcomes: Practical Differences.
Quickstart: Follow the official quickstart. Create an agent (define model, system prompt, tools), create an environment, start a session, send events, stream responses.
Beta header: Raw API calls need anthropic-beta: managed-agents-2026-04-01. The Python and TypeScript SDKs set this automatically.
Dreaming access: Not enabled by default. Request access separately.
Common gotchas:
max_iterations value or you will see truncated runs on hard tasks.
| Resource | Link |
|---|---|
| Claude Managed Agents overview | platform.claude.com/docs/en/managed-agents/overview |
| Dreaming docs | platform.claude.com/docs/en/managed-agents/dreams |
| Outcomes docs | platform.claude.com/docs/en/managed-agents/define-outcomes |
| Multi-agent orchestration docs | platform.claude.com/docs/en/managed-agents/multi-agent |
| May 6 announcement | claude.com/blog/new-in-claude-managed-agents |
| Engineering: brain-hands architecture | anthropic.com/engineering/managed-agents |
| Cloudflare reference architecture | github.com/cloudflare/claude-managed-agents |
| Access request form | claude.com/form/claude-managed-agents |