Claude Managed Agents Public Beta: What's Actually Available vs What's Gated

Developers Digest • June 10, 2026 • 8 min read

ai-agents anthropic developer-tools infrastructure claude


TL;DR

Claude Managed Agents is in public beta with solid sandboxing and session persistence - but the headline orchestration features are still locked behind a research preview waitlist. Here's what teams can actually ship today, what it costs, and when DIY alternatives make more sense.


Anthropic launched Claude Managed Agents into public beta earlier this year and the coverage has been enthusiastic - possibly too enthusiastic. Read past the launch blog posts and you'll find a consistent pattern: the features that make the product genuinely compelling are still gated behind a separate research preview application. What's actually available today is narrower and more infrastructure-focused than most headlines suggest.

This is not a takedown. The available feature set is genuinely useful for the right workloads. But the gap between what's marketed and what you can actually use is wide enough that it's worth mapping carefully before you build anything on top of it.

Last updated: June 10, 2026

What Managed Agents Delivers Today

The public beta ships four capabilities that work right now, without any waitlist.

Sandboxed execution environments. Every agent session runs inside an isolated container. Anthropic manages the default cloud sandbox, but teams can also point the environment config at self-hosted infrastructure. The ant CLI and REST API both support creating environments with configurable networking - unrestricted outbound by default, or locked down per your requirements. This is the genuinely useful core of the product: your agent can run bash commands, write files, and hit external APIs without those operations touching your application server.
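As a rough sketch of what such an environment config might look like, the snippet below builds a request body for either the Anthropic-managed sandbox or self-hosted infrastructure, with unrestricted or locked-down networking. The field names (`config`, `networking`, `allowed_hosts`) and the `cloud` and `limited` values are assumptions for illustration, not the documented schema; check the official quickstart before relying on any of them.

```python
import json


def build_environment_config(name, self_hosted=False, allowed_hosts=None):
    """Build a HYPOTHETICAL environment-creation payload.

    Field names and values are illustrative assumptions, not the real schema.
    """
    if allowed_hosts is None:
        # The article describes unrestricted outbound networking as the default.
        networking = {"type": "unrestricted"}
    else:
        # Locked-down variant: only the listed hosts are reachable (assumed shape).
        networking = {"type": "limited", "allowed_hosts": sorted(allowed_hosts)}
    return {
        "name": name,
        "config": {
            "type": "self-hosted" if self_hosted else "cloud",
            "networking": networking,
        },
    }


if __name__ == "__main__":
    payload = build_environment_config("etl-sandbox", allowed_hosts={"api.example.com"})
    print(json.dumps(payload, indent=2))
```

Keeping the payload construction in one small function makes it easy to diff the staging and production variants of an environment before anything is sent over the wire.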

Long-running sessions with persistence. Sessions stay alive across network disconnections and can be resumed. The event streaming model buffers events server-side so a dropped connection doesn't kill in-flight work. For tasks that take minutes to hours - code generation that iterates, data transformation pipelines, multi-step research - this is a meaningful improvement over stateless API calls wrapped in retry logic.
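To make the resume behavior concrete, here is a minimal client-side sketch of the general pattern: track the index of the last event you processed, and on reconnect ask only for what came after it. `FakeSessionServer` is a hypothetical in-memory stand-in, not the Managed Agents API; the real event format and resume mechanism may differ.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSessionServer:
    """Hypothetical stand-in for a server that buffers session events."""
    events: list = field(default_factory=list)

    def read_after(self, cursor, drop_at=None):
        """Yield (index, event) pairs after `cursor`.

        Raises ConnectionError when index `drop_at` is reached, to simulate
        a dropped connection mid-stream.
        """
        for index in range(cursor + 1, len(self.events)):
            if drop_at is not None and index == drop_at:
                raise ConnectionError("simulated network drop")
            yield index, self.events[index]


def consume_with_resume(server, drop_points=()):
    """Read every buffered event, reconnecting from the last seen index."""
    cursor = -1  # index of the last event processed; -1 means none yet
    received = []
    pending_drops = list(drop_points)
    while True:
        drop_at = pending_drops.pop(0) if pending_drops else None
        try:
            for index, event in server.read_after(cursor, drop_at):
                received.append(event)
                cursor = index
            return received  # stream drained without a drop
        except ConnectionError:
            continue  # reconnect; the cursor keeps our place
```

Because the cursor only advances after an event is processed, a drop never causes a skipped or duplicated event in this sketch.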

Tool execution via agent toolsets. The agent_toolset_20260401 tool type unlocks a pre-built set of tools: bash, file read/write, web search, and more. You declare them once when creating the agent definition and they're available to every session. The docs show the complete toolset, and individual tools can be scoped if you want to limit what a particular agent can do.

MCP server support. Agents can connect to MCP (Model Context Protocol) servers, which lets you wire in custom tools and data sources using the same protocol that Claude Code uses internally. If you've already built MCP integrations for your own Claude Code setup, those can be reused here.

The official quickstart on platform.claude.com walks through the full create-agent/create-environment/start-session flow in seven SDKs plus raw curl. All Managed Agents API requests require the managed-agents-2026-04-01 beta header - the SDK sets this automatically.
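For teams calling the API without an SDK, a hedged sketch of the request pieces follows. The `agent_toolset_20260401` tool type and the `managed-agents-2026-04-01` beta value come from this article; `x-api-key`, `anthropic-version`, and `anthropic-beta` are the standard Anthropic API header names. The shape of the agent-definition body (`name`, `model`, `system`, `tools`) is an assumption, and the model ID is a placeholder.

```python
BETA_HEADER_VALUE = "managed-agents-2026-04-01"  # value quoted in the article
TOOLSET_TYPE = "agent_toolset_20260401"          # tool type quoted in the article


def build_headers(api_key):
    """Headers for a raw HTTP call; the SDKs reportedly set the beta header for you."""
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": BETA_HEADER_VALUE,
        "content-type": "application/json",
    }


def build_agent_definition(name, model, system_prompt):
    """HYPOTHETICAL agent-definition body that declares the pre-built toolset once."""
    return {
        "name": name,
        "model": model,  # must be a Claude model ID per the article
        "system": system_prompt,
        "tools": [{"type": TOOLSET_TYPE}],
    }


if __name__ == "__main__":
    body = build_agent_definition(
        name="report-builder",
        model="<claude-model-id>",  # placeholder, not a real model ID
        system_prompt="You turn CSV files into summary reports.",
    )
    print(build_headers("sk-placeholder")["anthropic-beta"], body["tools"])
```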

What's Still Locked

Two of the most-cited capabilities are not available in the public beta. They're in a separate "research preview" with gated access, meaning you have to apply and wait for approval.

Multi-agent coordination. Parallel task execution across multiple agents - the kind of fan-out that lets you decompose a big task into concurrent subtasks with results aggregated back - is research preview only. The public beta is strictly single-agent per session.
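For readers unfamiliar with the term, fan-out looks roughly like the generic sketch below: launch subtasks concurrently, then collect the results in one place. This is plain `asyncio`, not the gated Managed Agents feature; `run_subtask` is a hypothetical placeholder for whatever would drive a single agent.

```python
import asyncio


async def run_subtask(subtask):
    """Placeholder worker; a real one would drive one agent on one subtask."""
    await asyncio.sleep(0)  # yield control, standing in for real I/O
    return f"done: {subtask}"


async def fan_out(subtasks):
    """Run all subtasks concurrently and aggregate results in input order."""
    return await asyncio.gather(*(run_subtask(s) for s in subtasks))


def run_fan_out(subtasks):
    """Synchronous entry point for scripts that are not already async."""
    return asyncio.run(fan_out(subtasks))
```

`asyncio.gather` returns results in the order the subtasks were passed in, which keeps the aggregation step simple.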

Self-evaluation loops. The ability for an agent to assess its own output, decide it's insufficient, and iterate without a human in the loop is also gated. What's available today is a single agent loop that goes idle when it decides the task is done. Self-evaluation and retry against quality criteria require the research preview tier.
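The control flow being described is a generate-evaluate-retry loop. A minimal generic sketch, with hypothetical `generate` and `evaluate` callables standing in for model calls, might look like this; it illustrates the pattern, not how the research preview implements it.

```python
def iterate_until_acceptable(generate, evaluate, max_attempts=3):
    """Call `generate` until `evaluate` accepts the output or attempts run out.

    `generate(feedback)` returns a candidate output; `feedback` is None on the
    first attempt. `evaluate(output)` returns (accepted, feedback).
    Returns (output, accepted, attempts_used).
    """
    feedback = None
    output = None
    for attempt in range(1, max_attempts + 1):
        output = generate(feedback)
        accepted, feedback = evaluate(output)
        if accepted:
            return output, True, attempt
    # Attempts exhausted: hand back the last candidate, flagged as not accepted.
    return output, False, max_attempts
```

The hard cap on attempts matters in practice: without it, a strict evaluator can keep a billed session running indefinitely.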

There's no published timeline for when these move to general availability. Anthropic's communications on this have been vague - "coming to more customers over time" is the level of specificity you're getting right now.

This matters because most of the compelling use cases in the launch coverage - autonomous research pipelines, self-correcting code generation, distributed agent teams - depend on one or both of these gated features.


Pricing Reality

The cost model has two components that stack on top of each other.

Cost Component	Rate	What It Covers
Model inference	Standard Claude token rates	Input/output tokens per session
Session infrastructure	$0.08 per session-hour	Sandbox runtime, regardless of token activity
Free tier	None	No free quota for Managed Agents

The $0.08/session-hour infrastructure cost (source) is billed for active agent runtime. If your session runs for 30 minutes, that's $0.04 in infrastructure on top of whatever you spent on tokens. For a high-volume pipeline running hundreds of sessions daily, this adds up. A simple calculation: 500 sessions/day averaging 20 minutes each ≈ 166.7 session-hours/day ≈ $13.33/day (roughly $400 over 30 days) in infrastructure costs alone, before tokens.
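The same arithmetic as a small calculator, so you can plug in your own volumes. Computed without rounding the session-hours first, the example workload comes to about $13.33/day; rounding up to 167 session-hours before multiplying gives $13.36. The only input taken from the article is the $0.08 rate; token costs are deliberately excluded.

```python
SESSION_HOUR_RATE_USD = 0.08  # infrastructure rate quoted in the article


def infrastructure_cost(sessions_per_day, avg_minutes_per_session,
                        rate=SESSION_HOUR_RATE_USD):
    """Return (session_hours_per_day, usd_per_day), excluding token costs."""
    session_hours = sessions_per_day * avg_minutes_per_session / 60
    return session_hours, session_hours * rate


if __name__ == "__main__":
    hours, daily = infrastructure_cost(500, 20)
    # prints: 166.7 session-hours/day, $13.33/day, $400.00 per 30 days
    print(f"{hours:.1f} session-hours/day, ${daily:.2f}/day, "
          f"${daily * 30:.2f} per 30 days")
```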

For bursty or low-volume workloads, the overhead is manageable. For continuous high-throughput pipelines, the cost math favors DIY infrastructure.

Lock-In Risk: Claude Only

This is the constraint that matters most for teams thinking long-term. Managed Agents supports Claude models only. There's no provider abstraction, no ability to route sessions to GPT, Gemini, or a local model. The agent definition model field takes a Claude model ID and only a Claude model ID.

If your evaluation determines that a different model performs better for a specific task six months from now, you're not swapping it in. You're rebuilding your agent infrastructure on a different platform. For teams that want to stay model-agnostic or hedge against provider pricing changes, this is a material concern.

What Teams Are Actually Shipping

Given the constraints, what's actually getting built with the available feature set?

The workloads that fit best are bounded, single-agent tasks where the value comes from reliable sandboxed execution rather than multi-agent coordination. Common patterns in the early adopter community include:

Code generation with verification. Prompt the agent to write a script, execute it in the sandbox, return output. The sandbox execution is the value-add - you get confirmation the code actually ran, not just that it looks plausible.
Document processing pipelines. Feed a batch of files into a session, extract structured data, write outputs. The persistence model handles large batches cleanly.
CI/CD integration. Versioned agent definitions with environment promotion fit naturally into deployment pipelines. Teams are using this to run agents in staging vs production environments with different tool scopes.
Structured data extraction. Anthropic's own testing reported task success improvements of up to 10 percentage points over standard prompting loops for structured file generation tasks.
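The "code generation with verification" pattern above boils down to: write the generated script somewhere disposable, run it, and treat the exit code as the signal. The sketch below does this with a local subprocess purely for illustration. A local subprocess is not an isolation boundary; in Managed Agents the execution would happen inside the session's sandbox instead.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_verify(script_source, timeout_seconds=10):
    """Write `script_source` to a temp file, run it, and report the outcome.

    Raises subprocess.TimeoutExpired if the script runs past the timeout.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script_path = Path(workdir) / "generated.py"
        script_path.write_text(script_source)
        result = subprocess.run(
            [sys.executable, str(script_path)],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            cwd=workdir,  # keep any files the script writes inside the temp dir
        )
    return {
        "ran_ok": result.returncode == 0,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }
```

A zero exit code only shows the script ran to completion; checking that the output is correct still needs task-specific assertions on `stdout` or on the files produced.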

None of these require multi-agent coordination or self-evaluation. They're useful, but they represent a fraction of the use cases the launch marketing suggested.

Open-Source Alternatives

If model flexibility or self-hosting matters to your team, three alternatives are worth evaluating honestly.

Tool	Model Support	Self-Hostable	Sandboxing	Multi-Agent
Multica	Multi-model	Yes, fully	No container-level isolation	Yes
Cabinet	Multi-model	Yes, fully	No compute sandbox	Limited
CrewAI	Multi-model via LiteLLM	Yes	No managed sandbox	Yes
Claude Managed Agents	Claude only	Partial (self-hosted env)	Yes, container-level	Research preview only

Multica is the closest open-source analog to Managed Agents. It supports multiple models, includes a task and team management UI, and is fully self-hostable. The gap is that it lacks container-level sandboxing and the credential vault isolation that Managed Agents provides.

Cabinet adds persistent agent memory and scheduled recurring tasks - capabilities Managed Agents doesn't currently offer at all. The tradeoff is no compute sandbox; Cabinet manages memory and scheduling but not execution isolation.

CrewAI is the most mature multi-agent framework of the three. Model-agnostic via LiteLLM, with a hosted management option and a large integration ecosystem. If you need multi-agent coordination today without waiting for research preview access, CrewAI is the practical path.

Make vs Buy Decision Framework

Use Managed Agents when:

Your workload consists of single-agent, bounded tasks
Sandboxed execution is the core requirement and you want managed infrastructure
You're already committed to Claude and don't need model flexibility
Compliance requirements benefit from Anthropic's enterprise data processing agreements
CI/CD-style versioned agent deployments fit your workflow

Consider alternatives when:

You need multi-agent coordination now, not on a waitlist
Model-agnostic infrastructure is a requirement
Cost at scale makes $0.08/session-hour prohibitive
You want self-evaluation and autonomous iteration immediately
Your team has the infrastructure capability to run sandboxed containers directly
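If it helps to make the checklist explicit, the two lists above can be encoded as a small function. This is a sketch of one possible encoding, not an official rubric: the `Requirements` fields, the budget threshold, and the rule that any single blocker tips the answer toward alternatives are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical summary of a team's needs; field names are illustrative."""
    needs_multi_agent_now: bool = False
    needs_model_agnostic: bool = False
    needs_self_evaluation_now: bool = False
    session_hours_per_day: float = 0.0
    max_infra_usd_per_day: float = float("inf")


def recommend(req, rate_usd_per_session_hour=0.08):
    """Return (recommendation, reasons); any blocker points to alternatives."""
    reasons = []
    if req.needs_multi_agent_now:
        reasons.append("multi-agent coordination is research preview only")
    if req.needs_model_agnostic:
        reasons.append("Managed Agents supports Claude models only")
    if req.needs_self_evaluation_now:
        reasons.append("self-evaluation loops are research preview only")
    infra_cost = req.session_hours_per_day * rate_usd_per_session_hour
    if infra_cost > req.max_infra_usd_per_day:
        reasons.append(f"infrastructure cost ${infra_cost:.2f}/day exceeds budget")
    return ("alternatives" if reasons else "managed-agents", reasons)
```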

The honest framing is this: Managed Agents is a good infrastructure product that is partly marketed on capabilities it has not finished shipping. The sandboxing and session persistence are real and solid. The multi-agent orchestration is real but not available to most people reading this. If your main question is where recurring scheduled work should live, the comparison of Claude Code routines vs Managed Agents schedules settles that specific call.

FAQ

Is Claude Managed Agents free to try?

No. There is no free tier for Claude Managed Agents. You need an Anthropic Console account and API key, and all usage is billed at standard token rates plus $0.08 per session-hour for infrastructure.

Can I use GPT-5 or Gemini with Managed Agents?

No. Claude Managed Agents supports Claude models only. There is no provider abstraction or model-swapping capability. If you need model flexibility, look at CrewAI or Multica instead.

What does "research preview" mean for multi-agent features?

It means gated access with a separate application process and no committed timeline for general availability. You cannot currently access multi-agent coordination or self-evaluation in the standard public beta - these require explicit approval from Anthropic.

How does the $0.08/session-hour cost work in practice?

It's billed on active session runtime, not token usage. A 30-minute session costs $0.04 in infrastructure fees plus whatever you spent on input and output tokens. Sessions that are idle but not terminated still accrue the session-hour charge.

Can I bring my own sandbox infrastructure?

Yes. The environment configuration supports self-hosted sandboxes via the type: self-hosted option in the environment config. This can reduce infrastructure costs if you already run container infrastructure, though you take on the operational burden.

Is Managed Agents suitable for production workloads today?

For single-agent, bounded tasks with sandboxed execution - yes. For multi-agent pipelines, autonomous self-evaluation, or model-agnostic requirements - not yet, or not without significant workarounds.

Official Sources

Anthropic Managed Agents Quickstart - Official documentation with API reference and SDK examples
Claude Managed Agents: Honest Review - Community analysis of available vs gated features and pricing breakdown
