Where Should Your AI Agent Run Code: E2B vs Daytona vs Modal vs Cloudflare vs Vercel Sandbox

Your AI agent can reason about code. The harder question is where that code actually runs.

A year ago, most agent frameworks executed generated code in a local subprocess or a throwaway Docker container. That worked when agents ran short scripts. It breaks down when agents need to run for hours, install arbitrary dependencies, persist state between steps, or operate in production with real users and real data. The execution environment is now a first-class architecture decision, and a new category of sandbox-as-a-service providers has emerged to solve it.

This guide compares five providers that offer isolated code execution designed for AI agents: E2B, Daytona, Modal, Cloudflare Sandbox SDK, and Vercel Sandbox. Each takes a different approach to isolation, persistence, latency, and pricing. The right choice depends on your agent's run length, your existing cloud stack, and how much state your workflows need to carry between steps.

E2B

E2B provides on-demand Linux VMs purpose-built for AI agent code execution. Each sandbox is described in their docs as "a fast, secure Linux VM" that you create, run code in, and tear down or pause programmatically (E2B docs).

Isolation model. Each sandbox runs as an isolated VM. The specific virtualization technology is not disclosed in their public documentation, but the persistence model (which saves both filesystem and memory state) is consistent with VM-level isolation rather than shared-kernel containers (E2B persistence docs).

State and persistence. E2B offers the most granular persistence model of the group. You can pause a sandbox (saving filesystem and memory), resume it in approximately 1 second, create named snapshots to fork new sandboxes from a running state, and mount persistent volumes that survive across sandbox lifetimes. Paused sandboxes are kept indefinitely with no automatic TTL. An auto-pause option lets you configure sandboxes to pause instead of terminate on timeout (E2B persistence docs).

Cold-start and latency. Resume from pause is documented at approximately 1 second. Pause takes approximately 4 seconds per 1 GiB of RAM. Templates support a start command that pre-warms processes during build, so sandboxes created from a template have processes "already running" (E2B template docs). Fresh sandbox creation latency is not published as a specific number.
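The documented figures give a rough way to budget pause and resume overhead for a given sandbox size. The sketch below is plain arithmetic on the approximate numbers quoted above (about 4 seconds per GiB of RAM to pause, about 1 second to resume). The function names are illustrative and not part of the E2B SDK, and real timings will vary.

```python
# Approximate figures quoted above from E2B's docs; treat them as rough guides.
PAUSE_SECONDS_PER_GIB = 4.0
RESUME_SECONDS = 1.0


def estimate_pause_seconds(ram_gib: float) -> float:
    """Estimate how long pausing a sandbox takes, given its RAM in GiB."""
    if ram_gib < 0:
        raise ValueError("ram_gib must be non-negative")
    return PAUSE_SECONDS_PER_GIB * ram_gib


def estimate_round_trip_seconds(ram_gib: float) -> float:
    """Estimate one pause followed by one resume."""
    return estimate_pause_seconds(ram_gib) + RESUME_SECONDS


if __name__ == "__main__":
    for ram in (0.5, 2, 8):
        total = estimate_round_trip_seconds(ram)
        print(f"{ram} GiB RAM: roughly {total:.0f}s per pause/resume cycle")
```

An agent that pauses between every step pays this overhead each time, so larger-memory sandboxes may be better paused only at natural checkpoints.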

Pricing model. Three tiers: Hobby (free with one-time credits), Pro ($150/month base), and Ultimate (enterprise custom). All tiers charge per-second usage on top of the base fee, metered by vCPU and RAM. A pricing calculator is available at pricing.e2b.dev.

SDK support. Python and TypeScript SDKs, plus a separate Code Interpreter package that runs code in a Jupyter context supporting Python, JavaScript, TypeScript, Bash, Java, and R (E2B docs).

Best for. Teams that need deep state persistence (pause, resume, snapshot, fork) and want the most mature agent-framework integration ecosystem. E2B documents integrations with LangChain, LlamaIndex, CrewAI, Vercel AI SDK, and OpenAI Agents SDK (E2B integrations).

Daytona

Daytona positions itself as "AI-first infrastructure optimized for LLMs, agents, and evals." Each sandbox is a full composable environment with a dedicated kernel, filesystem, network stack, and allocated compute resources (Daytona docs).

Isolation model. Each sandbox gets a dedicated kernel, filesystem, and network stack. The enterprise tier adds customer-managed compute in your own cloud with no shared compute and no cross-tenant risk. Docker-in-Docker, Dockerfiles, and Docker Compose are supported natively (Daytona docs).

State and persistence. Sandboxes are stateful by design and can run indefinitely. Daytona supports environment snapshots (save, restore, and resume any agent workflow), shared volumes across sandboxes, and external storage mounts. The product describes this as "unlimited persistence" (Daytona homepage).

Cold-start and latency. Daytona claims sub-90ms sandbox creation on their homepage and docs. Regional deployment is available across US East, US West, EU Central, EU West, and Asia South (Daytona docs).

Pricing model. Pure pay-as-you-go with per-second billing. Rates are published per vCPU-hour, per GiB RAM-hour, and per GiB storage-hour. GPU options (NVIDIA H100, RTX PRO 6000) are available at hourly rates. New accounts get free compute credits without a credit card. A startup program offers up to $50K in credits (Daytona pricing).
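To make the per-second, per-resource billing structure concrete, here is a minimal cost estimator. It assumes cost scales linearly with each resource and with elapsed seconds. The `HourlyRates` type and the rate values in the usage example are placeholders, not Daytona's published prices; substitute the numbers from the provider's pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HourlyRates:
    """Hourly rates in USD. Values are supplied by the caller (hypothetical here)."""
    per_vcpu_hour: float
    per_gib_ram_hour: float
    per_gib_storage_hour: float


def sandbox_cost(rates: HourlyRates, seconds: float, vcpus: float,
                 ram_gib: float, storage_gib: float) -> float:
    """Estimate cost for one sandbox run, billed per second of wall-clock time."""
    hours = seconds / 3600
    hourly_total = (
        vcpus * rates.per_vcpu_hour
        + ram_gib * rates.per_gib_ram_hour
        + storage_gib * rates.per_gib_storage_hour
    )
    return hours * hourly_total


if __name__ == "__main__":
    # Placeholder rates chosen only to keep the arithmetic easy to follow.
    example = HourlyRates(per_vcpu_hour=1.0, per_gib_ram_hour=0.5, per_gib_storage_hour=0.25)
    print(sandbox_cost(example, seconds=1800, vcpus=2, ram_gib=4, storage_gib=8))
```

The same shape applies to any provider that meters vCPU and RAM per second; a monthly base fee, where one exists, is added on top.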

SDK support. Five SDKs: Python, TypeScript, Ruby, Go, and Java. Also provides a RESTful API with OpenAPI spec, a Toolbox API, a CLI, and an MCP server for agent tool integration (Daytona docs).

Best for. Teams that need the broadest SDK language coverage, GPU sandbox access, long-running stateful agents, or enterprise BYOC (bring your own compute) deployments. Daytona also offers computer-use capabilities with virtual desktops (Linux, macOS, Windows) controllable via code (Daytona docs).

Modal

Modal is a serverless cloud platform with a dedicated Sandbox API for executing untrusted user or agent code. Modal reports over 1 billion sandboxes run on their platform, designed for production agent systems and reinforcement learning training at scale (Modal sandboxes).

Isolation model. Modal uses gVisor, the user-space kernel developed at Google, for containerization and virtualization. gVisor intercepts system calls, providing stronger isolation than standard Linux containers without the overhead of full VMs. Modal also runs continuous synthetic monitoring to verify network and application isolation within their runtime (Modal security docs).

State and persistence. Multiple persistence primitives are available: distributed volumes (persistent filesystem mountable across runs), filesystem snapshots (full sandbox state, retained for 30 days by default), directory snapshots, memory snapshots (7-day retention), distributed key-value dicts, and queues. For sandboxes exceeding the 24-hour maximum lifetime, Modal recommends snapshotting and restoring into a new sandbox (Modal sandbox docs).
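The snapshot-and-restore recommendation implies that a long job is split into segments, each no longer than the maximum lifetime, with a snapshot at every boundary. The sketch below only plans those segments; it does not call Modal's SDK, and `plan_segments` and `snapshots_needed` are hypothetical helper names.

```python
def plan_segments(total_hours: float,
                  max_lifetime_hours: float = 24.0) -> list[tuple[float, float]]:
    """Split a job into (start, end) windows, each within the lifetime cap.

    Every boundary between two windows is a point where the running sandbox
    would be snapshotted and a new one restored from that snapshot.
    """
    if max_lifetime_hours <= 0:
        raise ValueError("max_lifetime_hours must be positive")
    total = float(total_hours)
    segments = []
    start = 0.0
    while start < total:
        end = min(start + max_lifetime_hours, total)
        segments.append((start, end))
        start = end
    return segments


def snapshots_needed(total_hours: float, max_lifetime_hours: float = 24.0) -> int:
    """Number of snapshot/restore handoffs required for the whole job."""
    return max(len(plan_segments(total_hours, max_lifetime_hours)) - 1, 0)


if __name__ == "__main__":
    print(plan_segments(60))     # three windows for a 60-hour job
    print(snapshots_needed(60))  # two handoffs
```

In practice you would likely snapshot somewhat before the cap rather than exactly at it, to leave room for the snapshot itself to complete.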

Cold-start and latency. Modal claims "sub-second scheduling" for sandboxes, along with strong cold-start performance on custom images. The sandbox lifecycle moves through Created, Scheduled, Started, and optionally Ready stages, with configurable readiness probes (Modal sandbox docs).

Pricing model. Pure usage-based with per-second billing. Three tiers: Starter ($0 base with $30/month in free credits), Team ($250/month base with $100/month included), and Enterprise (custom). Sandbox compute is priced separately from standard function compute. GPU access ranges from T4 to B200 at per-second rates. Region selection and non-preemptible execution carry multipliers (Modal pricing).

SDK support. Python (primary, most mature), JavaScript/TypeScript, and Go SDKs. The Python SDK has the most complete sandbox API coverage (Modal sandbox docs).

Best for. Teams already on Modal for serverless compute who want sandbox execution as a natural extension. Strong fit for RL training environments and large-scale batch agent workloads. Modal documents integration examples with LangGraph and supports up to 100K+ concurrent sandboxes (Modal docs).

Cloudflare Sandbox SDK

Cloudflare Sandbox SDK provides isolated code execution environments built on top of Cloudflare Containers and Workers. It is explicitly positioned for AI agents that need to execute code, interactive dev environments, and CI/CD systems (Cloudflare Sandbox docs).

Isolation model. Each sandbox runs in its own VM with an Ubuntu Linux container inside. The architecture is three-layer: Workers handle application logic, Durable Objects provide persistent sandbox identity and routing, and Containers provide the isolated Linux environment where code runs. Isolation covers filesystem, process, network, and resource limits per sandbox (Cloudflare architecture docs).

State and persistence. Ephemeral by default. State (files, processes, shell sessions) persists only while the container is active. After an idle timeout (default 10 minutes), the container stops and all state is lost. A keepAlive option prevents idle sleep with heartbeat pings. S3-compatible storage (R2, S3, GCS) can be mounted as local filesystems for data that persists across sandbox lifecycles. Durable Objects provide the persistent identity layer, but the container filesystem itself does not survive restarts (Cloudflare sandbox concepts).
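The idle-timeout behavior can be reasoned about with a small model: if the gap between two consecutive operations exceeds the timeout, the container stops in between and its state is gone by the time the second operation arrives. The sketch below assumes the idle timer resets on every operation and that a gap exactly equal to the timeout is still safe; neither detail is confirmed by the source, and `first_state_loss` is a hypothetical helper, not part of the Cloudflare SDK.

```python
from typing import Optional, Sequence

DEFAULT_IDLE_TIMEOUT_S = 600.0  # the 10-minute default quoted above


def first_state_loss(activity_times: Sequence[float],
                     idle_timeout: float = DEFAULT_IDLE_TIMEOUT_S) -> Optional[float]:
    """Return the time (in seconds) at which the container would first stop
    between two operations, or None if every gap stays within the timeout.

    activity_times are timestamps in seconds of operations sent to the sandbox.
    """
    times = sorted(activity_times)
    for previous, following in zip(times, times[1:]):
        if following - previous > idle_timeout:
            # The container goes idle after `previous` and stops at this moment.
            return previous + idle_timeout
    return None


if __name__ == "__main__":
    # An agent waits 700 seconds on a slow step between its second and third call.
    print(first_state_loss([0, 300, 1000]))
    # All gaps are under ten minutes, so state is retained throughout.
    print(first_state_loss([0, 300, 800]))
```

If a workflow has gaps like the first case, that is the signal to enable keepAlive or move the data that matters onto a mounted bucket.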

Cold-start and latency. No specific cold-start numbers are published. Cloudflare's broader platform marketing claims "no cold starts or region complexity," but the sandbox-specific docs do not quantify startup latency. A WebSocket transport option reduces overhead for high-frequency operations by multiplexing SDK calls over a single persistent connection (Cloudflare sandbox docs).

Pricing model. Usage-based, built on Cloudflare Containers pricing. Requires the $5/month Workers Paid plan. Billing is per 10ms of active running time across memory, CPU, and disk dimensions. Instance types range from lite (1/16 vCPU, 256 MiB RAM) to standard-4 (4 vCPU, 12 GiB RAM). Network egress is metered separately (Cloudflare sandbox pricing).

SDK support. TypeScript only for the SDK (@cloudflare/sandbox npm package). Inside the sandbox, Python and Node.js/JavaScript execution is supported via dedicated Docker images. A code interpreter API provides automatic result capture for Python and JS (Cloudflare sandbox docs).

Best for. Teams already deep in the Cloudflare ecosystem (Workers, Durable Objects, R2, AI Gateway) who want sandbox execution without adding another vendor. The credential proxy pattern (Worker injects secrets at request time so the sandbox never holds live keys) is a thoughtful security design for agent workflows (Cloudflare security docs).
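The credential proxy idea is independent of any one platform, so it can be illustrated with a short simulation. Code in the sandbox builds a request with no credentials, and a trusted proxy attaches the secret just before forwarding. Everything here is a hypothetical stand-in (the request shape, `SECRET_STORE`, and the placeholder token); a real Cloudflare setup would implement the proxy side in a Worker using the TypeScript SDK.

```python
# Held only by the trusted proxy; the value is a placeholder, not a real token.
SECRET_STORE = {"github": "example-token-not-real"}


def sandbox_build_request(service: str, path: str) -> dict:
    """What code inside the sandbox produces: a request with no credentials."""
    return {"service": service, "path": path, "headers": {}}


def proxy_forward(request: dict, secrets: dict) -> dict:
    """Proxy side: copy the request and attach the credential before forwarding.

    The original request object is left untouched, so nothing secret flows
    back into the sandbox.
    """
    service = request["service"]
    if service not in secrets:
        raise PermissionError(f"no credential configured for {service!r}")
    headers = dict(request["headers"])
    headers["Authorization"] = f"Bearer {secrets[service]}"
    return {**request, "headers": headers}


if __name__ == "__main__":
    req = sandbox_build_request("github", "/user")
    outbound = proxy_forward(req, SECRET_STORE)
    print("sandbox sees:", req["headers"])
    print("upstream sees:", sorted(outbound["headers"]))
```

Because the proxy decides which services have credentials at all, it also doubles as an allowlist for where agent-generated code can make authenticated calls.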

Vercel Sandbox

Vercel Sandbox is a compute primitive for running arbitrary code in isolated, ephemeral Linux VMs. It went generally available on January 30, 2026, and is explicitly positioned as "the execution layer for agents" (Vercel Sandbox GA blog).

Isolation model. Firecracker microVMs with a dedicated kernel per sandbox. Vercel explicitly contrasts this with Docker containers: each sandbox gets kernel-level isolation, a dedicated private filesystem, network namespace isolation, and strict CPU/memory/disk limits. The underlying infrastructure (internally called "Hive") is the same system that handles Vercel's core deployment platform (Vercel Sandbox concepts).

State and persistence. Persistent sandboxes are the default. When a sandbox stops, the SDK automatically snapshots its filesystem. Resuming starts a new session from that snapshot. The model is two-level: a Sandbox is a long-lived named entity, and a Session is a single running VM instance. Calling runCommand on a stopped sandbox auto-resumes it. Snapshots expire 30 days after last use by default. Drives (beta) provide attachable persistent storage reusable across sandbox runs. Lifecycle hooks (onCreate, onResume) handle setup automation (Vercel persistent sandboxes).

Cold-start and latency. Vercel claims sandboxes start in milliseconds, with sub-second starts for thousands of sandboxes per task. Resuming from a snapshot is described as faster than starting fresh. The Firecracker-based infrastructure is optimized for fast boot (Vercel Sandbox docs).

Pricing model. Usage-based, metered across active CPU, provisioned memory, creations, data transfer, and snapshot storage. A key detail: active CPU billing excludes time waiting for I/O (network calls, database queries, AI model calls), so agents that spend time waiting on LLM responses are not billed for that idle time. Hobby tier includes free monthly quotas. Pro tier charges per-unit rates with a $20/month included credit. Maximum runtime is 45 minutes on Hobby and 24 hours on Pro/Enterprise (Vercel Sandbox pricing).
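The effect of excluding I/O wait from active CPU billing is easiest to see with numbers. The sketch below models an agent run as a sequence of phases and sums only the CPU-bound ones. The phase labels and helper names are illustrative assumptions, and provisioned memory (which the source lists as a separate meter) is not modeled here.

```python
from typing import Iterable, Tuple

Phase = Tuple[str, float]  # (kind, seconds) where kind is "cpu" or "io"


def billable_cpu_seconds(phases: Iterable[Phase]) -> float:
    """Sum only the time spent actively using the CPU."""
    return sum(seconds for kind, seconds in phases if kind == "cpu")


def wall_clock_seconds(phases: Iterable[Phase]) -> float:
    """Total elapsed time, including time spent waiting on I/O."""
    return sum(seconds for _, seconds in phases)


if __name__ == "__main__":
    # A typical agent step: a little compute, a long wait on an LLM call, more compute.
    run = [("cpu", 2.0), ("io", 30.0), ("cpu", 1.5)]
    print("wall clock:", wall_clock_seconds(run))
    print("billable CPU:", billable_cpu_seconds(run))
```

For agents that spend most of their time waiting on model responses, the billable CPU figure can be a small fraction of wall-clock time.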

SDK support. JavaScript/TypeScript SDK (@vercel/sandbox), Python SDK (vercel.sandbox), and an open-source CLI. Available runtimes include Node.js (versions 22, 24, 26) and Python 3.13. Custom images are supported via Vercel Container Registry. Full sudo access is available inside sandboxes (Vercel Sandbox docs).

Best for. Teams already deploying on Vercel who want sandbox execution tightly integrated with their existing platform. Strong fit for AI coding agents and "vibe coding" platforms. Vercel's own agent framework (eve) uses Sandbox as a built-in primitive, and customers like Notion, Conductor, and Blackbox AI use it for production agent workloads (Vercel blog).

Comparison Summary

Dimension	E2B	Daytona	Modal	Cloudflare Sandbox	Vercel Sandbox
Isolation	VM (type undisclosed)	Dedicated kernel	gVisor (user-space kernel)	VM with Ubuntu container	Firecracker microVM
Persistence	Deep (pause, resume, snapshot, volumes, indefinite)	Stateful by design, snapshots, volumes	Volumes, filesystem/memory/directory snapshots	Ephemeral by default, bucket mounts for durability	Persistent by default, auto-snapshot, drives (beta)
Max runtime	Up to 24 hours (Pro)	Unlimited	24 hours (then snapshot and restore)	Until idle timeout (configurable)	24 hours (Pro)
Startup claim	~1s resume	Sub-90ms creation	Sub-second scheduling	Not published	Milliseconds
SDK languages	Python, TypeScript	Python, TypeScript, Ruby, Go, Java	Python, TypeScript, Go	TypeScript only	TypeScript, Python
GPU support	Not documented	H100, RTX PRO 6000	T4 through B200	Not documented	Not documented
Pricing model	Base fee + per-second usage	Pure per-second pay-as-you-go	Per-second usage, tiered base	$5/mo base + per-10ms usage	Per-use metering, I/O wait excluded

How to Choose

By isolation requirements. If your agent runs untrusted code from end users and you need the strongest possible isolation boundary, Vercel Sandbox (Firecracker microVMs with dedicated kernels) and Modal (gVisor with continuous isolation monitoring) both offer well-documented security models. Daytona's enterprise tier adds customer-managed compute for zero cross-tenant risk.

By run length and statefulness. If your agents run for hours and need to carry state between steps, E2B's pause/resume/snapshot model is the most granular. Daytona offers unlimited persistence by design. Vercel Sandbox defaults to persistent sandboxes with automatic snapshotting. Cloudflare Sandbox is the most ephemeral of the group and requires explicit bucket mounts for durable state.

By existing cloud stack. If you are already on Cloudflare (Workers, Durable Objects, R2), the Sandbox SDK keeps everything in one vendor and one billing relationship. If you deploy on Vercel, Vercel Sandbox integrates natively with your existing infrastructure and the eve agent framework. If you use Modal for serverless compute, their Sandbox API is a natural extension. E2B and Daytona are cloud-neutral and work from any backend.

By SDK and language needs. Daytona offers the widest SDK coverage (five languages). If your agent framework is in Ruby or Java, Daytona is currently the only option with a first-party SDK; for Go, both Daytona and Modal provide one. For TypeScript-first teams, all five providers have you covered. For Python-heavy ML and agent stacks, E2B, Daytona, and Modal all offer mature Python SDKs.

By GPU requirements. If your agent needs GPU access inside the sandbox (for local model inference, RL training, or image generation), Modal and Daytona both offer GPU instances. The other three providers do not currently document GPU support for sandbox workloads.
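The SDK and GPU criteria above can be encoded as a simple filter over the comparison table. The data below is transcribed from the table in this guide as of writing, and "gpu_documented" means only that GPU support is documented, not that it is definitively absent elsewhere. The `shortlist` helper is a hypothetical convenience, not a recommendation engine.

```python
from typing import Optional

# Transcribed from the comparison table; update as providers change.
PROVIDERS = {
    "E2B": {"sdks": {"python", "typescript"}, "gpu_documented": False},
    "Daytona": {"sdks": {"python", "typescript", "ruby", "go", "java"}, "gpu_documented": True},
    "Modal": {"sdks": {"python", "typescript", "go"}, "gpu_documented": True},
    "Cloudflare Sandbox": {"sdks": {"typescript"}, "gpu_documented": False},
    "Vercel Sandbox": {"sdks": {"typescript", "python"}, "gpu_documented": False},
}


def shortlist(sdk: Optional[str] = None, needs_gpu: bool = False) -> list[str]:
    """Return providers matching the given SDK language and GPU requirement."""
    names = []
    for name, facts in PROVIDERS.items():
        if sdk is not None and sdk.lower() not in facts["sdks"]:
            continue
        if needs_gpu and not facts["gpu_documented"]:
            continue
        names.append(name)
    return names


if __name__ == "__main__":
    print(shortlist(sdk="go"))
    print(shortlist(needs_gpu=True))
    print(shortlist(sdk="python", needs_gpu=True))
```

Isolation model, persistence, and existing cloud stack are harder to reduce to booleans, so treat the output as a first cut before weighing those factors.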

Frequently Asked Questions

What is the difference between a sandbox and a regular container for AI agents?

A standard Docker container shares the host kernel and relies on namespaces and cgroups for isolation. A sandbox for AI agents typically provides stronger isolation (dedicated kernel, microVM, or user-space kernel like gVisor), automatic lifecycle management (create, pause, resume, snapshot), and APIs designed for programmatic control from an agent orchestration layer. The key difference is that sandboxes are built to safely run untrusted, agent-generated code without risking the host infrastructure or other tenants (Vercel Sandbox concepts, Modal security).

Can I self-host any of these sandbox providers?

Daytona offers a bring-your-own-compute option at the enterprise tier where sandboxes run in your own cloud with no shared compute. Modal offers a self-hosted option for enterprise customers. E2B documents a BYOC (bring your own cloud) capability. Cloudflare Sandbox and Vercel Sandbox are managed services tied to their respective platforms and do not currently offer self-hosted options. Check each provider's enterprise documentation for current self-hosting details.

How do these sandboxes handle secrets and credentials?

Approaches vary. Cloudflare Sandbox documents a credential proxy pattern where the Worker injects secrets at request time so the sandbox itself never holds live API keys (Cloudflare security). Vercel offers Vercel Connect for scoped, short-lived tokens to services like GitHub and Slack (Vercel blog). E2B and Daytona support environment variables passed at sandbox creation. For any provider, the best practice is to avoid baking long-lived secrets into sandbox images and instead use a proxy or injection pattern.

Do these providers integrate with popular agent frameworks?

E2B documents the broadest set of agent framework integrations, including LangChain, LlamaIndex, CrewAI, Vercel AI SDK, and OpenAI Agents SDK (E2B integrations). Daytona provides integration guides for LangChain and an MCP server for tool integration (Daytona docs). Modal documents examples with LangGraph (Modal docs). Vercel Sandbox integrates natively with Vercel's eve framework and AI SDK. Cloudflare Sandbox integrates with Workers AI. Most providers can work with any agent framework through their SDK, even without a dedicated integration guide.