
TL;DR
Vercel just named its agent stack: AI Gateway, Sandbox, Flags, and Microfrontends. Here is how the four primitives compose, with code, and where each one actually fits in a real product.
For two years the conversation about what an "agent stack" actually means has been a list of vendors and a vague hand-wave. LangChain for orchestration, OpenAI for inference, some vector DB, some queue, some place to run untrusted code, some flagging system bolted on after the first incident. Every team rebuilt the same plumbing.
For the larger agent workflow map, read AI Agents Explained: A TypeScript Developer's Guide and How to Build AI Agents in TypeScript; they give the architecture and implementation context this piece assumes.
Vercel's agentic infrastructure announcement is the first time a major platform has named the primitives explicitly and shipped them as a coherent stack. Four pieces: AI Gateway for model routing, Sandbox for code execution, Flags for runtime control, and Microfrontends for composing UIs that agents render into. None of these are individually novel. The bet is that you want them as one platform with one auth model, one observability surface, and one billing line.
This post is a working developer's read of what was announced, what it looks like in code, and where the seams are. I am skeptical of platform consolidation as a default, but I think Vercel got the abstractions mostly right, and the parts they got wrong are the parts you can route around.
AI Gateway is a model router with a single OpenAI-compatible endpoint. Point your SDK at https://gateway.ai.vercel.com/v1, pass a model identifier like anthropic/claude-opus-4.7 or openai/gpt-5.3, and get back a response. The gateway handles failover, caching, rate limit smoothing across keys, and per-request cost tracking. You can also define routing policies - for example, "route reasoning-heavy prompts to opus, route summarization to haiku" - without rewriting your application code.
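To make the single-endpoint claim concrete: any OpenAI-compatible client should work by swapping the base URL. A minimal sketch - the endpoint and model ID come from the announcement, while the AI_GATEWAY_API_KEY variable name is my assumption, not a documented one.

import OpenAI from "openai";

// Any OpenAI-compatible SDK works once the base URL points at the gateway.
// AI_GATEWAY_API_KEY is an assumed env var name, not a documented one.
const client = new OpenAI({
  baseURL: "https://gateway.ai.vercel.com/v1",
  apiKey: process.env.AI_GATEWAY_API_KEY,
});

const completion = await client.chat.completions.create({
  // provider/model identifier; the gateway resolves routing and failover
  model: "anthropic/claude-opus-4.7",
  messages: [{ role: "user", content: "Summarize this diff in one line." }],
});

console.log(completion.choices[0].message.content);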
Sandbox is a microVM-backed code execution environment with a Node-like API. You hand it a snippet of Python, JavaScript, or shell, and it runs in an isolated VM with file system, network egress controls, and a 60-second to 30-minute lifetime depending on plan. This is the primitive every coding agent has been hand-rolling on top of Firecracker, E2B, or Modal. Vercel collapsed it.
Flags is the lightweight feature flag service Vercel has been quietly building for two years, now positioned as the runtime control plane for agents. Toggle which model an agent uses, which tools it can call, which prompt template applies to which user, all from a dashboard, all evaluated at the edge. There is no SDK weight beyond a tree-shakeable function call.
Microfrontends lets you compose UI from independently deployed apps. The agent angle is that an agent can render a generated UI fragment from a separate deployment without taking over the whole page. Think generative UI scoped to a region of your existing product.
Here is a minimal agent that uses three of the four primitives: AI Gateway for the model call, Sandbox for tool execution, and Flags for the kill switch.
import { generateText, tool } from "ai";
import { z } from "zod";
import { gateway } from "@vercel/ai-gateway";
import { Sandbox } from "@vercel/sandbox";
import { flag } from "@vercel/flags";

// Kill switch, evaluated per user at the edge. Defaults closed.
const codeExecEnabled = flag({
  key: "agent_code_exec",
  defaultValue: false,
});

export async function runAgent(userPrompt: string, userId: string) {
  // Model selection is a flag too, so a canary rollout or an
  // emergency downgrade is a dashboard change, not a deploy.
  const model = await flag({
    key: "agent_model",
    defaultValue: "anthropic/claude-sonnet-4.7",
  })();

  const result = await generateText({
    model: gateway(model),
    system: "You are a code agent. Use the run_code tool when needed.",
    prompt: userPrompt,
    tools: {
      run_code: tool({
        description: "Execute Python in a sandbox",
        parameters: z.object({ code: z.string() }),
        execute: async ({ code }) => {
          if (!(await codeExecEnabled({ user: userId }))) {
            return { error: "Code execution disabled for this user" };
          }
          // Explicit lifecycle: create, exec, destroy. The try/finally
          // guarantees the microVM is torn down even if exec throws.
          const sandbox = await Sandbox.create({ runtime: "python3.12" });
          try {
            return await sandbox.exec(code, { timeout: 30_000 });
          } finally {
            await sandbox.destroy();
          }
        },
      }),
    },
  });

  return result.text;
}
A few things worth pointing out.
The gateway(model) call returns a model object that the Vercel AI SDK already understands. There is no separate fetch client to manage. If the upstream provider 503s, the gateway transparently fails over to your configured fallback, which is set in the dashboard rather than in code. That is the right place for it because failover policy is an ops concern, not a code concern.
The flag evaluation happens at the edge with a typical latency of single-digit milliseconds. You can target by user ID, geography, cohort, or anything in the request. The agent_model flag in the example lets you do canary rollouts of new model versions to 5% of users without a deploy.
The Sandbox lifecycle is explicit. You create, you exec, you destroy. There is no ambient pool, which is good for predictability and bad if you are running thousands of short executions per second. For high-volume cases there is a Sandbox.persistent API that keeps a warm pool, but you pay for it.
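For the warm-pool case, the shape is roughly the following. Sandbox.persistent is the name from the announcement; the pool options and the acquire/release pair are my guesses at the surface, not documented API.

import { Sandbox } from "@vercel/sandbox";

// Hedged sketch: size, acquire, and release are assumed names.
const pool = await Sandbox.persistent({
  runtime: "python3.12",
  size: 8, // warm microVMs held ready; idle capacity is billed
});

const vm = await pool.acquire(); // skips the cold start
try {
  const out = await vm.exec("print(2 + 2)", { timeout: 5_000 });
  console.log(out);
} finally {
  await pool.release(vm); // back to the pool, not destroyed
}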
Routing is not free. AI Gateway adds a small markup on token costs and a per-request fee on top. For a high-volume product the math can flip back in favor of direct provider keys. Run the numbers before you migrate everything.
Sandbox cold starts are real. First execution of a runtime image is 600-900 ms. Subsequent executions in the same sandbox are sub-100 ms. If your agent calls a tool once per turn and turns are infrequent, you eat the cold start every time. The persistent pool is worth it when execution rate exceeds about 1 per minute per user.
Flags evaluated client-side leak. If you ship a Next.js page that reads a flag in the browser, the flag value is in the response. Use server-side evaluation for anything sensitive - model selection, tool gating, anything cost-related. The SDK supports both modes; pick the right one consciously.
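A minimal sketch of the server-side pattern, reusing the flag helper from the earlier example in a Next.js App Router page. The resolved value stays on the server; only derived, non-sensitive output ships to the browser.

// app/agent/page.tsx - a server component, so the flag resolves on
// the server and never appears in the client bundle.
import { flag } from "@vercel/flags";

const agentModel = flag({
  key: "agent_model",
  defaultValue: "anthropic/claude-sonnet-4.7",
});

export default async function AgentPage() {
  const model = await agentModel(); // evaluated per request, server-side
  // Pass only derived, non-sensitive state down to the client.
  const tier = model.includes("opus") ? "pro" : "standard";
  return <p data-tier={tier}>Agent ready.</p>;
}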
Microfrontends has the smallest agent story right now. It is genuinely useful for partitioning team ownership of a UI but the "agent renders a fragment" use case is more hype than substance today. There is no first-class generative UI primitive in the announcement; you can build one on top, but you are building it.
The right way to read this announcement is as a redrawing of the seams. Vercel is saying: model calls go through Gateway, code goes through Sandbox, behavior is controlled by Flags, UI is composed by Microfrontends. Everything else - orchestration, memory, evals, observability - is left to other tools or to your application code.
That is a defensible split. Orchestration frameworks (LangGraph, Mastra, the AI SDK itself) sit on top of Gateway. Memory layers (Mem0, Letta, your own pgvector) live alongside. Evals (Braintrust, Langfuse) consume the gateway's request logs. The platform takes the parts that have to be infrastructure and leaves the parts that benefit from competition.
Where I think it gets interesting for indie devs is the MCP angle. The natural complement to AI Gateway is a hosted MCP server registry - a place where you publish tools your agents can use, with auth and rate limits and observability. That is exactly what we built MCPaaS for: deploy an MCP server in one command, get a public endpoint, plug it into any agent runtime including Vercel's. The two stacks compose cleanly because Gateway treats MCP tools the same as any other tool call.
The other adjacent need is filesystem state for agents. Sandbox gives you ephemeral compute, but agents that work on files for hours need persistence and addressability. AgentFS is the DD product for that - a virtual filesystem with versioning that any sandbox or agent can mount. Vercel does not solve this problem and arguably should not; it is a different shape than what Sandbox is for.
I walked through the full stack composition on the Developers Digest YouTube channel, including a live build of an agent that uses all four primitives plus an external MCP server.
Here are the working patterns that have held up across three production agents I have shipped.
Define your model selection as a flag from day one. Even if you only have one model, wrap it. The day you want to A/B a new release or fail over to a cheaper model during a billing emergency you will be glad you did.
Treat Sandbox as the only place untrusted code runs, including code generated by your own agents. The temptation to "just eval this small Python snippet" in your Node process is the temptation that ends careers. The Sandbox primitive is cheap enough that there is no excuse.
Log every Gateway request to your own store, not just Vercel's. The gateway has good observability but vendor logs are not your logs. Pipe them into whatever you use for production telemetry. We pipe ours into a Postgres table partitioned by day and it is the single most useful debugging tool we have.
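A sketch of the write path, assuming a gateway_requests table range-partitioned by day on requested_at. The column names are ours, not anything Vercel exposes; map them from whatever metadata the gateway's logs give you.

import { Pool } from "pg";

const db = new Pool({ connectionString: process.env.DATABASE_URL });

// Mirror each gateway request into your own store.
export async function logGatewayRequest(entry: {
  userId: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  costUsd: number;
  latencyMs: number;
}) {
  // gateway_requests is range-partitioned by day on requested_at,
  // so this insert routes to the current day's partition.
  await db.query(
    `INSERT INTO gateway_requests
       (requested_at, user_id, model, prompt_tokens,
        completion_tokens, cost_usd, latency_ms)
     VALUES (now(), $1, $2, $3, $4, $5, $6)`,
    [
      entry.userId,
      entry.model,
      entry.promptTokens,
      entry.completionTokens,
      entry.costUsd,
      entry.latencyMs,
    ]
  );
}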
Use Flags for incident response, not just feature releases. When a model provider is degraded at 3am, the move is to flip a flag that routes around it, not to push a deploy. Build that muscle when nothing is on fire.
The open question is whether Vercel adds an opinionated orchestration layer. Right now they are deliberately neutral - the AI SDK is provider-agnostic, the gateway is router-only, and the rest is up to you. That is the right call for adoption but it leaves a gap that someone will fill. Either Vercel ships a Mastra-style framework as a first-party product or a third party becomes the default on top of the stack.
The other thing to watch is pricing on Sandbox at scale. MicroVM execution is genuinely expensive infrastructure. Either the price comes down as utilization improves or the high-volume case migrates to Modal, E2B, or self-hosted Firecracker. Vercel's bet is that the convenience of one platform will hold most workloads inside it.
Either way, the abstraction is now named. If you are designing an agent stack in 2026, these are the four boxes to start from. You can swap the implementations later. You cannot easily swap the architecture.