
TL;DR
Vercel AI Gateway gives you one API key and string model ids like moonshotai/kimi-k2.5 for hundreds of models. Here is how it works with the AI SDK, what BYOK and OIDC change, the honest tradeoffs, and who should actually use it.
Most "add a new model" tasks are boring in the worst way. You install another SDK, wire up another API key, learn another auth quirk, and rebuild your fallback logic per provider. Vercel AI Gateway collapses that into one key and one endpoint. You reference a model by a plain string like moonshotai/kimi-k2.5 or anthropic/claude-opus-4.8, and the request routes to the right provider.
We run production chat through it, so this is the applied version: what it is, how it plugs into the AI SDK, what BYOK and OIDC actually buy you, the tradeoffs worth knowing before you couple to it, and who should reach for it.
The AI Gateway is a single HTTP endpoint (https://ai-gateway.vercel.sh/v1) that fronts hundreds of models from many providers. Instead of one client per provider, you get:
creator/model-name string ids.
It works with the AI SDK v5 and v6, the OpenAI Chat Completions and Responses APIs, and the Anthropic Messages API, so most existing code paths have a compatible entry point.
The fastest path is the AI SDK, where the gateway is the default provider when you pass a model as a plain string. Set one environment variable:
AI_GATEWAY_API_KEY=your_key_here
Then reference any model by string. No provider import, no per-provider client:
import { generateText } from 'ai';

const { text } = await generateText({
  model: 'anthropic/claude-opus-4.8',
  prompt: 'What is the capital of France?',
});
Swapping models is a one-line change. anthropic/claude-opus-4.8 becomes moonshotai/kimi-k2.5 or openai/gpt-5.5 and nothing else moves. When you want explicit control, the gateway() provider instance gives you the same routing with configurable base URLs and env vars, which matters behind a corporate proxy.
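If the swap really is one string, it can live in configuration instead of code. Here is a minimal sketch of that idea, assuming a hypothetical CHAT_MODEL environment variable and two hypothetical helpers (parseModelId, selectModel) that are not part of the AI SDK. It only checks the creator/model-name shape, not whether the gateway actually serves that model.

```typescript
// Hypothetical helpers, not part of the AI SDK or the gateway.
interface ModelId {
  creator: string;
  model: string;
}

// Split "creator/model-name" at the first slash and reject ids missing either part.
function parseModelId(id: string): ModelId {
  const slash = id.indexOf('/');
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "creator/model-name", got "${id}"`);
  }
  return { creator: id.slice(0, slash), model: id.slice(slash + 1) };
}

// Read the model id from config so a swap is a deploy setting, not a code change.
// CHAT_MODEL is an assumed variable name; the gateway does not define it.
function selectModel(env: Record<string, string | undefined>, fallback: string): string {
  const id = env['CHAT_MODEL'] ?? fallback;
  parseModelId(id); // fail early on a malformed id
  return id;
}
```

In an app you would pass process.env as the first argument and hand the returned string straight to generateText as the model.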
Prefer the OpenAI SDK? Point its base_url at the gateway and keep your existing code:
import os
from openai import OpenAI

# Reuse the gateway key and point the client at the gateway endpoint
client = OpenAI(
    api_key=os.getenv('AI_GATEWAY_API_KEY'),
    base_url='https://ai-gateway.vercel.sh/v1',
)
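If you would rather not pull in any SDK, the same call can be sketched as a plain HTTP request. This assumes the gateway serves the usual OpenAI-style /chat/completions path under the base URL above and accepts the key as a Bearer token. buildChatRequest is a hypothetical helper that only builds the request, so you can inspect it before sending.

```typescript
const GATEWAY_BASE_URL = 'https://ai-gateway.vercel.sh/v1';

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface GatewayRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Hypothetical helper: build an OpenAI Chat Completions style request for the gateway.
function buildChatRequest(apiKey: string, model: string, messages: ChatMessage[]): GatewayRequest {
  return {
    // Assumption: the OpenAI-compatible path sits directly under the base URL.
    url: `${GATEWAY_BASE_URL}/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Usage (not executed here):
//   const { url, init } = buildChatRequest(key, 'moonshotai/kimi-k2.5', [
//     { role: 'user', content: 'Hello' },
//   ]);
//   const res = await fetch(url, init);
```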
Here is the applied lesson that is not obvious from the quickstart. As tooling standardizes on OpenAI's newer Responses API, many OpenAI-compatible clients now default to it rather than the older Chat Completions shape. That is fine when you talk to OpenAI. It breaks when you point the same client at an upstream that only serves Chat Completions. Calling Moonshot's endpoint directly, for example, we hit exactly this shape mismatch: the client spoke Responses, the upstream spoke Completions, and requests failed in ways that looked like our bug.
Routing through the gateway is what made this stop being our problem. The gateway normalizes the request and response shapes across providers, so a single client format reaches models that natively expose different APIs. If you have ever burned an afternoon on a "why does this provider return a different envelope" bug, this abstraction is the quiet reason to adopt a gateway even before you care about routing or spend.
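To make the mismatch concrete, here is a deliberately simplified sketch of the two request shapes. A Responses-style request carries input (and optionally instructions), while a Chat Completions request carries a messages array. The types and the toChatCompletions function are our own illustration covering only plain string input. They are not the gateway's implementation, which we have not seen.

```typescript
// Simplified subset of a Responses-style request (string input only).
interface ResponsesRequest {
  model: string;
  input: string;
  instructions?: string;
}

// Simplified subset of a Chat Completions request.
interface ChatCompletionsRequest {
  model: string;
  messages: { role: 'system' | 'user'; content: string }[];
}

// Illustration only: map the Responses shape onto the Chat Completions shape.
function toChatCompletions(req: ResponsesRequest): ChatCompletionsRequest {
  const messages: ChatCompletionsRequest['messages'] = [];
  if (req.instructions !== undefined) {
    // instructions plays the role of a system message
    messages.push({ role: 'system', content: req.instructions });
  }
  messages.push({ role: 'user', content: req.input });
  return { model: req.model, messages };
}
```

An upstream that only understands messages has nothing to do with a body that only carries input, which is why the direct call failed even though both sides were behaving as designed.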
Bring Your Own Key lets you attach your own provider credentials at the team level. It is useful when you:
The reliability twist worth knowing: if your own credentials fail on a request, the gateway can retry with system credentials so the call still succeeds, and that fallback usage is billed against your gateway credits. See the BYOK docs for setup.
For apps deployed on Vercel, you do not have to manage a gateway API key at all. An OIDC token is automatically available as VERCEL_OIDC_TOKEN, so there is nothing to rotate and no secret to leak. A common pattern reads OIDC in production and falls back to a key locally:
// OIDC on Vercel, API key locally
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
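The one-liner silently yields undefined when neither variable is set. A slightly more defensive sketch, with the same precedence, records which credential was picked and fails loudly otherwise. resolveGatewayAuth and its return shape are hypothetical names for illustration.

```typescript
type Env = Record<string, string | undefined>;

interface GatewayAuth {
  token: string;
  source: 'api-key' | 'oidc';
}

// Hypothetical helper: same precedence as the one-liner (API key first, then OIDC).
function resolveGatewayAuth(env: Env): GatewayAuth {
  const apiKey = env['AI_GATEWAY_API_KEY'];
  if (apiKey) return { token: apiKey, source: 'api-key' };

  const oidc = env['VERCEL_OIDC_TOKEN'];
  if (oidc) return { token: oidc, source: 'oidc' };

  // Neither credential is present: fail at startup rather than on the first request.
  throw new Error('No gateway credentials: set AI_GATEWAY_API_KEY or run where VERCEL_OIDC_TOKEN is available');
}
```

Call it once at startup with process.env and log the source field, so you can tell at a glance which path a deployment is using.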
There is a related operational nicety: standard API keys never expire unless revoked, and when a teammate leaves, Vercel deactivates the keys they created. For automation that should not be tied to a person, OIDC is the cleaner path.
Per the pricing docs, every team gets a free tier and a paid tier:
429 to retry after a short wait. BYOK is not available on free.
The headline is that the gateway does not mark up tokens. It monetizes through purchased credits and a handful of optional capabilities (things like team-wide provider allowlists or zero-data-retention) that carry small per-request fees only when you enable them. Verify current numbers on the pricing page before you model costs, since credit and capability details change.
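For the 429 case on the free tier, "retry after a short wait" usually means capped exponential backoff on the client. The sketch below is one way to compute the wait. The base delay, cap, and attempt limit are arbitrary values we picked, and whether the gateway sends a Retry-After header is an assumption. The function falls back to backoff when the header is absent.

```typescript
// Hypothetical helper: how long to wait before retrying a 429.
// retryAfter is the raw Retry-After header value in seconds, or null if absent
// (assumption: the gateway may or may not send this header).
function retryDelayMs(attempt: number, retryAfter: string | null, baseMs = 500, capMs = 8000): number {
  if (retryAfter !== null && retryAfter.trim() !== '') {
    const seconds = Number(retryAfter);
    if (isFinite(seconds) && seconds >= 0) {
      return Math.min(seconds * 1000, capMs);
    }
  }
  // Exponential backoff: baseMs, 2x, 4x, ... capped at capMs.
  return Math.min(baseMs * Math.pow(2, attempt), capMs);
}

// Only retry rate-limit responses, and only a bounded number of times.
function shouldRetry(status: number, attempt: number, maxAttempts = 3): boolean {
  return status === 429 && attempt < maxAttempts;
}
```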
A gateway is not free of cost in the engineering sense. Weigh these before coupling to it:
When any of these dominate, go direct for that specific path and keep the gateway for everything else. It does not have to be all-or-nothing.
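A mixed setup can be as small as a lookup table. This is a minimal sketch, assuming you route per feature. The feature name and the direct base URL are placeholders, and routeFor is a hypothetical helper.

```typescript
type Route = { kind: 'gateway' } | { kind: 'direct'; baseURL: string };

// Features pinned to a direct provider connection. Everything else uses the gateway.
// Both the feature name and the URL are placeholders for illustration.
const DIRECT_OVERRIDES: Record<string, string> = {
  autocomplete: 'https://api.example-provider.test/v1',
};

// Hypothetical helper: decide per feature whether to go direct or through the gateway.
function routeFor(feature: string): Route {
  if (Object.prototype.hasOwnProperty.call(DIRECT_OVERRIDES, feature)) {
    return { kind: 'direct', baseURL: DIRECT_OVERRIDES[feature] };
  }
  return { kind: 'gateway' };
}
```

Keeping the exceptions in one table makes it easy to see how much of your traffic has opted out, and to delete an override once the reason for it goes away.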
Reach for the AI Gateway when you:
Stay direct when you have a single provider you will never leave, a latency budget so tight that a proxy hop is unacceptable, or a need for provider-specific features the moment they ship. For most teams building AI features in 2026, the gateway removes more friction than it adds, and the migration cost is genuinely a few lines.
Models are referenced as creator/model-name strings, for example anthropic/claude-opus-4.8, moonshotai/kimi-k2.5, or openai/gpt-5.5. In the AI SDK, passing that string as the model automatically routes through the gateway. See the models and providers docs.
No. Per Vercel's docs, tokens are billed at provider list rates with zero markup, including with BYOK. The paid tier is funded through purchased credits rather than a per-token surcharge.
Not necessarily. Vercel deployments get an OIDC token as VERCEL_OIDC_TOKEN automatically, so there is nothing to rotate. Locally you fall back to an AI_GATEWAY_API_KEY.
Yes, through BYOK on the paid tier. Your credentials are configured at the team level, and if they fail on a request the gateway can retry with system credentials, billed to your credits.
Yes. The gateway is compatible with OpenAI Chat Completions, OpenAI Responses, and Anthropic Messages, plus the AI SDK v5 and v6. Point the base URL at https://ai-gateway.vercel.sh/v1 and keep most existing code.
Today the gateway focuses on text generation and embeddings. For text-to-speech you generally still call the provider directly. See our TTS API guide for that side of the stack.