Vercel AI Gateway in 10 Minutes: One Key for Every Model

Q: What model id format does the AI Gateway use?

Models are referenced as `creator/model-name` strings, for example `anthropic/claude-opus-4.8`, `moonshotai/kimi-k2.5`, or `openai/gpt-5.5`. In the AI SDK, passing that string as the `model` automatically routes through the gateway. See the [models and providers docs](https://vercel.com/docs/ai-gateway/models-and-providers).

Q: Does the gateway mark up token costs?

No. Per Vercel's [docs](https://vercel.com/docs/ai-gateway), tokens are billed at provider list rates with zero markup, including with BYOK. The paid tier is funded through purchased credits rather than a per-token surcharge.

Q: Do I need an API key if I deploy on Vercel?

Not necessarily. Vercel deployments get an [OIDC token](https://vercel.com/docs/ai-gateway/authentication-and-byok/oidc) as `VERCEL_OIDC_TOKEN` automatically, so there is nothing to rotate. Locally you fall back to an `AI_GATEWAY_API_KEY`.

Q: Can I use my own provider keys?

Yes, through [BYOK](https://vercel.com/docs/ai-gateway/authentication-and-byok/byok) on the paid tier. Your credentials are configured at the team level, and if they fail on a request the gateway can retry with system credentials, billed to your credits.

Most "add a new model" tasks are boring in the worst way. You install another SDK, wire up another API key, learn another auth quirk, and rebuild your fallback logic per provider. Vercel AI Gateway collapses that into one key and one endpoint. You reference a model by a plain string like moonshotai/kimi-k2.5 or anthropic/claude-opus-4.8, and the request routes to the right provider.

We run production chat through it, so this is the applied version: what it is, how it plugs into the AI SDK, what BYOK and OIDC actually buy you, the tradeoffs worth knowing before you couple to it, and who should reach for it.

What the AI Gateway actually is

The AI Gateway is a single HTTP endpoint (https://ai-gateway.vercel.sh/v1) that fronts hundreds of models from many providers. Instead of one client per provider, you get:

One key, many models. Access models from multiple providers with a single API key.
A unified API. Switch providers and models with minimal code changes, using creator/model-name string ids.
Automatic retries. If one provider fails, the gateway can retry the request against another.
Embeddings. The same endpoint generates vector embeddings, not just chat.
Spend monitoring. Usage and cost are tracked across providers in one place.
No token markup. Per the docs, tokens cost the same as buying from the provider directly, including with Bring Your Own Key.

It works with the AI SDK v5 and v6, the OpenAI Chat Completions and Responses APIs, and the Anthropic Messages API, so most existing code paths have a compatible entry point.
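As a small illustration of the creator/model-name id format, here is a sketch of a helper that splits an id into its two parts. The `parseModelId` name and the `ModelId` type are hypothetical and not part of the AI SDK or the gateway; the only rule assumed is that the creator is everything before the first slash.

```typescript
// Hypothetical helper: split a gateway model id of the form "creator/model-name".
interface ModelId {
  creator: string;
  model: string;
}

function parseModelId(id: string): ModelId {
  const slash = id.indexOf('/');
  // Both sides of the first slash must be non-empty.
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "creator/model-name", got "${id}"`);
  }
  return { creator: id.slice(0, slash), model: id.slice(slash + 1) };
}

const parsed = parseModelId('moonshotai/kimi-k2.5');
console.log(parsed.creator, parsed.model); // prints: moonshotai kimi-k2.5
```

A check like this is mostly useful for validating config values early, before a malformed id reaches a request.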

The 10-minute version with the AI SDK

The fastest path is the AI SDK, where the gateway is the default provider when you pass a model as a plain string. Set one environment variable:

AI_GATEWAY_API_KEY=your_key_here

Then reference any model by string. No provider import, no per-provider client:

import { generateText } from 'ai';

const { text } = await generateText({
  model: 'anthropic/claude-opus-4.8',
  prompt: 'What is the capital of France?',
});

Swapping models is a one-line change. anthropic/claude-opus-4.8 becomes moonshotai/kimi-k2.5 or openai/gpt-5.5 and nothing else moves. When you want explicit control, the gateway() provider instance gives you the same routing with configurable base URLs and env vars, which matters behind a corporate proxy.
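Because the model is just a string, one way to make the swap a config change rather than a code change is to read it from the environment. This is a minimal sketch: the `CHAT_MODEL` variable name and the `pickModel` helper are our own assumptions, not something the gateway or the AI SDK reads.

```typescript
// Default model id, taken from the example above.
const DEFAULT_MODEL = 'anthropic/claude-opus-4.8';

// Hypothetical helper: CHAT_MODEL is an app-level variable name we made up.
function pickModel(env: Record<string, string | undefined>): string {
  const override = env['CHAT_MODEL']?.trim();
  // Fall back to the default when the variable is unset or blank.
  return override ? override : DEFAULT_MODEL;
}

console.log(pickModel({ CHAT_MODEL: 'moonshotai/kimi-k2.5' })); // prints: moonshotai/kimi-k2.5
```

In an app you would pass `process.env` and hand the result to `generateText` as the `model`.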

Prefer the OpenAI SDK? Point its base_url at the gateway and keep your existing code:

import os

from openai import OpenAI

# Reuse the gateway key; only the base URL differs from a stock OpenAI client.
client = OpenAI(
    api_key=os.getenv('AI_GATEWAY_API_KEY'),
    base_url='https://ai-gateway.vercel.sh/v1',
)
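For readers without any SDK, the same idea can be sketched as a raw HTTP request in TypeScript. The assumptions here: the `/chat/completions` path is what OpenAI-compatible clients conventionally append to the base URL, the key travels as a Bearer token, and `buildChatRequest` is a hypothetical helper name. The block only builds the request so it can be inspected without a network call.

```typescript
const GATEWAY_BASE_URL = 'https://ai-gateway.vercel.sh/v1';

interface ChatRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Hypothetical helper: build (but do not send) a Chat Completions request.
function buildChatRequest(apiKey: string, model: string, prompt: string): ChatRequest {
  return {
    // Assumed path: the OpenAI-compatible convention of base URL + /chat/completions.
    url: `${GATEWAY_BASE_URL}/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

// Usage (not executed here):
//   const { url, init } = buildChatRequest(key, 'openai/gpt-5.5', 'Hello');
//   const res = await fetch(url, init);
```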

The gotcha we hit in production: the Responses API default

Here is the applied lesson that is not obvious from the quickstart. As tooling standardizes on OpenAI's newer Responses API, many OpenAI-compatible clients now default to it rather than the older Chat Completions shape. That is fine when you talk to OpenAI. It breaks when you point the same client at an upstream that only serves Chat Completions. Calling Moonshot's endpoint directly, for example, we hit exactly this shape mismatch: the client spoke Responses, the upstream spoke Chat Completions, and requests failed in ways that looked like our bug.

Routing through the gateway is what made this stop being our problem. The gateway normalizes the request and response shapes across providers, so a single client format reaches models that natively expose different APIs. If you have ever burned an afternoon on a "why does this provider return a different envelope" bug, this abstraction is the quiet reason to adopt a gateway even before you care about routing or spend.
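To make the mismatch concrete, here is a sketch of the two shapes for the simplest text-only case. It is an illustration of the difference, not the gateway's normalization code; the interfaces cover only a small subset of each API, and the function names are hypothetical.

```typescript
// Responses API request (subset): a single `input` plus optional `instructions`.
interface ResponsesBody { model: string; input: string; instructions?: string }

// Chat Completions request (subset): a list of role-tagged messages.
interface ChatMessage { role: 'system' | 'user' | 'assistant'; content: string }
interface ChatCompletionsBody { model: string; messages: ChatMessage[] }

// Translate a text-only Responses request into the Chat Completions shape.
function toChatCompletions(body: ResponsesBody): ChatCompletionsBody {
  const messages: ChatMessage[] = [];
  if (body.instructions) messages.push({ role: 'system', content: body.instructions });
  messages.push({ role: 'user', content: body.input });
  return { model: body.model, messages };
}

// The response envelopes differ too (again, only the text-bearing subset).
interface ChatCompletionsResponse { choices: { message: { content: string | null } }[] }
interface ResponsesResponse {
  output: { type: string; content?: { type: string; text?: string }[] }[];
}

// Chat Completions: text lives on the first choice's message.
function textFromChat(r: ChatCompletionsResponse): string {
  return r.choices[0]?.message.content ?? '';
}

// Responses: text is spread across `output_text` parts inside output items.
function textFromResponses(r: ResponsesResponse): string {
  return r.output
    .flatMap((item) => item.content ?? [])
    .filter((part) => part.type === 'output_text')
    .map((part) => part.text ?? '')
    .join('');
}
```

A client that sends the first shape to an upstream expecting the second fails before the model ever sees the prompt, which is why the errors look like a bug in your own code.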

BYOK: use your own provider keys

Bring Your Own Key lets you attach your own provider credentials at the team level. It is useful when you:

Have existing agreements and want enterprise pricing or provider credits.
Need private access to features that require your own credentials.
Want zero additional fee (BYOK requests carry no markup).

The reliability twist worth knowing: if your own credentials fail on a request, the gateway can retry with system credentials so the call still succeeds, and that fallback usage is billed against your gateway credits. See the BYOK docs for setup.

OIDC: no keys to manage on Vercel

For apps deployed on Vercel, you do not have to manage a gateway API key at all. An OIDC token is automatically available as VERCEL_OIDC_TOKEN, so there is nothing to rotate and no secret to leak. A common pattern reads OIDC in production and falls back to a key locally:

// OIDC on Vercel, API key locally
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;

There is a related operational nicety: standard API keys never expire unless revoked, and when a teammate leaves, Vercel deactivates the keys they created. For automation that should not be tied to a person, OIDC is the cleaner path.
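The one-line credential pattern above can be expanded into a small sketch that also reports which credential was picked and fails loudly when neither is set. `resolveGatewayCredential` and the `Credential` type are hypothetical names; the precedence (API key first, then OIDC token) mirrors the one-liner.

```typescript
type Credential = { source: 'api-key' | 'oidc'; token: string };

// Hypothetical helper: same precedence as the one-liner, with an explicit error.
function resolveGatewayCredential(env: Record<string, string | undefined>): Credential {
  const apiKey = env['AI_GATEWAY_API_KEY'];
  if (apiKey) return { source: 'api-key', token: apiKey };

  const oidc = env['VERCEL_OIDC_TOKEN'];
  if (oidc) return { source: 'oidc', token: oidc };

  throw new Error(
    'No gateway credential found: set AI_GATEWAY_API_KEY locally or run on Vercel for VERCEL_OIDC_TOKEN',
  );
}

console.log(resolveGatewayCredential({ VERCEL_OIDC_TOKEN: 'example-token' }).source); // prints: oidc
```

Logging the `source` (never the token) at startup makes it obvious which path a given environment is using.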

Pricing model

Per the pricing docs, every team gets a free tier and a paid tier:

Free tier: a small monthly credit included, provider list rates with zero markup, and per-model rate limits lower than on the paid tier. If you exceed a limit, the gateway returns an HTTP 429 and you can retry after a short wait. BYOK is not available on the free tier.
Paid tier: pay-as-you-go with purchased credits, higher rate limits, BYOK available, and no commitment. Token pricing is still provider list rate with zero markup.
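Handling the 429 case can be sketched as a retry loop with exponential backoff. Everything here is an assumption layered on top of the article: the delay numbers are arbitrary, the helper names are hypothetical, and whether the gateway sends a `Retry-After` header is not something the source states, so the code treats it as optional.

```typescript
// Delay before retry number `attempt` (0-based), in milliseconds.
// Honors a Retry-After value given in seconds; otherwise doubles from baseMs up to capMs.
function retryDelayMs(attempt: number, retryAfter: string | null, baseMs = 500, capMs = 8000): number {
  if (retryAfter !== null && retryAfter.trim() !== '') {
    const seconds = Number(retryAfter);
    // Retry-After may also be an HTTP date; that parses to NaN here and falls through.
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  return Math.min(baseMs * 2 ** attempt, capMs);
}

interface HttpResult { status: number; retryAfter: string | null }

// Call `send` until it returns a non-429 status or retries run out.
async function sendWithRetry<T extends HttpResult>(
  send: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const result = await send();
    if (result.status !== 429 || attempt >= maxRetries) return result;
    await sleep(retryDelayMs(attempt, result.retryAfter));
  }
}
```

The `send` callback is injected so the loop stays independent of any particular HTTP client; wrap your own request call and map its status and headers into `HttpResult`.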

The headline is that the gateway does not mark up tokens. It monetizes through purchased credits and a handful of optional capabilities (things like team-wide provider allowlists or zero-data-retention) that carry small per-request fees only when you enable them. Verify current numbers on the pricing page before you model costs, since credit and capability details change.
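With zero markup, modeling token cost reduces to tokens times the provider's list rate. The sketch below shows that arithmetic. The rates in it are placeholders invented for the example, not real prices for any model, and `estimateCostUsd` is a hypothetical helper; optional per-request capability fees are deliberately left out.

```typescript
// Rates in USD per million tokens.
interface Rates { inputPerMTok: number; outputPerMTok: number }

// Token cost at list rate with no markup: tokens / 1M * rate, summed over input and output.
function estimateCostUsd(inputTokens: number, outputTokens: number, rates: Rates): number {
  return (inputTokens / 1e6) * rates.inputPerMTok + (outputTokens / 1e6) * rates.outputPerMTok;
}

// Placeholder rates for illustration only; look up real numbers on the pricing page.
const exampleRates: Rates = { inputPerMTok: 2, outputPerMTok: 8 };

// 0.5M input tokens * $2 + 0.25M output tokens * $8 = $1 + $2
const monthlyCost = estimateCostUsd(500_000, 250_000, exampleRates);
console.log(monthlyCost); // prints: 3
```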

The honest tradeoffs

A gateway is not free of cost in the engineering sense. Weigh these before coupling to it:

A network hop. Every request goes through Vercel's infrastructure instead of straight to the provider. For most chat and agent workloads the added latency is small, but latency-critical paths should measure it, not assume it.
Vendor coupling. You are standardizing on Vercel's routing layer and model catalog. The mitigation is real: BYOK and OpenAI/Anthropic-compatible endpoints mean your provider relationships and request shapes stay portable, so leaving is a base-URL change, not a rewrite.
Less low-level control. If you need a bleeding-edge provider parameter the day it ships, a direct SDK call can expose it before a gateway does. Gateways trade a little immediacy for a lot of uniformity.
Another dependency in the path. The gateway adds reliability through cross-provider retries, but it is also one more system that can have an incident. Keep a direct-call fallback for your most critical route.
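For the network-hop item, measuring rather than assuming can be as simple as timing the same call through both paths and comparing percentiles. This is a minimal sketch with hypothetical helper names; it uses a nearest-rank percentile and wall-clock timing, which is coarse but enough to see whether the hop matters for your budget.

```typescript
// Time one async call in milliseconds (wall clock).
async function timeMs(call: () => Promise<unknown>): Promise<number> {
  const start = Date.now();
  await call();
  return Date.now() - start;
}

// Nearest-rank percentile over latency samples, with p in (0, 100].
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('percentile needs at least one sample');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error('rank out of range');
  return value;
}

// Example with made-up samples: compare medians of the two paths.
const viaGateway = [120, 95, 110, 300, 105];
const direct = [100, 90, 98, 280, 92];
console.log(percentile(viaGateway, 50) - percentile(direct, 50)); // prints: 12
```

Collect the real samples with `timeMs` around your own gateway and direct calls, and look at a tail percentile as well as the median before deciding.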

When any of these dominate, go direct for that specific path and keep the gateway for everything else. It does not have to be all-or-nothing.
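One way to keep that split explicit is a small per-route table that defaults to the gateway and lists only the exceptions. The route names and the `transportFor` helper below are hypothetical; the point is only that the decision lives in one place.

```typescript
type Transport = 'gateway' | 'direct';

// Hypothetical route names; only the exceptions need an entry.
const TRANSPORT_OVERRIDES = new Map<string, Transport>([
  ['voice-autocomplete', 'direct'], // latency budget too tight for an extra hop
  ['experimental-params', 'direct'], // needs a provider parameter the day it ships
]);

// Everything not listed goes through the gateway.
function transportFor(route: string): Transport {
  return TRANSPORT_OVERRIDES.get(route) ?? 'gateway';
}

console.log(transportFor('chat'), transportFor('voice-autocomplete')); // prints: gateway direct
```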

Who should use it

Reach for the AI Gateway when you:

Call more than one provider, or expect to, and do not want an SDK-and-key sprawl.
Want one place to watch spend and set budgets across models.
Need provider fallbacks without hand-rolling retry logic per provider.
Deploy on Vercel and would rather use OIDC than manage keys.
Keep hitting request-shape mismatches between OpenAI-compatible clients and upstreams.

Stay direct when you have a single provider you will never leave, a latency budget so tight that a proxy hop is unacceptable, or a need for provider-specific features the moment they ship. For most teams building AI features in 2026, the gateway removes more friction than it adds, and the migration cost is genuinely a few lines.

FAQ

Does it work with the OpenAI or Anthropic SDKs?

Yes. The gateway is compatible with OpenAI Chat Completions, OpenAI Responses, and Anthropic Messages, plus the AI SDK v5 and v6. Point the base URL at https://ai-gateway.vercel.sh/v1 and keep most existing code.

Can I route text-to-speech or audio through it?

Today the gateway focuses on text generation and embeddings. For text-to-speech you generally still call the provider directly. See our TTS API guide for that side of the stack.
