Inference
A single HTTP endpoint from Vercel that fronts hundreds of models from many providers behind one API key. You reference a model by a plain string like anthropic/claude-opus-4.8 or moonshotai/kimi-k2.5 and the request routes to the right provider, with automatic retries, embeddings, and spend monitoring across providers in one place. It is the default provider in the Vercel AI SDK when you pass a model as a string, and per its docs it adds no token markup, including with Bring Your Own Key.
Example
You reference a model by a plain string like anthropic/claude-opus-4.8 or moonshotai/kimi-k2.5 and the request routes to the right provider, with automatic retries, embeddings, and spend monitoring across providers in one place.
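To make the "plain string" idea concrete, here is a minimal sketch of how a client might split a provider/model identifier and build a chat request around it. The helper names (parseModelId, buildChatRequest) are hypothetical and not part of any Vercel SDK. The OpenAI-style /chat/completions path and the Bearer header are assumptions for illustration, so check the gateway docs for the actual endpoint and auth scheme.

```typescript
// Hypothetical helpers for illustration only; not part of any Vercel SDK.

interface ModelRef {
  provider: string; // e.g. "anthropic"
  model: string;    // e.g. "claude-opus-4.8"
}

// Split a "provider/model" string at the first slash.
function parseModelId(id: string): ModelRef {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

interface ChatRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Build (but do not send) a request. Assumes an OpenAI-style
// chat completions endpoint under baseUrl; that path is an assumption.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  modelId: string,
  prompt: string,
): ChatRequest {
  parseModelId(modelId); // validate the shape before building the request
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // The full "provider/model" string is sent as-is; routing to the
      // right provider is the gateway's job, not the client's.
      body: JSON.stringify({
        model: modelId,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}
```

The result can be passed to fetch(req.url, req.init). Switching providers then means changing only the model string, which is the point of fronting many providers behind one key.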
FAQ
Vercel AI Gateway sits in the Inference part of the AI stack. Understanding it helps you make better decisions when building, debugging, and shipping AI features.
Developers Digest publishes tutorials and videos that cover Inference topics including Vercel AI Gateway. Check the blog and YouTube channel for hands-on walkthroughs.
Related
Mixture of Experts: A model architecture that routes each input to a small subset of specialized sub-networks ("experts") rather than activating the entire model.
Attention: The core technique inside transformers that lets a model weigh the relevance of every token relative to every other token in a sequence.
Context engineering: The discipline of designing what information goes into a model's context window and how it is structured.
