
TL;DR
A practical comparison of LLM routing tools - LiteLLM, Portkey, and OpenRouter - covering cost management, fallbacks, caching, and when to use each for production AI applications.
Best for
Developers comparing real tool tradeoffs before choosing a stack.
Covers
Verdict, tradeoffs, pricing signals, workflow fit, and related alternatives.
| Source | What it covers |
|---|---|
| LiteLLM Documentation | Unified API for 100+ LLMs, proxy server setup, routing strategies, fallbacks, and cost tracking |
| LiteLLM GitHub | Open-source codebase with 22k+ stars, provider implementations, and configuration examples |
| Portkey AI Gateway | Enterprise gateway with guardrails, semantic caching, and observability features |
| Portkey Gateway GitHub | Open-source gateway supporting 1600+ models with fallbacks and load balancing |
| OpenRouter Documentation | Unified API for hundreds of models with automatic fallbacks and cost-based routing |
| OpenRouter Pricing | Model-specific pricing with provider comparison and availability status |
LLM costs add up. A single agent workflow hitting Claude Opus can burn through API credits faster than most teams expect. The standard response is model routing - picking the right model for each task, falling back when providers fail, and tracking spend across projects.
Three tools dominate this space: LiteLLM, Portkey, and OpenRouter. They solve similar problems differently. Here is when to use each.
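Every production AI application eventually needs multi-provider access through one interface, automatic fallbacks when providers fail, cost tracking across projects, and load balancing across rate limits.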
You can build this yourself. Most teams eventually wish they had not.
LiteLLM is a Python SDK and proxy server that gives you an OpenAI-compatible interface to 100+ LLM providers. You deploy it yourself - either as a library in your code or as a standalone proxy server.
Configure models in YAML:
```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
      tpm: 40000
      rpm: 500
  - model_name: gpt-4  # Same model_name: requests are load-balanced across both deployments
    litellm_params:
      model: azure/gpt-4
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      tpm: 80000
      rpm: 800

router_settings:
  routing_strategy: usage-based-routing
  fallbacks: [{"gpt-4": ["azure/gpt-4"]}]

litellm_settings:
  num_retries: 3
  request_timeout: 10
  context_window_fallbacks: [{"gpt-4": ["gpt-3.5-turbo-16k"]}]
```
Then call it like OpenAI:
```python
from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy
client = OpenAI(
    api_key="your-litellm-key",
    base_url="http://localhost:4000"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
```
LiteLLM supports multiple routing strategies, such as the `usage-based-routing` strategy set in the config above.
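To make the idea of usage-based routing concrete, here is a minimal sketch that sends each request to the deployment with the lowest token usage in the current minute, skipping any deployment that would exceed its TPM budget. This is a simplified illustration, not LiteLLM's implementation; the `Deployment` class and `pick_deployment` function are hypothetical names, and the limits simply mirror the `tpm` values from the config.

```python
from dataclasses import dataclass

@dataclass
class Deployment:
    name: str
    tpm_limit: int          # tokens-per-minute budget, like `tpm` in the config
    tokens_used: int = 0    # tokens consumed in the current minute

def pick_deployment(deployments, request_tokens):
    """Return the least-used deployment that can still fit the request."""
    eligible = [d for d in deployments
                if d.tokens_used + request_tokens <= d.tpm_limit]
    if not eligible:
        raise RuntimeError("every deployment is at its TPM limit")
    chosen = min(eligible, key=lambda d: d.tokens_used)
    chosen.tokens_used += request_tokens  # record usage for the next decision
    return chosen

# Hypothetical deployments mirroring the two gpt-4 entries in the config
deployments = [
    Deployment("openai/gpt-4", 40_000),
    Deployment("azure/gpt-4", 80_000),
]
```

A real router would also reset the counters every minute and track requests per minute alongside tokens.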
The proxy logs spend per request and aggregates by model, user, or team. You get visibility into which workflows are expensive before the bill arrives.
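Per-request spend logging with aggregation can be sketched in a few lines. The example below is an assumption-laden illustration, not LiteLLM's code: the model names, prices, and the `RequestLog` record are all hypothetical, and real prices vary by provider and date.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens
PRICES_PER_MTOK = {
    "model-large": {"input": 15.00, "output": 75.00},
    "model-small": {"input": 0.50, "output": 2.50},
}

@dataclass
class RequestLog:
    model: str
    team: str
    input_tokens: int
    output_tokens: int

def request_cost(log):
    """Cost of one request in USD, from token counts and the price table."""
    price = PRICES_PER_MTOK[log.model]
    return (log.input_tokens * price["input"]
            + log.output_tokens * price["output"]) / 1_000_000

def spend_by(logs, key):
    """Aggregate spend by any RequestLog field, e.g. "model" or "team"."""
    totals = defaultdict(float)
    for log in logs:
        totals[getattr(log, key)] += request_cost(log)
    return dict(totals)

logs = [
    RequestLog("model-large", "search", 2_000, 1_000),
    RequestLog("model-small", "search", 2_000, 1_000),
    RequestLog("model-large", "support", 4_000, 2_000),
]
```

Calling `spend_by(logs, "team")` or `spend_by(logs, "model")` gives the same totals sliced two ways, which is the kind of view that shows which workflow is expensive.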
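LiteLLM fits teams that want full control over routing, can run their own infrastructure, and want to avoid per-request fees.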
The tradeoff is operational overhead. You deploy it, monitor it, and scale it yourself.
Portkey is a managed AI gateway focused on enterprise features - guardrails, semantic caching, and observability. It started as a hosted service and later open-sourced the gateway component.
Portkey uses a configuration-based approach with nested strategies:
```typescript
import { Portkey } from 'portkey-ai';

const config = {
  strategy: {
    mode: 'loadbalance'
  },
  targets: [
    {
      virtual_key: process.env.ANTHROPIC_VIRTUAL_KEY,
      weight: 0.5,
      override_params: {
        model: 'claude-3-opus-20240229'
      }
    },
    {
      // Nested strategy: this half of the traffic tries OpenAI, then Azure
      strategy: {
        mode: 'fallback'
      },
      targets: [
        { virtual_key: process.env.OPENAI_VIRTUAL_KEY },
        { virtual_key: process.env.AZURE_VIRTUAL_KEY }
      ],
      weight: 0.5
    }
  ]
};

const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  config
});

const response = await portkey.chat.completions.create({
  messages: [{ role: 'user', content: 'Hello' }],
  model: 'gpt-4'
});
```
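How a nested strategy tree like this gets resolved can be sketched as a small recursive function. This is an illustration of the idea under stated assumptions, not Portkey's implementation: the `run` function and the `name` key on leaf targets are hypothetical, and the tree only borrows the `strategy`, `targets`, and `weight` shape from the config.

```python
import random

def run(node, call, rng=random):
    """Execute a request against a strategy tree; return the first success."""
    targets = node.get("targets")
    if not targets:  # leaf node: an actual provider target
        return call(node["name"])
    mode = node["strategy"]["mode"]
    if mode == "loadbalance":
        # Pick one child at random, proportionally to its weight
        weights = [t.get("weight", 1) for t in targets]
        chosen = rng.choices(targets, weights=weights, k=1)[0]
        return run(chosen, call, rng)
    if mode == "fallback":
        # Try children in order until one succeeds
        last_error = None
        for target in targets:
            try:
                return run(target, call, rng)
            except Exception as exc:
                last_error = exc
        raise RuntimeError("all fallback targets failed") from last_error
    raise ValueError(f"unknown strategy mode: {mode}")

config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"name": "anthropic", "weight": 0.5},
        {
            "strategy": {"mode": "fallback"},
            "weight": 0.5,
            "targets": [{"name": "openai"}, {"name": "azure"}],
        },
    ],
}

def fake_call(name):
    # Simulate an OpenAI outage so the fallback branch moves on to Azure
    if name == "openai":
        raise ConnectionError("provider down")
    return f"response from {name}"
```

With the simulated outage, `run(config, fake_call)` returns either the Anthropic response or, when the fallback branch is chosen, the Azure response.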
Portkey can cache semantically similar prompts, not just exact matches. This saves money when users ask slight variations of the same question:
```json
{
  "cache": { "mode": "semantic" },
  "strategy": { "mode": "fallback" },
  "targets": [
    { "provider": "openai", "api_key": "..." },
    { "provider": "anthropic", "api_key": "..." }
  ]
}
```
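The difference between semantic and exact-match caching is easiest to see in a toy version. The sketch below is not how Portkey implements it: it assumes a bag-of-words "embedding" and cosine similarity purely so the example runs with the standard library, where a production cache would use an embedding model. The `SemanticCache` class and the 0.8 threshold are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a real cache would use a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, prompt):
        """Return the cached response for the most similar prompt, or None."""
        query = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))

cache = SemanticCache(threshold=0.8)
cache.put("How do I reset my password?", "Go to Settings > Security.")
```

A slight rewording such as "How do I reset my password please?" still hits the cache, while an unrelated question misses and would go to the model. The threshold is the knob that trades savings against the risk of returning an answer to a different question.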
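The enterprise focus shows in features like guardrails (PII detection, content filtering, token budgets) and polished observability.
Portkey fits teams that need semantic caching, guardrails for regulated industries, or managed observability, and have the budget for it.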
The tradeoff is cost. Hosted Portkey is not free. The open-source gateway provides fallbacks and routing but not all enterprise features.
OpenRouter is a hosted service that provides access to hundreds of models through a single API endpoint. You do not deploy anything - you use their API like you would use OpenAI's, but with access to models from every provider.
OpenRouter acts as a marketplace and router. You make API calls to OpenRouter, and they route to the underlying provider:
```python
from openai import OpenAI

# Same OpenAI client, pointed at OpenRouter's endpoint
client = OpenAI(
    api_key="your-openrouter-key",
    base_url="https://openrouter.ai/api/v1"
)

response = client.chat.completions.create(
    model="anthropic/claude-3-opus",  # Use any provider
    messages=[{"role": "user", "content": "Hello"}]
)
```
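OpenRouter supports automatic fallbacks. For example, through the LiveKit OpenAI plugin:
```python
from livekit.plugins import openai

llm = openai.LLM.with_openrouter(
    model="openai/gpt-4o",
    # Tried in order if the primary model fails
    fallback_models=[
        "anthropic/claude-sonnet-4",
        "openai/gpt-5-mini",
    ],
)
```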
Or in their SDK:
```typescript
// `openrouter` is assumed to be an already-initialized OpenRouter SDK client
const result = openrouter.callModel({
  models: ['anthropic/claude-sonnet-4.5', 'openai/gpt-5.2', 'google/gemini-pro'],
  input: 'Hello!',
});
```
OpenRouter's default routing strategy prioritizes the lowest-cost providers.
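Price-first provider selection can be sketched as "cheapest offer that is currently up". This is a simplified illustration and not OpenRouter's actual algorithm; the `ProviderOffer` record, provider names, and prices are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ProviderOffer:
    provider: str
    price_per_mtok: float  # hypothetical USD price per million tokens
    available: bool

def cheapest_available(offers):
    """Return the lowest-priced offer among providers that are currently up."""
    candidates = [o for o in offers if o.available]
    if not candidates:
        raise LookupError("no provider is currently available")
    return min(candidates, key=lambda o: o.price_per_mtok)

offers = [
    ProviderOffer("provider-a", 0.90, available=False),  # cheapest, but down
    ProviderOffer("provider-b", 1.10, available=True),
    ProviderOffer("provider-c", 1.40, available=True),
]
```

Here the cheapest provider is down, so the request goes to `provider-b`.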
You pay OpenRouter's price per model, which includes their margin. For some models, they add to the base price. For others, they offer competitive rates because of volume agreements.
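OpenRouter fits teams that want one API key, one billing relationship, and access to hundreds of models without deploying anything.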
The tradeoff is less control. You cannot customize routing logic, caching behavior, or fallback strategies beyond what OpenRouter provides. You also depend on their uptime.
| Feature | LiteLLM | Portkey | OpenRouter |
|---|---|---|---|
| Deployment | Self-hosted | Hosted or self-hosted | Hosted only |
| Open source | Yes (MIT) | Gateway yes, hosted no | No |
| Provider/model coverage | 100+ providers | 1600+ models | Hundreds of models |
| Fallbacks | Yes, configurable | Yes, nested strategies | Yes, model list |
| Load balancing | Yes, multiple strategies | Yes, weighted | Yes, price-based |
| Semantic caching | No (simple caching only) | Yes | No |
| Cost tracking | Yes, per-request | Yes, with observability | Yes, dashboard |
| Guardrails | No | Yes | No |
| Custom routing | Full control | Config-based | Limited |
| Pricing | Free (self-hosted) | Free tier + paid | Per-token markup |
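Some teams combine these tools, for example using OpenRouter for exploration and model testing, then moving production traffic to self-hosted LiteLLM.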
The hidden cost in LLM routing is not the router - it is poor model selection.
A router that sends every request to Claude Opus when Haiku would suffice costs 30x more. A router without fallbacks loses availability when a single provider has issues. A router without cost tracking surprises you at month end.
The right tool is the one that makes model selection explicit and cost tracking automatic. Whether that is a self-hosted proxy, a managed gateway, or a unified API depends on your team's operational capacity and compliance requirements.
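One way to make model selection explicit is a routing table that names a model for every task type, so nothing silently defaults to the largest model. The sketch below assumes hypothetical model names and a 30x price gap between tiers to mirror the example above; the numbers are illustrative, not real pricing.

```python
# Hypothetical model names and blended prices (USD per million tokens)
MODEL_PRICE = {"large-model": 30.0, "small-model": 1.0}

# Explicit routing table: every task type names its model
ROUTES = {
    "classification": "small-model",
    "summarization": "small-model",
    "multi_step_reasoning": "large-model",
}

def pick_model(task_type):
    try:
        return ROUTES[task_type]
    except KeyError:
        # Fail loudly rather than defaulting to the most expensive model
        raise ValueError(f"no route defined for task type {task_type!r}") from None

def workload_cost(requests, always=None):
    """Cost of (task_type, tokens) pairs, optionally forcing one model."""
    total = 0.0
    for task_type, tokens in requests:
        model = always or pick_model(task_type)
        total += tokens / 1_000_000 * MODEL_PRICE[model]
    return total

# Nine simple requests and one hard one, a million tokens each
requests = [("classification", 1_000_000)] * 9 + [("multi_step_reasoning", 1_000_000)]
routed = workload_cost(requests)                          # 9 * 1.0 + 1 * 30.0
unrouted = workload_cost(requests, always="large-model")  # 10 * 30.0
```

Under these assumptions the routed workload costs 39 USD against 300 USD for sending everything to the large model. The gap depends entirely on how much of the traffic the small model can actually handle.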
Model routing is infrastructure now. As AI applications mature, the question shifts from "which model do I use?" to "how do I use the right model for each request, with fallbacks, at predictable cost?"
LiteLLM gives you full control and zero marginal cost, but requires operational investment.
Portkey provides enterprise-grade features and observability, but requires budget.
OpenRouter offers maximum simplicity and model access, but abstracts away control.
Pick based on your constraints. The wrong answer is not picking one at all.
An LLM router is a proxy layer that sits between your application and LLM providers. It handles multi-provider access (using Claude, GPT, and open-source models through one interface), automatic fallbacks when providers fail, cost tracking across projects, and load balancing across rate limits. You need one when your AI application grows beyond a single model - which happens faster than most teams expect.
LiteLLM wraps 100+ LLM providers in an OpenAI-compatible interface. Instead of managing separate SDKs and API keys for each provider, you configure models in YAML and call them through one endpoint. LiteLLM also adds fallbacks, retries, cost tracking, and routing strategies that the native APIs do not provide. You deploy it yourself, which means operational overhead but no per-request fees.
Portkey's value depends on whether you need its enterprise features. Semantic caching can reduce costs significantly for applications with similar queries. Guardrails (PII detection, content filtering, token budgets) matter for regulated industries. Observability surfaces are more polished than what you would build on LiteLLM. If you do not need these features, LiteLLM or OpenRouter may be more cost-effective.
OpenRouter adds a margin to most model prices, but the markup varies. For some models, they offer competitive rates due to volume agreements. For others, you pay 10-20% more than direct access. The tradeoff is simplicity - one API key, one billing relationship, access to hundreds of models. Calculate whether that convenience is worth the margin for your usage volume.
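The margin calculation is simple enough to write down. The sketch below uses the 10-20% range from the paragraph above; the monthly spend and the engineering hourly rate are hypothetical inputs you would replace with your own.

```python
def monthly_markup_cost(direct_spend_usd, markup_rate):
    """Extra monthly cost of the hosted margin over direct provider pricing."""
    return direct_spend_usd * markup_rate

def breakeven_hours(direct_spend_usd, markup_rate, engineer_hourly_usd):
    """Hours of self-hosting work per month that the margin would pay for."""
    return monthly_markup_cost(direct_spend_usd, markup_rate) / engineer_hourly_usd

# Hypothetical inputs: 5,000 USD/month direct spend, 100 USD/hour engineering
low = monthly_markup_cost(5_000, 0.10)    # about 500 USD
high = monthly_markup_cost(5_000, 0.20)   # about 1,000 USD
hours = breakeven_hours(5_000, 0.20, 100) # about 10 hours
```

If running your own router would take more engineering time per month than `breakeven_hours` returns, the margin is likely the cheaper option at that volume.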
Yes, you can combine these tools. Common patterns include using OpenRouter for exploration and model testing, then moving production traffic to self-hosted LiteLLM for cost control. Some teams route enterprise traffic through Portkey for compliance while keeping internal tools on direct APIs. The models.dev registry can populate any router's configuration with current model specs.
Fallback behavior varies by router. LiteLLM lets you define fallback chains per model name and configure retries, timeouts, and context-window-specific fallbacks. Portkey supports nested fallback strategies with weighted load balancing. OpenRouter accepts an array of models and tries each in order. The key is testing fallbacks before you need them - a misconfigured fallback that fails silently costs more than no fallback at all.
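Testing a fallback before you need it can be as simple as simulating an outage against the chain. The sketch below is a generic illustration, not any router's real API: `call_with_fallbacks`, `AllProvidersFailed`, and the model names are hypothetical. It retries each model, moves to the next one, and raises loudly if every model fails rather than failing silently.

```python
import time

class AllProvidersFailed(Exception):
    pass

def call_with_fallbacks(models, call, retries=2, backoff_s=0.0):
    """Try each model in order, retrying each first. Returns (model, result)."""
    errors = {}
    for model in models:
        for attempt in range(retries + 1):
            try:
                return model, call(model)
            except Exception as exc:  # a real router would only catch retryable errors
                errors[model] = exc
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise AllProvidersFailed(errors)

def test_fallback_fires_when_primary_is_down():
    attempts = []

    def fake_call(model):
        attempts.append(model)
        if model == "primary":
            raise TimeoutError("simulated outage")
        return "ok"

    used, result = call_with_fallbacks(["primary", "secondary"], fake_call, retries=1)
    assert (used, result) == ("secondary", "ok")
    # One initial attempt plus one retry on the primary, then the fallback
    assert attempts == ["primary", "primary", "secondary"]

test_fallback_fires_when_primary_is_down()
```

The same pattern works against a real router in staging: block or misconfigure the primary provider on purpose and check which model actually served the request.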
Self-hosted routers (LiteLLM) add minimal latency - typically single-digit milliseconds. Hosted services (Portkey, OpenRouter) add network round-trip time to their servers, usually 20-50ms depending on your location. For most LLM applications, this is negligible compared to model inference time. For latency-critical paths, self-hosting or direct provider access is preferable.
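Whether the extra hop is negligible depends on how long the model itself takes. The helper below computes the router's share of total request time; the 50 ms and 5 ms overheads come from the ranges above, while the inference times are hypothetical.

```python
def overhead_share(router_overhead_ms, inference_ms):
    """Fraction of total request time spent in the router hop."""
    return router_overhead_ms / (router_overhead_ms + inference_ms)

# Hypothetical inference times: a 2-second generation and a 200 ms short completion
hosted_long = overhead_share(50, 2_000)    # roughly 2.4% of the request
hosted_short = overhead_share(50, 200)     # 20% of the request
self_hosted_long = overhead_share(5, 2_000)
```

For long generations the hop barely registers, but on short, latency-critical calls a hosted gateway can account for a noticeable share of the total.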
Building a basic proxy is straightforward. Building one with proper fallbacks, retries, cost tracking, rate limit handling, caching, and observability takes months. Most teams underestimate the maintenance burden - every provider API change, new model release, or pricing update requires attention. Use an existing router unless you have specific requirements that none of them satisfy.