
TL;DR
Claude Code does not have to call Anthropic's API. Here are five working patterns for running it through your own gateway, on your own models, in your own VPC, with full audit logs and cost control.
The default story for Claude Code is simple: install the CLI, log in with an Anthropic account, and your prompts go straight to api.anthropic.com. That works for individual developers. It does not work for regulated teams, enterprises with strict data residency rules, or anyone who wants to mix Claude with cheaper open-source models without paying retail rates on every token.
Good news: Claude Code is much more open than people realize. The CLI talks to whatever endpoint you point it at, as long as the wire protocol matches Anthropic's Messages API. That single fact unlocks a surprising amount of architectural flexibility. This post walks through five concrete patterns for self-hosting Claude Code on your own infrastructure, from a five-minute LiteLLM proxy on a laptop to a full enterprise gateway with audit logs and SSO.
If you have been running Claude Code at scale, you have probably already hit the wall described in our usage limits playbook. These patterns are the next step.
Claude Code reads two environment variables that change the entire request path:
```bash
# Override the API endpoint
export ANTHROPIC_BASE_URL="https://your-gateway.example.com"

# Override the auth token
export ANTHROPIC_AUTH_TOKEN="your-internal-token"
```
If your gateway speaks the Anthropic Messages API on the wire, Claude Code will not know the difference. This is the foundation of every pattern below.
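Before pointing Claude Code at a gateway, it is worth probing the endpoint directly. Here is a minimal sketch, stdlib only, that builds the same kind of Messages API request the CLI will send; the URL, token, and model name are placeholders, and it assumes your gateway accepts the Bearer-style credential that ANTHROPIC_AUTH_TOKEN produces:

```python
import json
import urllib.request

def messages_probe(base_url: str, token: str, model: str) -> urllib.request.Request:
    """Build a minimal Anthropic Messages API request for a compatibility check."""
    body = {
        "model": model,
        "max_tokens": 32,
        "messages": [{"role": "user", "content": "ping"}],
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/messages",
        data=json.dumps(body).encode(),
        headers={
            # ANTHROPIC_AUTH_TOKEN is sent as a Bearer credential
            "Authorization": f"Bearer {token}",
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )

# Fire it against a running gateway with:
# urllib.request.urlopen(messages_probe("http://localhost:4000", "your-token", "claude-sonnet-4-7"))
```

If the response comes back as a well-formed Messages API object (or an SSE stream when you add `"stream": true`), Claude Code will be happy with the endpoint.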
There is also ANTHROPIC_MODEL for forcing a specific model name, and a set of network variables (HTTPS_PROXY, NODE_EXTRA_CA_CERTS) for corporate proxies and custom certificate authorities. The Anthropic documentation calls this enterprise network configuration, but it works anywhere.
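These variables do not have to live in every developer's shell profile. Claude Code also reads an `env` map from its settings files, so a project can pin its gateway in version control. A sketch of `.claude/settings.json`, assuming the `env` map behaves as in Anthropic's settings documentation; the URL is a placeholder, and the auth token is deliberately left out so it never lands in the repo:

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://your-gateway.example.com",
    "ANTHROPIC_MODEL": "claude-sonnet-4-7"
  }
}
```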
The simplest pattern. You run LiteLLM as a local proxy on port 4000, point Claude Code at it, and route requests to whatever provider you want behind the scenes. It takes about five minutes to set up.
```yaml
# litellm-config.yaml
model_list:
  - model_name: claude-sonnet-4-7
    litellm_params:
      model: anthropic/claude-sonnet-4-7
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: claude-haiku-4-7
    litellm_params:
      model: bedrock/anthropic.claude-haiku-4-7
      aws_region_name: us-east-1
  - model_name: gpt-5-3
    litellm_params:
      model: openai/gpt-5.3
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  routing_strategy: simple-shuffle

general_settings:
  master_key: sk-internal-team-key
```
Run it:
```bash
litellm --config litellm-config.yaml --port 4000
```
Point Claude Code at it:
```bash
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="sk-internal-team-key"
claude
```
You now have a proxy that logs every request, enforces budget limits per virtual key, and can fall back across providers when one is rate-limited. Same Claude Code experience, full visibility into what your team is sending.
This pattern is great for individual developers and small teams. It does not give you SSO or audit logs that auditors will accept, but it solves the cost-tracking problem with under an hour of setup.
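The per-key budget limits mentioned above are minted through LiteLLM's key-management API. A sketch that builds the request, assuming LiteLLM's `/key/generate` endpoint and its documented fields (`max_budget` in US dollars, `duration` as an expiry window); the values here are illustrative:

```python
import json
import urllib.request

def key_generate_request(proxy_url: str, master_key: str, team: str,
                         monthly_budget_usd: float) -> urllib.request.Request:
    # Virtual keys are created via the proxy's /key/generate endpoint,
    # authenticated with the master key from litellm-config.yaml.
    body = {
        "models": ["claude-sonnet-4-7", "claude-haiku-4-7"],
        "max_budget": monthly_budget_usd,  # proxy rejects spend past this
        "duration": "30d",                 # key expires after 30 days
        "metadata": {"team": team},        # shows up in the spend logs
    }
    return urllib.request.Request(
        f"{proxy_url.rstrip('/')}/key/generate",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {master_key}",
                 "content-type": "application/json"},
        method="POST",
    )
```

Each developer then sets ANTHROPIC_AUTH_TOKEN to their own virtual key instead of the shared master key, which is what makes per-person cost attribution possible.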
If you cannot send code to Anthropic directly because of compliance, you have two options that already speak Claude: AWS Bedrock and Google Vertex AI. Both host the same Claude models and route everything through your existing cloud account.
For Bedrock:
```bash
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION="us-east-1"
export ANTHROPIC_MODEL="us.anthropic.claude-sonnet-4-7-v1:0"
export ANTHROPIC_SMALL_FAST_MODEL="us.anthropic.claude-haiku-4-7-v1:0"
claude
```
For Vertex:
```bash
export CLAUDE_CODE_USE_VERTEX=1
export CLOUD_ML_REGION="us-east5"
export ANTHROPIC_VERTEX_PROJECT_ID="your-gcp-project"
export ANTHROPIC_MODEL="claude-sonnet-4-7@20260301"
claude
```
Claude Code knows about these flags natively. Authentication uses your existing AWS or GCP credentials, all logs flow into CloudTrail or Cloud Audit Logs, and the data never leaves your cloud account boundary. For most enterprise compliance requirements this is the cleanest answer.
The tradeoff: Bedrock and Vertex sometimes lag behind direct Anthropic on new model releases by a few weeks, and prompt caching support has historically been spottier. Test before committing.
For organizations that need centralized identity, audit logs, and per-developer attribution, the right pattern is a self-hosted gateway behind Identity-Aware Proxy. The high-level architecture:
```
[Developer machine]
  -> Local proxy (Claude Code calls this)
  -> [Identity-Aware Proxy] (Google Workspace SSO)
  -> [FastAPI gateway on Cloud Run]
  -> Anthropic API or Bedrock
```
The local proxy is a tiny piece of software running on the developer's laptop that intercepts Claude Code's API calls, fetches a fresh OIDC token from `gcloud`, and forwards the request to the company gateway with `Authorization: Bearer <id-token>`. IAP validates the token, confirms the user is in the right Google Workspace group, and forwards to your FastAPI service. Your service logs the request, attaches the user identity, and proxies to Anthropic.
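A sketch of the token plumbing on the laptop side, assuming the gcloud CLI is installed and the developer is logged in; the gateway hostname is hypothetical. The real local proxy wraps these two helpers in a small HTTP server that ANTHROPIC_BASE_URL points at:

```python
import subprocess

GATEWAY_URL = "https://claude-gw.corp.example.com"  # hypothetical internal host

def fetch_id_token() -> str:
    # gcloud prints a short-lived OIDC identity token for the logged-in user;
    # the proxy calls this per request so tokens are always fresh.
    return subprocess.run(
        ["gcloud", "auth", "print-identity-token"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

def forward_headers(id_token: str, content_type: str = "application/json") -> dict:
    # IAP validates this Bearer token before the request reaches the gateway
    return {
        "Authorization": f"Bearer {id_token}",
        "content-type": content_type,
    }
```

Because the token is minted from the developer's own gcloud session, there is no long-lived secret on the laptop to leak or rotate.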
The skeleton of the gateway service:
```python
# gateway.py
import os

import httpx
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import StreamingResponse

app = FastAPI()
ANTHROPIC_KEY = os.environ["ANTHROPIC_API_KEY"]


def log_request(user: str, body: bytes) -> None:
    # Stub: record who, when, what model, and a token estimate
    # in your audit sink of choice.
    ...


@app.post("/v1/messages")
async def messages(request: Request):
    # IAP injects the verified caller identity in this header
    user = request.headers.get("X-Goog-Authenticated-User-Email")
    if not user:
        raise HTTPException(401, "missing identity")

    body = await request.body()
    log_request(user=user, body=body)

    # Forward to Anthropic, streaming the response back to the client
    headers = {
        "x-api-key": ANTHROPIC_KEY,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }

    async def upstream():
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream(
                "POST",
                "https://api.anthropic.com/v1/messages",
                content=body,
                headers=headers,
            ) as r:
                async for chunk in r.aiter_raw():
                    yield chunk

    return StreamingResponse(upstream(), media_type="text/event-stream")
```
Every developer sets ANTHROPIC_BASE_URL to the gateway and authenticates via SSO. You get a single audit log of every prompt anyone in the company sent, attributable to a specific identity. When someone leaves the company, removing them from the Workspace group revokes their access immediately. No scattered API keys to rotate.
This is the pattern that makes Claude Code viable in regulated industries. Build it once, every developer benefits.
You do not have to use Anthropic models with Claude Code. The open-source Claude Code Router project translates between Claude's wire format and any other provider, including local Ollama models, OpenRouter, Groq, DeepSeek, and Together.
Install and configure:
```bash
npm install -g @musistudio/claude-code-router
```

Then create `~/.claude-code-router/config.json`:

```json
{
  "Providers": [
    {
      "name": "ollama",
      "api_base_url": "http://localhost:11434/v1/chat/completions",
      "models": ["qwen3.5-coder:35b", "deepseek-coder:33b"]
    },
    {
      "name": "openrouter",
      "api_base_url": "https://openrouter.ai/api/v1/chat/completions",
      "api_key": "$OPENROUTER_API_KEY",
      "models": ["anthropic/claude-sonnet-4-7", "google/gemini-2.5-pro"]
    }
  ],
  "Router": {
    "default": "ollama,qwen3.5-coder:35b",
    "background": "ollama,qwen3.5-coder:35b",
    "think": "openrouter,anthropic/claude-sonnet-4-7",
    "longContext": "openrouter,anthropic/claude-sonnet-4-7"
  }
}
```

Run Claude Code through the router:

```bash
ccr code
```
The router routes "thinking" tasks to Claude Sonnet on OpenRouter and routine tasks to a local Qwen model on Ollama. You pay nothing for the bulk of your tokens, get frontier-quality reasoning when you need it, and your code never leaves your laptop for the local-only routes.
This is the budget-conscious pattern. We documented the full setup in our comparison of every AI coding tool's economics, and it pairs well with cheap GPU rentals if your laptop is not powerful enough to run a 35B model locally.
The most extreme version. You run an open-weight coding model on your own GPUs, expose an Anthropic-compatible endpoint, and Claude Code never touches the public internet. This is what defense, healthcare, and certain financial customers require.
The stack is vLLM serving an open-weight coding model, fronted by LiteLLM to expose an Anthropic-compatible endpoint. A minimal docker compose:
```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    command:
      - --model=Qwen/Qwen3.5-Coder-32B-Instruct
      - --max-model-len=131072
      - --tensor-parallel-size=2
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    ports:
      - "8000:8000"

  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    environment:
      LITELLM_MASTER_KEY: sk-internal
    volumes:
      - ./litellm-config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    ports:
      - "4000:4000"
```
Developers connect like this:
```bash
export ANTHROPIC_BASE_URL="https://internal-claude.corp.example.com"
export ANTHROPIC_AUTH_TOKEN="$INTERNAL_TOKEN"
export ANTHROPIC_MODEL="qwen3.5-coder-32b"
claude
```
You give up some quality. Qwen3.5 and DeepSeek are excellent but not Sonnet 4.7. For most refactors, test writing, and routine feature work they are good enough. For the hard 10 percent of problems, route to the gateway pattern above when policy allows.
This pattern also pairs well with building multi-agent workflows in Claude Code, because cheap local inference makes fan-out architectures economical that would be cost-prohibitive against the public API.
A simple decision matrix:
| Need | Pattern |
|---|---|
| Just want cost tracking and team budgets | LiteLLM proxy (Pattern 1) |
| Compliance, no Anthropic API direct, AWS or GCP shop | Bedrock or Vertex (Pattern 2) |
| Centralized identity, audit logs, SSO for the whole org | Enterprise gateway with IAP (Pattern 3) |
| Want to slash costs by routing easy tasks to local models | Claude Code Router (Pattern 4) |
| Air-gapped, cannot send code anywhere external | Self-hosted GPUs with vLLM (Pattern 5) |
Most teams should start with Pattern 1. It is reversible, ships in an afternoon, and tells you whether your usage justifies the more invasive patterns. The teams that need Pattern 5 already know they need it; the rest are doing premature optimization.
The reason these patterns exist is that Anthropic made a deliberate decision to keep Claude Code's wire protocol portable. The CLI is opinionated about how it works on your machine - the sub-agent system, the hooks, the worktree integration - but completely agnostic about which backend serves the model. That separation is rare among AI coding tools.
It also means the cost ceiling on Claude Code is a lot lower than it appears. The retail price assumes everything goes to the public API. With the patterns above, real-world team costs come down by 40 to 90 percent depending on how aggressive you are about routing, with no change to the developer experience.
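The arithmetic behind that range is simple enough to sanity-check. A sketch with hypothetical relative prices; the routing split and the local cost ratio are placeholders for illustration, not quoted rates:

```python
def blended_savings(frontier_share: float, local_cost_ratio: float) -> float:
    """Fraction saved versus sending every token to the frontier API.

    frontier_share: fraction of tokens still routed to the frontier model.
    local_cost_ratio: local cost per token as a fraction of the frontier price.
    """
    blended = frontier_share * 1.0 + (1 - frontier_share) * local_cost_ratio
    return 1 - blended

# Routing 80% of tokens to a local model at ~5% of frontier price:
print(round(blended_savings(frontier_share=0.2, local_cost_ratio=0.05), 2))  # prints 0.76
```

A 76 percent saving at that split sits comfortably inside the 40-to-90-percent range; the aggressive end of the range comes from pushing more traffic local, the conservative end from keeping most work on the frontier model.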
If you are evaluating AI coding tools for an organization, Claude Code's self-hosting story is not a sidebar. It is one of the strongest arguments for picking it over the alternatives. Pair it with our full 2026 comparison matrix when you make the case to your platform team.