
OpenRouter vs LiteLLM

By SumGuy 12 min read

The Multi-Model Problem

Here’s the thing: you’ve got Claude running on Anthropic’s API. GPT-4 on OpenAI. Llama 3 on Groq. Maybe some local open source stuff. And your app needs to talk to all of them without embedding each provider’s SDK directly in your code. Because vendor lock-in is for people who enjoy pain.

What if Claude goes down? What if OpenAI’s rates spike? What if you want to A/B test which model actually gives better results for your use case? These are the kinds of 2 AM questions that keep infrastructure people awake.

This is where request routing comes in. Instead of hardcoding OpenAI’s endpoint in your app, you point your code at a gateway — a single API that knows how to talk to every provider you care about. When one provider hiccups, the gateway automatically tries the next one. When pricing changes, you adjust the routing rules once, not in six different services.
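Stripped to its core, the "try the next one" behavior is a loop over providers in priority order that remembers why each one failed. Here's a minimal Python sketch of that idea. The provider functions and the `call_with_fallback` helper are hypothetical stand-ins, not any real gateway's API. Real gateways layer retries, timeouts, and rate-limit awareness on top of this.

```python
from typing import Callable, List, Tuple

# A "provider" here is a hypothetical (name, callable) pair: the callable takes
# a prompt and returns text, or raises when the provider is down or overloaded.
Provider = Tuple[str, Callable[[str], str]]


def call_with_fallback(providers: List[Provider], prompt: str) -> Tuple[str, str, List[str]]:
    """Try providers in order; return (winning provider, reply, failure log)."""
    failures: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt), failures
        except Exception as exc:  # any failure moves us on to the next provider
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(failures))


def flaky_provider(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def healthy_provider(prompt: str) -> str:
    return f"echo: {prompt}"


chain = [("provider-a", flaky_provider), ("provider-b", healthy_provider)]
name, reply, failures = call_with_fallback(chain, "Explain Docker to me")
print(name, reply, failures)
# provider-b echo: Explain Docker to me ['provider-a: upstream timed out']
```

The failure log matters as much as the answer. When a request quietly lands on your third-choice provider, that list is how you find out.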

The two heavyweight contenders are OpenRouter and LiteLLM. Same problem, wildly different philosophies. Let’s dig in.


OpenRouter: Hosted Gateway, Hands Off

OpenRouter is the turnkey option. You sign up, get an API key, and suddenly your code can talk to ~100 different models through a single endpoint. No infrastructure to run. No Docker compose files. No monitoring dashboards you built yourself at midnight.

How it works: OpenRouter sits between your app and every major LLM provider. You make one HTTP request to OpenRouter’s API. It routes your request to whichever provider you specified, or it auto-selects based on your criteria (cheapest, fastest, highest quality). Then it streams the response back to you.

It’s the cloud SaaS play. Hosted by OpenRouter’s team. You pay their margin on top of the provider’s actual costs. For many teams, that’s totally fine. For your 2 AM self, it might be worth the peace of mind.

The OpenRouter Upside

Setup is trivial. Sign up, drop an API key in your .env, and point your client library at https://openrouter.ai/api/v1 instead of https://api.openai.com/v1. Done. If your app already uses the OpenAI Python SDK, you literally change three lines:

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("OPENROUTER_API_KEY"),
    base_url="https://openrouter.ai/api/v1",
)
response = client.chat.completions.create(
    model="openrouter/auto",  # OpenRouter picks the best match
    messages=[{"role": "user", "content": "Explain Docker to me"}],
)
print(response.choices[0].message.content)

Automatic provider fallback. If you specify openrouter/auto, OpenRouter uses their own heuristics to pick a provider. Model overloaded? They try the next cheapest alternative. Provider outage? They route around it. You don’t have to think about it.
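If you'd rather name the fallbacks yourself instead of leaving it all to `openrouter/auto`, OpenRouter's chat completions endpoint also accepts a `models` list that it works through in order. The sketch below only builds that request body with the standard library. The model slugs are examples, so check OpenRouter's catalog for current IDs and confirm the `models` parameter against their routing docs before you depend on it.

```python
import json
from typing import Dict, List


def build_openrouter_body(models: List[str], prompt: str, max_tokens: int = 1024) -> Dict:
    """Build a chat completions body asking OpenRouter to try `models` in order."""
    if not models:
        raise ValueError("need at least one model")
    return {
        "models": models,  # primary first, then fallbacks
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


body = build_openrouter_body(
    ["anthropic/claude-3-opus", "openai/gpt-4o"],  # example slugs only
    "Explain Docker to me",
)
print(json.dumps(body, indent=2))
```

With the OpenAI SDK, pass the primary model as `model=` and send the list through `extra_body={"models": [...]}`. The SDK forwards extra body fields unchanged.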

Model marketplace. OpenRouter surfaces models from 30+ providers in one catalog. Anthropic, OpenAI, Meta, Cohere, Mistral, Groq, Together, even obscure open source models. No account juggling. One bill. One dashboard showing spend across all of it.

No infrastructure. You’re not responsible for keeping a proxy alive. No Docker image to patch. No Python dependencies to upgrade. OpenRouter’s team handles the plumbing. Your ops load is zero.

Team-friendly. Slap an OpenRouter key on a shared .env and everyone’s good. No individual provider credentials scattered across five different services. That alone is worth something when Sarah leaves the team and you realize her API keys are still live everywhere.

The OpenRouter Downside

You pay the margin. OpenRouter takes a cut on top of the provider’s actual rate. Say a model costs $0.015 per 1K input tokens direct and ~$0.020 through the gateway: that’s a 33% markup. Over a year of heavy API use, that adds up. For a solo home lab tinkering project, it’s pocket change. For a production SaaS startup, it’s real money.

Routing rules are limited. You can pin a model, hand OpenRouter a short list of fallback models, or use openrouter/auto. What you can’t do is write fancy conditional logic like “if response latency > 5 seconds, failover to this provider” or “if we’ve hit 80% of our budget, switch to cheaper models.” It’s their rules, not yours.

Less observability. You get a dashboard showing what you spent on which models, but you don’t get deep visibility into why a request went to Provider A instead of Provider B. If you need audit logs and detailed routing decisions for compliance, you’re limited.

Vendor lock-in (ironic). You’re trying to avoid vendor lock-in by using a multi-model gateway. But you’ve introduced a new vendor: OpenRouter. If they go down (rare, but possible), your app breaks. If they change pricing or deprecate a model, you have to adapt. It’s distributed risk, but it’s still risk.


LiteLLM: Self-Hosted Proxy, Full Control

LiteLLM is the DIY option. You run the proxy yourself — in Docker, on your home lab server, wherever. You own the routing decisions. You own the fallback chains. You own the observability. You also own the operational headaches.

How it works: LiteLLM is a lightweight Python proxy that translates requests from your app into calls to any LLM provider. It has a YAML config file where you define which providers to use, how to route between them, fallback rules, budget limits, caching behavior, cost tracking. Your app makes one request to your local LiteLLM instance. LiteLLM figures out where to send it.
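Here's what "translates requests" means in practice. OpenAI-style chat requests carry the system prompt as one of the messages. Anthropic's Messages API expects it in a top-level `system` field and also requires `max_tokens`. The function below is a simplified, hypothetical sketch of that kind of translation, not LiteLLM's actual code. The real thing also handles tools, images, streaming, role-ordering rules, and per-provider quirks.

```python
from typing import Any, Dict, List


def openai_to_anthropic(request: Dict[str, Any], default_max_tokens: int = 1024) -> Dict[str, Any]:
    """Convert an OpenAI-style chat request dict into an Anthropic Messages-style body."""
    system_parts: List[str] = []
    messages: List[Dict[str, Any]] = []
    for msg in request["messages"]:
        if msg["role"] == "system":
            system_parts.append(msg["content"])  # Anthropic wants this top-level
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})
    body: Dict[str, Any] = {
        "model": request["model"],
        "messages": messages,
        # Anthropic requires max_tokens; OpenAI treats it as optional.
        "max_tokens": request.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    return body


req = {
    "model": "claude-3-opus-20240229",
    "messages": [
        {"role": "system", "content": "Be terse."},
        {"role": "user", "content": "What is Docker?"},
    ],
}
print(openai_to_anthropic(req))
```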

It’s the self-hosted play. You control everything. You pay zero margin because you’re directly hitting provider APIs with your own credentials.

The LiteLLM Upside

Zero margin. You have accounts with Anthropic, OpenAI, Groq, whatever. LiteLLM uses your credentials directly. You pay exactly what each provider charges, no markup. Over time, that math is compelling.

Complete routing control. Define fallback chains: “Try Groq first because it’s stupid cheap. If that fails, try Together. If that fails, hit Claude because you need quality.” You write YAML rules that match your exact use case.

model_list:
  - model_name: "gpt-4"
    litellm_params:
      model: "gpt-4-turbo"
      api_key: os.environ/OPENAI_API_KEY
  - model_name: "claude-3-opus"
    litellm_params:
      model: "anthropic/claude-3-opus-20240229"
      api_key: os.environ/ANTHROPIC_API_KEY
litellm_settings:
  # Try OpenAI first; if that fails, use Claude
  fallbacks: [{"gpt-4": ["claude-3-opus"]}]

Budget controls. Set spend limits per model, per user, per API key. Hit your budget? LiteLLM blocks requests and you find out before your credit card declines. Audit trail is there. No surprises.
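Under the hood, a budget limit comes down to two steps: track spend per key, and refuse a request once it would cross the cap. Here's a minimal in-memory sketch of that check. `BudgetGuard` and its methods are hypothetical names. LiteLLM's real budgets are configured on keys and users through the proxy, not hand-written like this.

```python
from collections import defaultdict
from typing import Dict


class BudgetExceeded(Exception):
    pass


class BudgetGuard:
    """Hypothetical per-key spend tracker with a hard USD cap (no cap = unlimited)."""

    def __init__(self, limits_usd: Dict[str, float]):
        self.limits = limits_usd
        self.spent: Dict[str, float] = defaultdict(float)

    def charge(self, api_key: str, cost_usd: float) -> float:
        """Record a request's cost, or raise if it would push the key over its cap."""
        limit = self.limits.get(api_key)
        if limit is not None and self.spent[api_key] + cost_usd > limit:
            raise BudgetExceeded(f"{api_key} would exceed ${limit:.2f}")
        self.spent[api_key] += cost_usd
        return self.spent[api_key]


guard = BudgetGuard({"team-a": 1.00})
guard.charge("team-a", 0.60)
try:
    guard.charge("team-a", 0.50)  # 1.10 > 1.00, so this one is refused
except BudgetExceeded as exc:
    print("blocked:", exc)
```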

Caching. LiteLLM can cache prompt+completion pairs, so repeated identical requests don’t hit the provider again. Huge for dev workflows. Anthropic’s prompt caching is supported natively.
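Caching repeated identical requests is an exact-match lookup: hash the model plus the full message list, and reuse the stored completion on a hit. A minimal in-memory sketch follows. The `fetch` callable is a hypothetical stand-in for the real provider call. LiteLLM's own cache is switched on in its config and can use backends such as Redis.

```python
import hashlib
import json
from typing import Callable, Dict, List


class CompletionCache:
    """Exact-match cache keyed on (model, messages)."""

    def __init__(self, fetch: Callable[[str, List[dict]], str]):
        self.fetch = fetch  # hypothetical provider call
        self.store: Dict[str, str] = {}
        self.hits = 0

    @staticmethod
    def key(model: str, messages: List[dict]) -> str:
        # sort_keys makes the hash independent of dict key order
        raw = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def complete(self, model: str, messages: List[dict]) -> str:
        k = self.key(model, messages)
        if k in self.store:
            self.hits += 1
            return self.store[k]
        result = self.fetch(model, messages)
        self.store[k] = result
        return result


calls = []


def fake_fetch(model: str, messages: List[dict]) -> str:
    calls.append(model)
    return "Docker packages apps into containers."


cache = CompletionCache(fake_fetch)
msgs = [{"role": "user", "content": "Explain Docker"}]
cache.complete("gpt-4", msgs)
cache.complete("gpt-4", msgs)
print(len(calls), cache.hits)  # 1 1
```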

Observability. Every request gets logged. Latency, cost, provider, success/failure, why it failed. You can hook into Datadog, New Relic, Langfuse, or ingest logs into your own Elasticsearch stack. Full audit trail, full control.
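"Every request gets logged" boils down to one structured record per call. Here's a minimal sketch that times a call and emits a JSON line with the provider, status, latency, and failure reason. The `logged_call` wrapper and its field names are hypothetical. In LiteLLM you wire logging up through its callback settings instead.

```python
import json
import time
from typing import Any, Callable, Dict, List


def logged_call(provider: str, call: Callable[[], Any], sink: List[str]) -> Any:
    """Run `call`, append a JSON log line to `sink`, and re-raise any failure."""
    start = time.perf_counter()
    record: Dict[str, Any] = {"provider": provider}
    try:
        result = call()
        record.update(status="success")
        return result
    except Exception as exc:
        record.update(status="failure", error=f"{type(exc).__name__}: {exc}")
        raise
    finally:
        # Runs on both paths, so every attempt leaves a log line behind
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
        sink.append(json.dumps(record))


logs: List[str] = []
logged_call("groq", lambda: "ok", logs)
try:
    logged_call("openai", lambda: 1 / 0, logs)
except ZeroDivisionError:
    pass
for line in logs:
    print(line)
```

JSON lines like these are easy to ship to whatever stack you already run.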

Hosted option exists too. If you want the same control but don’t want to run it yourself, LiteLLM has a hosted proxy called LiteLLM Proxy Pro. Same config, they run the infrastructure. You get the best of both worlds — control + no ops. (But you do pay for it.)

The LiteLLM Downside

You run it. It’s a service you have to keep alive. Deploy it to Docker, write a systemd unit, stick it on your Proxmox homelab VM — whatever. But it’s your responsibility. It needs monitoring. It needs logs. If it crashes at 2 AM, your app stops working.

Setup is harder. You need provider API keys for every service you want to route to. You need to write YAML config. You need to think about how to run the proxy (Docker? Kubernetes? systemd?). There’s no one-click signup.

Debugging fallback chains is messy. When a request fails and bounces through three fallback providers before succeeding, you need to dig through logs to understand why. OpenRouter abstracts that away; LiteLLM gives you full visibility but also full responsibility.

Operator burden. Managing secrets is your problem. Scaling the proxy is your problem. Keeping dependencies patched is your problem. For a solo home lab, that’s fine. For a team, you’re asking someone to own this as a system.


Head-to-Head Comparison

| Dimension | OpenRouter | LiteLLM |
|---|---|---|
| Setup time | 5 minutes | 30 minutes (Docker + config) |
| Cost model | Provider rate plus OpenRouter's margin | Zero margin (direct API costs) |
| Infrastructure | Hosted by OpenRouter | You run it |
| Routing flexibility | Limited (auto / manual select) | Unlimited (YAML rules) |
| Fallback chains | Basic (auto or a fallback model list) | Full control (custom rules) |
| Budget controls | Basic dashboard | Granular per-model/user limits |
| Observability | Dashboard + basic logs | Full audit trail, plugs into your tooling |
| Provider count | ~100 models from 30+ providers | All (you control the keys) |
| Compliance/audit | Limited | Full control |
| Dependency risk | You rely on OpenRouter | You rely on your infrastructure |
| Best for | Teams, SaaS, quick prototypes | Home labs, full control, cost-sensitive |

The Real Cost Question

Let’s do the math. Assume you’re using Claude Opus at $15 per 1M input tokens, and that OpenRouter’s cut works out to a 33% markup.

OpenRouter: $15 × 1.33 (33% markup) = $19.95 per 1M tokens

LiteLLM: $15 × 1.0 (direct API) = $15 per 1M tokens

Difference: $4.95 per 1M tokens.

If you burn 10M tokens a month (heavy usage), that’s $49.50/month extra you’re paying OpenRouter. Over a year, that’s $594. For a 2-person startup, that matters. For a solo home lab player, it’s probably not the deciding factor — but the saved ops burden of not running a proxy might be worth more.
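Here's the same arithmetic as a tiny Python helper, so you can plug in your own token volume, per-million price, and whatever markup you actually see on your bill. The 33% below just reproduces the example above.

```python
def gateway_overhead(tokens_millions: float, price_per_million: float, markup: float) -> dict:
    """Compare direct vs marked-up cost for one month of usage (USD)."""
    direct = tokens_millions * price_per_million
    via_gateway = direct * (1 + markup)
    extra_month = via_gateway - direct
    return {
        "direct_month": round(direct, 2),
        "gateway_month": round(via_gateway, 2),
        "extra_month": round(extra_month, 2),
        "extra_year": round(extra_month * 12, 2),
    }


print(gateway_overhead(tokens_millions=10, price_per_million=15.0, markup=0.33))
# {'direct_month': 150.0, 'gateway_month': 199.5, 'extra_month': 49.5, 'extra_year': 594.0}
```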


Docker Compose Setup (LiteLLM)

If you want to run LiteLLM locally, here’s a working compose file:

docker-compose.yml
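version: '3.8'
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: litellm-proxy
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - GROQ_API_KEY=${GROQ_API_KEY}
      - LITELLM_MASTER_KEY=${LITELLM_MASTER_KEY}
    volumes:
      - ./config.yaml:/app/config.yaml
    # The image's entrypoint already runs litellm, so pass only its arguments
    command: ["--config", "/app/config.yaml", "--port", "8000"]
    restart: unless-stopped
    healthcheck:
      # /health/liveliness is LiteLLM's unauthenticated liveness probe
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 -O /dev/null http://localhost:8000/health/liveliness || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3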

LiteLLM Config (YAML)

Drop this in a config.yaml file next to docker-compose.yml:

config.yaml
model_list:
  - model_name: "gpt-4"
    litellm_params:
      model: "gpt-4-turbo"
      api_key: os.environ/OPENAI_API_KEY
      rpm: 3500  # Requests/min for this deployment (match your OpenAI tier)
  - model_name: "gpt-4-fast"
    litellm_params:
      model: "groq/mixtral-8x7b-32768"
      api_key: os.environ/GROQ_API_KEY
  - model_name: "claude-3-opus"
    litellm_params:
      model: "anthropic/claude-3-opus-20240229"
      api_key: os.environ/ANTHROPIC_API_KEY
router_settings:
  routing_strategy: simple-shuffle  # Optional: add redis_host/redis_port to share state across instances
litellm_settings:
  request_timeout: 30
  fallbacks:
    - gpt-4:
        - gpt-4-fast     # Try Groq (cheapest)
        - claude-3-opus  # Then Claude (pricier but reliable)
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY  # Set via .env

Set your .env file:

.env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GROQ_API_KEY=gsk_...
LITELLM_MASTER_KEY=sk-1234567890abcdef
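If a variable is unset, Compose substitutes a blank string and only prints a warning. The proxy then starts with an empty key and fails later, on a real request. A small pre-flight check catches that first. This is a hypothetical helper, and the required names simply mirror the .env above.

```python
import os
from typing import List, Mapping, Optional, Sequence

REQUIRED = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GROQ_API_KEY", "LITELLM_MASTER_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None,
                 required: Sequence[str] = REQUIRED) -> List[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Demo with an explicit mapping; call missing_keys() with no arguments to check
# the real environment before running docker compose.
print(missing_keys({"OPENAI_API_KEY": "sk-...", "GROQ_API_KEY": " "}))
# ['ANTHROPIC_API_KEY', 'GROQ_API_KEY', 'LITELLM_MASTER_KEY']
```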

Then spin it up:

docker-compose up -d

Calling OpenRouter (Direct)

If you go the OpenRouter route, here’s a curl example hitting their API directly:

curl -X POST https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
    "model": "anthropic/claude-3-opus",
    "messages": [
      {
        "role": "user",
        "content": "Explain how LiteLLM routing works in simple terms."
      }
    ],
    "max_tokens": 1024
  }'

OpenRouter’s response is OpenAI-compatible JSON. Parse the choices[0].message.content and you’re done.
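Because the response is OpenAI-compatible, pulling out the text is plain JSON handling. Here's a stdlib sketch that works on a trimmed, made-up response body. Real responses carry more fields, such as `id` and `usage`.

```python
import json
from typing import Any, Dict

# Hypothetical, trimmed response body in the OpenAI chat completions shape.
RAW = """
{
  "model": "anthropic/claude-3-opus",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "LiteLLM maps a model name to providers..."},
     "finish_reason": "stop"}
  ]
}
"""


def extract_text(body: Dict[str, Any]) -> str:
    """Return choices[0].message.content, failing loudly on an error payload."""
    # Assumes errors arrive as a top-level "error" object, as in OpenAI-style APIs.
    if "error" in body:
        raise RuntimeError(f"gateway error: {body['error']}")
    return body["choices"][0]["message"]["content"]


print(extract_text(json.loads(RAW)))
```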


Calling LiteLLM (Local Proxy)

From your app, hit your local proxy exactly like you’d hit OpenAI:

from openai import OpenAI

client = OpenAI(
    api_key="sk-1234567890abcdef",  # Your LITELLM_MASTER_KEY
    base_url="http://localhost:8000",  # Your LiteLLM proxy
)
response = client.chat.completions.create(
    model="gpt-4",  # LiteLLM figures out the routing
    messages=[{"role": "user", "content": "What's the deal with Docker?"}],
    max_tokens=1024,
)
print(response.choices[0].message.content)

LiteLLM intercepts that request, checks your config, picks the right provider (or falls back), and sends the response back.
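You don't strictly need the OpenAI SDK, because the proxy speaks plain HTTP. This stdlib sketch builds the equivalent request with `urllib` but doesn't send it. It assumes the proxy from the compose file on port 8000 and the example master key. Hand the returned object to `urllib.request.urlopen` to actually send it.

```python
import json
import urllib.request


def build_proxy_request(prompt: str, model: str = "gpt-4",
                        base_url: str = "http://localhost:8000",
                        api_key: str = "sk-1234567890abcdef") -> urllib.request.Request:
    """Build an OpenAI-style chat completions POST aimed at a LiteLLM proxy."""
    payload = {
        "model": model,  # the model_name alias from config.yaml, not the provider's ID
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


req = build_proxy_request("What's the deal with Docker?")
print(req.get_method(), req.full_url)  # POST http://localhost:8000/v1/chat/completions
```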


Can You Use Both?

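Absolutely. LiteLLM can use OpenRouter as one of its providers. So you could run a chain like this: Anthropic direct as the primary, Groq as the cheap backup, and OpenRouter as the last resort.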

If you hit Anthropic’s rate limits or it goes down, LiteLLM tries Groq. If Groq doesn’t have the model you want, it hits OpenRouter as the safety net.

model_list:
  - model_name: "claude"
    litellm_params:
      model: "anthropic/claude-3-opus-20240229"
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: "claude-openrouter"
    litellm_params:
      model: "openrouter/anthropic/claude-3-opus"
      api_key: os.environ/OPENROUTER_API_KEY
litellm_settings:
  fallbacks:
    - claude:
        - claude-openrouter  # OpenRouter is the emergency exit
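A fallback only helps if every name it points at exists as a `model_name` in `model_list`. A provider-style slug like `openrouter/claude` in place of the alias is an easy typo, and it tends to surface at 2 AM rather than at startup. Here's a small stdlib-only check you can run on the config once it's parsed into a dict (for example with PyYAML's `yaml.safe_load`). It assumes the `model_list` plus `litellm_settings.fallbacks` layout, and the sample dict is hypothetical.

```python
from typing import Any, Dict, List


def check_fallbacks(config: Dict[str, Any]) -> List[str]:
    """Return problems: fallback sources or targets missing from model_list."""
    names = {entry["model_name"] for entry in config.get("model_list", [])}
    problems: List[str] = []
    for rule in config.get("litellm_settings", {}).get("fallbacks", []):
        for source, targets in rule.items():
            if source not in names:
                problems.append(f"fallback source '{source}' is not a model_name")
            for target in targets:
                if target not in names:
                    problems.append(f"'{source}' falls back to unknown model '{target}'")
    return problems


sample = {  # hypothetical parsed config with a typo in the fallback target
    "model_list": [
        {"model_name": "claude", "litellm_params": {"model": "anthropic/claude-3-opus-20240229"}},
        {"model_name": "claude-openrouter", "litellm_params": {"model": "openrouter/anthropic/claude-3-opus"}},
    ],
    "litellm_settings": {"fallbacks": [{"claude": ["openrouter/claude"]}]},
}
print(check_fallbacks(sample))
# ["'claude' falls back to unknown model 'openrouter/claude'"]
```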

So Which One?

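Pick OpenRouter if: you want to be calling models in five minutes, you don’t want to run any infrastructure, and one bill across every provider is worth paying a margin.

Pick LiteLLM if: you want zero margin, custom fallback chains, granular budgets, and a full audit trail, and you’re happy to keep a proxy alive yourself.

Pick both if: you want LiteLLM’s control and direct pricing day to day, with OpenRouter as the emergency exit when your direct providers fall over.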


The Bottom Line

LiteLLM is the tooling equivalent of running your own home lab Kubernetes cluster: more powerful, more complex, more rewarding if you care about every detail. OpenRouter is the turnkey equivalent of Heroku: simple, you let them handle it, you pay a premium for the convenience.

Neither is wrong. It depends on whether your idea of fun is “point an API key at something and code” or “build the perfect request routing system with fallback chains that would make a network engineer proud.”

Pick your poison. Either way, you’re no longer a hostage to a single LLM provider.

