OpenRouter vs LiteLLM

The Multi-Model Problem

You’ve got Claude running on Anthropic’s API. GPT on OpenAI. Llama on Groq. Maybe some local open source stuff. And your app needs to talk to all of them without embedding each provider’s SDK directly in your code. Because vendor lock-in is for people who enjoy pain.

What if Claude goes down? What if OpenAI’s rates spike? What if you want to A/B test which model actually gives better results for your use case? These are the kinds of 2 AM questions that keep infrastructure people awake.

This is where request routing comes in. Instead of hardcoding OpenAI’s endpoint in your app, you point your code at a gateway, a single API that knows how to talk to every provider you care about. When one provider hiccups, the gateway automatically tries the next one. When pricing changes, you adjust the routing rules once, not in six different services.
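To make the failover idea concrete, here is a minimal sketch of what a gateway does on each request. The provider functions are stand-ins invented for illustration (a real gateway wraps HTTP clients and only retries on errors that are actually retryable), so treat this as a picture of the control flow rather than how either tool is implemented.

```python
from typing import Callable, List, Tuple

# A "provider" here is just a callable that takes a prompt and returns text.
# These are hypothetical stand-ins so the sketch runs without network access.
Provider = Callable[[str], str]


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has errored."""


def route_with_fallback(
    prompt: str, chain: List[Tuple[str, Provider]]
) -> Tuple[str, str, List[str]]:
    """Try providers in order; return (provider_name, reply, failed_names)."""
    failed: List[str] = []
    for name, call in chain:
        try:
            return name, call(prompt), failed
        except Exception:  # a real gateway would catch only retryable errors
            failed.append(name)
    raise AllProvidersFailed(f"all providers failed: {failed}")


def flaky_provider(prompt: str) -> str:
    """Simulates a provider that is down."""
    raise TimeoutError("simulated outage")


def healthy_provider(prompt: str) -> str:
    """Simulates a provider that answers normally."""
    return f"echo: {prompt}"


chain = [("primary", flaky_provider), ("backup", healthy_provider)]
name, reply, failed = route_with_fallback("hello", chain)
print(name, reply, failed)
```

Returning the list of failed providers alongside the reply is a deliberate choice: it is the raw material for the "why did this request end up on the backup?" question you will eventually ask.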

The two heavyweight contenders are OpenRouter and LiteLLM. Same problem, wildly different philosophies. Let’s dig in.

OpenRouter: Hosted Gateway, Hands Off

OpenRouter is the turnkey option. You sign up, get an API key, and suddenly your code can talk to hundreds of models through a single endpoint. No infrastructure to run. No Docker compose files. No monitoring dashboards you built yourself at midnight.

How it works: OpenRouter sits between your app and every major LLM provider. You make one HTTP request to OpenRouter’s API. It routes your request to whichever provider you specified, or it auto-selects based on your criteria (cheapest, fastest, highest quality). Then it streams the response back to you.

It’s the cloud SaaS play. Hosted by OpenRouter’s team. You pay their margin on top of the provider’s actual costs. For many teams, that’s totally fine. For your 2 AM self, it might be worth the peace of mind.

The OpenRouter Upside

Setup is trivial. Sign up, drop an API key in your .env, update your client library to point to https://openrouter.ai/api/v1 instead of https://api.openai.com/v1. Done. If your app already uses the OpenAI Python SDK, you literally change three lines:

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("OPENROUTER_API_KEY"),
    base_url="https://openrouter.ai/api/v1",
)

response = client.chat.completions.create(
    model="openrouter/auto",  # OpenRouter picks the best match
    messages=[{"role": "user", "content": "Explain Docker to me"}],
)

Automatic provider fallback. When a model is served by more than one provider, OpenRouter routes around the one that is overloaded or down. Specify openrouter/auto and it goes a step further, using its own heuristics to pick the model for you. You don’t have to think about it.
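If you would rather name your own backups than trust the auto router, OpenRouter accepts a few extra fields in the request body. The sketch below builds such a body; the `models` fallback list and the `provider.sort` preference are written from memory of OpenRouter's routing options, so treat the exact field names and values as assumptions and check them against the current API reference. The helper name `build_openrouter_body` is made up for this example.

```python
import json
from typing import Any, Dict, List


def build_openrouter_body(
    prompt: str, primary: str, fallbacks: List[str], sort_by: str = "price"
) -> Dict[str, Any]:
    """Build a chat-completions body with OpenRouter-specific routing hints.

    Assumed fields (verify against the docs before relying on them):
      - "models": models to try if the primary model cannot be served
      - "provider": {"sort": ...}: how to rank providers for a given model
    """
    return {
        "model": primary,
        "models": fallbacks,
        "provider": {"sort": sort_by},
        "messages": [{"role": "user", "content": prompt}],
    }


body = build_openrouter_body(
    "Explain Docker to me",
    primary="anthropic/claude-opus-4",
    fallbacks=["openai/gpt-4o"],
)
print(json.dumps(body, indent=2))
```

With the OpenAI Python SDK, the non-standard fields would go through the `extra_body` argument of `client.chat.completions.create(...)`, since the SDK does not know about them.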

Model marketplace. OpenRouter surfaces models from 30+ providers in one catalog. Anthropic, OpenAI, Meta, Cohere, Mistral, Groq, Together, even obscure open source models. No account juggling. One bill. One dashboard showing spend across all of it.

No infrastructure. You’re not responsible for keeping a proxy alive. No Docker image to patch. No Python dependencies to upgrade. OpenRouter’s team handles the plumbing. Your ops load is zero.

Team-friendly. Slap an OpenRouter key on a shared .env and everyone’s good. No individual provider credentials scattered across five different services. That alone is worth something when Sarah leaves the team and you realize her API keys are still live everywhere.

The OpenRouter Downside

You pay the margin. OpenRouter takes a small cut when you top up credits (roughly 5% on card payments), and BYOK requests carry a percentage fee on top of the provider’s actual rate. It’s not free. Over a year of heavy API use, those cuts add up. For a solo home lab tinkering project, it’s pocket change. For a production SaaS startup, it’s real money.

Routing rules are limited. You can specify a model or use openrouter/auto, but you can’t write fancy conditional logic like “if response latency > 5 seconds, failover to this provider” or “if we’ve hit 80% of our budget, switch to cheaper models.” It’s their rules, not yours.
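For a sense of what that kind of conditional logic looks like when you own it, here is a small sketch of a routing policy as a plain function. The thresholds come from the two examples above; the model names and the `RoutingState` fields are placeholders invented for illustration, not part of either tool.

```python
from dataclasses import dataclass


@dataclass
class RoutingState:
    spent_usd: float       # spend so far this billing period
    budget_usd: float      # budget for the period
    last_latency_s: float  # latency of the most recent primary call


def choose_model(
    state: RoutingState,
    primary: str = "premium-model",  # hypothetical names, swap in your own
    cheap: str = "budget-model",
    fast: str = "fast-model",
) -> str:
    """Pick a model name from simple budget and latency rules."""
    # Budget rule wins: past 80% of budget, drop to the cheaper model.
    if state.budget_usd > 0 and state.spent_usd / state.budget_usd >= 0.8:
        return cheap
    # Latency rule: if the primary took more than 5 seconds, fail over.
    if state.last_latency_s > 5.0:
        return fast
    return primary


print(choose_model(RoutingState(spent_usd=10, budget_usd=100, last_latency_s=1.2)))
print(choose_model(RoutingState(spent_usd=85, budget_usd=100, last_latency_s=1.2)))
print(choose_model(RoutingState(spent_usd=10, budget_usd=100, last_latency_s=6.3)))
```

Twenty lines of Python is not much, but it has to live somewhere you control: in your app, or in a proxy you run.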

Less observability. You get a dashboard showing what you spent on which models, but you don’t get deep visibility into why a request went to Provider A instead of Provider B. If you need audit logs and detailed routing decisions for compliance, you’re limited.

Vendor lock-in (ironic). You’re trying to avoid vendor lock-in by using a multi-model gateway. But you’ve introduced a new vendor: OpenRouter. If they go down (rare, but possible), your app breaks. If they change pricing or deprecate a model, you have to adapt. It’s distributed risk, but it’s still risk.

LiteLLM: Self-Hosted Proxy, Full Control

LiteLLM is the DIY option. You run the proxy yourself, in Docker, on your home lab server, wherever. You own the routing decisions. You own the fallback chains. You own the observability. You also own the operational headaches.

How it works: LiteLLM is a lightweight Python proxy that translates requests from your app into calls to any LLM provider. It has a YAML config file where you define which providers to use, how to route between them, fallback rules, budget limits, caching behavior, cost tracking. Your app makes one request to your local LiteLLM instance. LiteLLM figures out where to send it.

It’s the self-hosted play. You control everything. You pay zero margin because you’re directly hitting provider APIs with your own credentials.

The LiteLLM Upside

Zero margin. You have accounts with Anthropic, OpenAI, Groq, whatever. LiteLLM uses your credentials directly. You pay exactly what each provider charges, no markup. Over time, that math is compelling.

Complete routing control. Define fallback chains: “Try Groq first because it’s stupid cheap. If that fails, try Together. If that fails, hit Claude, because at that point you need quality.” You write YAML rules that match your exact use case.

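model_list:
  - model_name: "gpt-4"
    litellm_params:
      model: "openai/gpt-4o"
      api_key: os.environ/OPENAI_API_KEY
  - model_name: "claude"
    litellm_params:
      model: "anthropic/claude-opus-4-0"
      api_key: os.environ/ANTHROPIC_API_KEY

# Entries that share a model_name are load-balanced, not chained,
# so fallbacks point from one model_name to another.
litellm_settings:
  fallbacks: [{"gpt-4": ["claude"]}]  # Try OpenAI first, then Claude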

Budget controls. Set spend limits per model, per user, per API key. Hit your budget? LiteLLM blocks requests and you find out before your credit card declines. Audit trail is there. No surprises.

Caching. LiteLLM can cache prompt+completion pairs, so repeated identical requests don’t hit the provider again. Huge for dev workflows. Anthropic’s prompt caching is supported natively.
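The idea behind exact-match caching fits in a few lines. This is a conceptual sketch, not LiteLLM's implementation: the key derivation, the in-memory dict, and the `fake_complete` stand-in are all assumptions made to keep the example self-contained.

```python
import hashlib
import json
from typing import Callable, Dict, List

Message = Dict[str, str]


def cache_key(model: str, messages: List[Message]) -> str:
    """Derive a stable key from the parts of a request that affect the answer."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedCompleter:
    """Wraps a completion function and serves repeats from memory."""

    def __init__(self, complete: Callable[[str, List[Message]], str]) -> None:
        self._complete = complete
        self._store: Dict[str, str] = {}
        self.provider_calls = 0  # how many requests actually reached the provider

    def __call__(self, model: str, messages: List[Message]) -> str:
        key = cache_key(model, messages)
        if key not in self._store:
            self.provider_calls += 1
            self._store[key] = self._complete(model, messages)
        return self._store[key]


def fake_complete(model: str, messages: List[Message]) -> str:
    """Hypothetical stand-in for a real provider call."""
    return f"{model} says hi"


completer = CachedCompleter(fake_complete)
msgs = [{"role": "user", "content": "ping"}]
first = completer("gpt-4", msgs)
second = completer("gpt-4", msgs)  # identical request, served from the cache
print(first == second, completer.provider_calls)
```

A real deployment would also fold sampling parameters into the key and put a TTL on entries, usually with Redis as the shared store.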

Observability. Every request gets logged. Latency, cost, provider, success/failure, why it failed. You can hook into Datadog, New Relic, Langfuse, or ingest logs into your own Elasticsearch stack. Full audit trail, full control.

Hosted option exists too. If you want the same control but don’t want to run it yourself, LiteLLM has a hosted proxy called LiteLLM Proxy Pro. Same config, they run the infrastructure. You get the best of both worlds: control and no ops. (But you do pay for it.)

The LiteLLM Downside

You run it. It’s a service you have to keep alive. Deploy it to Docker, write a systemd unit, stick it on your Proxmox homelab VM, whatever. But it’s your responsibility. It needs monitoring. It needs logs. If it crashes at 2 AM, your app stops working.

Setup is harder. You need provider API keys for every service you want to route to. You need to write YAML config. You need to think about how to run the proxy (Docker? Kubernetes? systemd?). There’s no one-click signup.

Debugging fallback chains is messy. When a request fails and bounces through three fallback providers before succeeding, you need to dig through logs to understand why. OpenRouter abstracts that away; LiteLLM gives you full visibility but also full responsibility.

Operator burden. Managing secrets is your problem. Scaling the proxy is your problem. Keeping dependencies patched is your problem. For a solo home lab, that’s fine. For a team, you’re asking someone to own this as a system.

Head-to-Head Comparison

Dimension	OpenRouter	LiteLLM
Setup time	5 minutes	30 minutes (Docker + config)
Cost model	Credit top-up fee + BYOK surcharge	Zero margin (direct API costs)
Infrastructure	Hosted by OpenRouter	You run it
Routing flexibility	Limited (auto / manual select)	Unlimited (YAML rules)
Fallback chains	Basic (one provider at a time)	Full control (custom rules)
Budget controls	Basic dashboard	Granular per-model/user limits
Observability	Dashboard + basic logs	Full audit trail, integratable
Provider count	Hundreds of models from 30+ providers	All (you control the keys)
Compliance/audit	Limited	Full control
Dependency risk	You rely on OpenRouter	You rely on your infrastructure
Best for	Teams, SaaS, quick prototypes	Home labs, full control, cost-sensitive

The Real Cost Question

Let’s do some rough math. Assume you’re using Claude Opus and it costs $15 per 1M input tokens, and pretend OpenRouter’s fees work out to a 5% effective overhead on your spend.

OpenRouter: $15 × 1.05 (≈5% effective fee) = $15.75 per 1M tokens

LiteLLM: $15 × 1.0 (direct API) = $15 per 1M tokens

Difference: $0.75 per 1M tokens.

If you burn 10M tokens a month (heavy usage), that’s roughly $7.50/month extra you’re paying OpenRouter. Small at low volume, but it scales with spend. For a 2-person startup pushing real traffic, it adds up. For a solo home lab player, it’s probably not the deciding factor, but the saved ops burden of not running a proxy might be worth more.
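If you want to plug in your own numbers, the arithmetic is a one-liner. The 5% fee and the $15 price are the same illustrative assumptions used above, not quoted rates, and the function name is made up for this example.

```python
def monthly_overhead(
    tokens_millions: float, price_per_million: float, fee_rate: float
) -> float:
    """Extra monthly cost of a gateway fee on top of direct provider pricing."""
    return tokens_millions * price_per_million * fee_rate


# Same assumptions as the text: $15 per 1M tokens, 5% effective fee.
for volume in (10, 100, 1000):
    extra = monthly_overhead(volume, price_per_million=15.0, fee_rate=0.05)
    print(f"{volume:>5}M tokens/month -> ${extra:,.2f} extra")
```

At 10M tokens this reproduces the $7.50 figure. The overhead is linear in spend, so the question is less "is 5% a lot?" and more "is 5% of my bill more than what running a proxy costs me in time?"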

Docker Compose Setup (LiteLLM)

If you want to run LiteLLM locally, here’s a working compose file:

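services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: litellm-proxy
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - GROQ_API_KEY=${GROQ_API_KEY}
      - LITELLM_MASTER_KEY=${LITELLM_MASTER_KEY}  # Referenced by config.yaml
    volumes:
      - ./config.yaml:/app/config.yaml
    # The image's entrypoint already runs litellm, so pass only the flags
    command: ["--config", "/app/config.yaml", "--port", "8000"]
    restart: unless-stopped
    healthcheck:
      # /health/liveliness needs no auth; plain /health calls every configured model
      test: ["CMD-SHELL", "python3 -c \"import urllib.request; urllib.request.urlopen('http://localhost:8000/health/liveliness')\""]
      interval: 30s
      timeout: 5s
      retries: 3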

LiteLLM Config (YAML)

Drop this in your config.yaml file alongside the compose:

model_list:
  - model_name: "gpt-4"
    litellm_params:
      model: "openai/gpt-4o"
      api_key: os.environ/OPENAI_API_KEY  # LiteLLM's syntax for reading env vars
      rpm: 3500  # Keep this at or below your OpenAI rate limit

  - model_name: "gpt-4-fast"
    litellm_params:
      model: "groq/llama-3.3-70b-versatile"
      api_key: os.environ/GROQ_API_KEY

  - model_name: "claude-3-opus"
    litellm_params:
      model: "anthropic/claude-opus-4-0"
      api_key: os.environ/ANTHROPIC_API_KEY

router_settings:
  timeout: 30  # Seconds before a call is abandoned
  num_retries: 2
  # redis_host: ...  # Optional: point at a Redis instance for shared state

litellm_settings:
  # If "gpt-4" fails, try Groq (cheapest), then Claude (pricier but reliable)
  fallbacks: [{"gpt-4": ["gpt-4-fast", "claude-3-opus"]}]

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY  # Set via .env

Set your .env file:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GROQ_API_KEY=gsk_...
LITELLM_MASTER_KEY=sk-1234567890abcdef

Then spin it up:

docker compose up -d

Calling OpenRouter (Direct)

If you go the OpenRouter route, here’s a curl example hitting their API directly:

curl -X POST https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
    "model": "anthropic/claude-opus-4",
    "messages": [
      {
        "role": "user",
        "content": "Explain how LiteLLM routing works in simple terms."
      }
    ],
    "max_tokens": 1024
  }'

OpenRouter’s response is OpenAI-compatible JSON. Parse the choices[0].message.content and you’re done.
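In Python, pulling the text out of that JSON looks roughly like this. The `sample_response` below is a hand-written payload in the OpenAI-compatible shape, not captured output from OpenRouter, and the error handling is a guess at what you would want rather than a complete treatment.

```python
import json

# Hand-written example in the OpenAI-compatible shape (not real API output).
sample_response = """
{
  "id": "gen-example",
  "model": "anthropic/claude-opus-4",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "LiteLLM is a proxy."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 14, "completion_tokens": 6, "total_tokens": 20}
}
"""


def extract_reply(raw: str) -> str:
    """Return choices[0].message.content, or raise if the payload has no answer."""
    data = json.loads(raw)
    if "error" in data:
        raise RuntimeError(f"gateway returned an error: {data['error']}")
    choices = data.get("choices") or []
    if not choices:
        raise RuntimeError("response contained no choices")
    return choices[0]["message"]["content"]


print(extract_reply(sample_response))
```

Because the shape matches OpenAI's, the same function works unchanged on responses from a LiteLLM proxy.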

Calling LiteLLM (Local Proxy)

From your app, hit your local proxy exactly like you’d hit OpenAI:

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("LITELLM_MASTER_KEY"),  # Same value as in your .env
    base_url="http://localhost:8000",  # Your LiteLLM proxy
)

response = client.chat.completions.create(
    model="gpt-4",  # LiteLLM figures out the routing
    messages=[{"role": "user", "content": "What's the deal with Docker?"}],
    max_tokens=1024,
)

print(response.choices[0].message.content)

LiteLLM intercepts that request, checks your config, picks the right provider (or falls back), and sends the response back.

Can You Use Both?

Absolutely. LiteLLM can use OpenRouter as one of its providers. So you could have:

Primary: Direct Anthropic API (cheapest for Claude)
Fallback 1: Groq (fast but limited model selection)
Fallback 2: OpenRouter (expensive but reliable, all models available)

If you hit Anthropic’s rate limits or it goes down, LiteLLM tries Groq, which is fast but serves open models such as Llama rather than Claude. If Groq fails too, or you need Claude specifically, OpenRouter is the safety net.

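model_list:
  - model_name: "claude"
    litellm_params:
      model: "anthropic/claude-opus-4-0"
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: "groq-llama"
    litellm_params:
      model: "groq/llama-3.3-70b-versatile"
      api_key: os.environ/GROQ_API_KEY

  - model_name: "claude-openrouter"
    litellm_params:
      model: "openrouter/anthropic/claude-opus-4"
      api_key: os.environ/OPENROUTER_API_KEY

litellm_settings:
  # Fallbacks reference model_name values; OpenRouter is the emergency exit
  fallbacks: [{"claude": ["groq-llama", "claude-openrouter"]}]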

So Which One?

Pick OpenRouter if:

You want zero ops burden. Sign up, get a key, move on.
You’re a team and shared API keys matter more than margin.
You’re prototyping and speed beats cost.
You don’t want to think about infrastructure at 2 AM.

Pick LiteLLM if:

You’re running a home lab and you enjoy building infrastructure.
You have direct accounts with providers and want zero margin.
You need custom routing logic or granular budget controls.
You want full observability and audit trails.
You’re cost-sensitive and run significant volume.

Pick both if:

You want LiteLLM locally for cost, with OpenRouter as a fallback for reliability.
You’re running a team where some projects need quick setup (OpenRouter) and others need control (LiteLLM).

The Bottom Line

LiteLLM is the tooling equivalent of running your own home lab Kubernetes cluster: more powerful, more complex, more rewarding if you care about every detail. OpenRouter is the turnkey equivalent of Heroku: simple, you let them handle it, you pay a premium for the convenience.

Neither is wrong. It depends on whether your idea of fun is “point an API key at something and code” or “build the perfect request routing system with fallback chains that would make a network engineer proud.”

Pick your poison. Either way, you’re no longer a hostage to a single LLM provider.
