litellm
Call 100+ LLM APIs with one interface using LiteLLM — unified API proxy for OpenAI, Anthropic, Google, Mistral, Cohere, and self-hosted models. Use when someone asks to "switch between LLM providers", "LiteLLM", "unified LLM API", "LLM proxy", "call Claude and GPT with the same code", "LLM load balancing", or "multi-model AI gateway". Covers provider routing, fallbacks, rate limiting, spend tracking, and self-hosted proxy.
Usage
Getting Started
- Install the skill using the command above
- Open your AI coding agent (Claude Code, Codex, Gemini CLI, or Cursor)
- Reference the skill in your prompt
- The AI will use the skill's capabilities automatically
Example Prompts
- "Analyze the sales data in revenue.csv and identify trends"
- "Create a visualization comparing Q1 vs Q2 performance metrics"
Documentation
Overview
LiteLLM provides a single API to call 100+ LLM providers — OpenAI, Anthropic, Google Gemini, Mistral, Cohere, Azure, Bedrock, Ollama, and more. Write your code once using the OpenAI SDK format, then switch providers by changing a model string. As a proxy server, it adds load balancing, fallbacks, rate limiting, spend tracking, and API key management for teams.
When to Use
- Using multiple LLM providers and want a unified interface
- Need automatic fallbacks (if Claude is down, use GPT; see the Router sketch after this list)
- Cost tracking across multiple providers and teams
- Load balancing requests across multiple API keys or models
- Self-hosted proxy to manage LLM access for a team
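The fallback and load-balancing cases above can also be handled client-side with litellm's Router, without running the proxy. A minimal sketch, assuming two deployments share the "smart" group name; the API keys and model names are placeholders:

# router_fallbacks.py: client-side load balancing and fallbacks (sketch)
from litellm import Router

router = Router(
    model_list=[
        {"model_name": "smart",
         "litellm_params": {"model": "claude-sonnet-4-20250514", "api_key": "sk-ant-..."}},
        {"model_name": "smart",  # same name = traffic is load balanced
         "litellm_params": {"model": "gpt-4o", "api_key": "sk-..."}},
        {"model_name": "fast",
         "litellm_params": {"model": "gpt-4o-mini", "api_key": "sk-..."}},
    ],
    fallbacks=[{"smart": ["fast"]}],  # if every "smart" deployment fails, try "fast"
    num_retries=2,
)

response = router.completion(
    model="smart",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)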
Instructions
Setup
pip install litellm
# Or run as proxy server
pip install 'litellm[proxy]'
SDK Usage (Python)
# llm.py — Call any LLM with the same interface
from litellm import completion

# OpenAI
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Anthropic — same interface, just change the model string
response = completion(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Google Gemini
response = completion(
    model="gemini/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Local Ollama
response = completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="http://localhost:11434",
)

# All return the same response format (OpenAI-compatible)
print(response.choices[0].message.content)
Proxy Server
# litellm_config.yaml — Proxy configuration
model_list:
  - model_name: "fast"
    litellm_params:
      model: gpt-4o-mini
      api_key: sk-...
  - model_name: "smart"
    litellm_params:
      model: claude-sonnet-4-20250514
      api_key: sk-ant-...
  - model_name: "smart" # Second "smart" model = load balancing
    litellm_params:
      model: gpt-4o
      api_key: sk-...
  - model_name: "cheap"
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: AIza...

router_settings:
  routing_strategy: "latency-based-routing"
  num_retries: 3
  timeout: 30
  fallbacks: [{"smart": ["fast"]}] # If smart fails, use fast

general_settings:
  master_key: "sk-master-key-xxx" # Admin key
# Start proxy
litellm --config litellm_config.yaml --port 4000
# Call the proxy with any OpenAI-compatible client (curl shown here; any language works)
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-master-key-xxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "smart", "messages": [{"role": "user", "content": "Hello"}]}'
Node.js via Proxy
// app.ts — Use any OpenAI SDK client with the LiteLLM proxy
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000/v1",
  apiKey: "sk-master-key-xxx",
});

// Calls route to Claude or GPT based on the load balancing config
const response = await client.chat.completions.create({
  model: "smart",
  messages: [{ role: "user", content: "Explain monads simply." }],
});
console.log(response.choices[0].message.content);
Spend Tracking
# Track costs per team/user/project
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={
        "user": "user-123",
        "team": "engineering",
        "project": "chatbot",
    },
)

# When requests go through the LiteLLM proxy, costs are stored in its database
# Query via API: GET /spend/logs?user=user-123
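When the requests go through the proxy, the spend logs can also be pulled programmatically with the master key. A minimal sketch; the endpoint path comes from the comment above, and the exact query parameter names may differ between LiteLLM versions:

# query_spend.py: read spend logs from a running LiteLLM proxy (sketch)
import requests

resp = requests.get(
    "http://localhost:4000/spend/logs",
    headers={"Authorization": "Bearer sk-master-key-xxx"},
    params={"user": "user-123"},  # parameter name as used above; verify for your proxy version
)
print(resp.json())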
Examples
Example 1: Multi-provider AI application
User prompt: "My app uses Claude for reasoning and GPT-4o for function calling. Set up a unified interface."
The agent will configure LiteLLM with named model groups, route by capability, and add fallbacks between providers.
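A sketch of the application side of such a setup, assuming the proxy config defines "reasoning" and "tools" model groups; the group names and the tool schema here are illustrative, not part of LiteLLM:

# capability_routing.py: pick a model group per task via the LiteLLM proxy
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-master-key-xxx")

# Long-form reasoning goes to the "reasoning" group (e.g. Claude behind the proxy)
analysis = client.chat.completions.create(
    model="reasoning",
    messages=[{"role": "user", "content": "Walk through the trade-offs of this schema."}],
)

# Function calling goes to the "tools" group (e.g. GPT-4o behind the proxy)
weather = client.chat.completions.create(
    model="tools",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
        },
    }],
)
print(analysis.choices[0].message.content)
print(weather.choices[0].message.tool_calls)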
Example 2: Team LLM gateway with cost controls
User prompt: "Set up an LLM proxy for our team with per-user rate limits and spend tracking."
The agent will deploy the LiteLLM proxy, configure API keys per team member, set rate limits and budget caps, and enable spend logging.
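A sketch of provisioning one budgeted key through the proxy's key-management endpoint; the field names (models, max_budget, duration, metadata) follow LiteLLM's key generation API but should be checked against your proxy version:

# provision_key.py: create a budgeted virtual key via the proxy admin API (sketch)
import requests

resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer sk-master-key-xxx"},
    json={
        "models": ["fast", "smart"],  # model groups this key may call
        "max_budget": 25.0,           # spend cap in USD
        "duration": "30d",            # key expiry
        "metadata": {"user": "user-123", "team": "engineering"},
    },
)
print(resp.json()["key"])  # hand the returned sk-... key to the team member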
Guidelines
- Model format: provider/model — e.g. anthropic/claude-sonnet-4-20250514, gemini/gemini-2.0-flash
- Proxy for teams — centralize API keys, track spend, enforce rate limits
- Fallbacks for reliability — if the primary model fails, route to a backup
- Load balancing — multiple entries with the same model_name distribute traffic
- Latency-based routing — LiteLLM picks the fastest responding provider
- Spend tracking — costs calculated per-request, queryable via API
- OpenAI SDK compatible — any OpenAI client library works with the proxy
- Streaming works — stream=True works across all providers (see the sketch after this list)
- Environment variables — OPENAI_API_KEY, ANTHROPIC_API_KEY, etc. are auto-detected
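A minimal streaming sketch; the chunk format mirrors the OpenAI streaming API, so the same loop works regardless of provider:

# streaming.py: stream tokens from any provider with the same loop
from litellm import completion

stream = completion(
    model="claude-sonnet-4-20250514",  # or "gpt-4o", "gemini/gemini-2.0-flash", ...
    messages=[{"role": "user", "content": "Write a haiku about proxies."}],
    stream=True,
)
for chunk in stream:
    # content can be None on the final chunk, hence the "or"
    print(chunk.choices[0].delta.content or "", end="")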
Information
- Version: 1.0.0
- Author: terminal-skills
- Category: Data & AI
- License: Apache-2.0