arbiter-cli
Version 0.12.1 · MIT licence · 0 dependencies · 70 kB
Cut LLM API costs 69% with one line of code. Smart routing proxy that sends each request to the cheapest model capable of handling it.
Quick Start
# Interactive chat (like Claude CLI, but 69% cheaper)
npx arbiter-cli chat
# AI coding agent (reads files, writes code, runs commands)
npx arbiter-cli code "add error handling to utils.py"
# Set up in your project (zero code changes to your app)
npx arbiter-cli init
# Check your savings
npx arbiter-cli stats
What it does
Arbiter routes every LLM request to the cheapest model that can handle it:
- Simple questions → Gemini Flash / GPT-4o Mini (95% cheaper)
- Medium code tasks → Qwen / Mistral (90% cheaper)
- Complex reasoning → Claude Sonnet 4 / GPT-4o (full quality)
You get the same quality. You pay 69% less on average.
Setup Options
Option 1: Interactive Chat
npx arbiter-cli chat
Chat like you would in Claude CLI. Each response shows which model was picked and how much you saved.
⚡ Arbiter Chat
› What is the capital of France?
Paris.
↳ gemini-2.5-flash · saved <$0.001 (95%)
› Design a CRDT for collaborative editing
Here's an approach using operation-based CRDTs...
↳ claude-sonnet-4.6 · saved $0.00 (0%) — frontier needed
Option 2: Coding Agent
npx arbiter-cli code "fix the bug in main.py"
npx arbiter-cli code # interactive mode
Reads files, writes code, and runs commands. Simple file operations are routed to cheap models; architecture decisions are routed to frontier models.
Option 3: Drop-in Proxy (for your existing code)
npx arbiter-cli init
This adds OPENAI_BASE_URL to your .env. Your existing OpenAI SDK code then routes through Arbiter automatically, with no code changes.
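One thing worth checking: the OpenAI Python SDK picks up `OPENAI_BASE_URL` from the process environment, not from a `.env` file on disk. If your app does not already load `.env` (for example with python-dotenv), a minimal stdlib loader along these lines would cover it. The helper name `load_env_file` is hypothetical and is not part of Arbiter or the OpenAI SDK.

```python
import os
from pathlib import Path


def load_env_file(path=".env", override=False):
    """Copy KEY=VALUE pairs from a .env-style file into os.environ.

    Hypothetical helper for illustration. Returns the pairs it actually set.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip().strip('"').strip("'")
        # By default, variables already set in the environment win.
        if override or key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded
```

Call `load_env_file()` once at startup, before constructing `OpenAI()`, so the client sees the base URL that `init` wrote.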
from openai import OpenAI

# Works unchanged: Arbiter routes behind the scenes
client = OpenAI()  # Picks up OPENAI_BASE_URL from the environment (the value init wrote to .env)
response = client.chat.completions.create(
    model="gpt-4o",  # Arbiter overrides intelligently
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
# → Routed to Gemini Flash, saved 95%
CLI Commands
| Command | Description |
|---|---|
| `chat` | Interactive chat with smart routing |
| `chat --fast` | Prefer low-latency models |
| `chat --model claude` | Force a specific model |
| `code` | AI coding agent (interactive) |
| `code "task"` | One-shot coding task |
| `init` | Add Arbiter to current project |
| `status` | Check proxy connection |
| `stats` | View cost savings |
Chat Commands
| Command | Description |
|---|---|
| `/stats` | Session cost breakdown |
| `/model claude` | Switch model (claude, gpt4o, flash, haiku, fable, auto) |
| `/good` or `/bad` | Rate response (improves routing) |
| `/copy` | Copy last response to clipboard |
| `/save name` | Save conversation |
| `/load name` | Load conversation |
| `"""` | Start/end multi-line input |
| `quit` | Exit |
How It Works
- Classify: Each request is analyzed for task type (code, reasoning, analysis, creative, etc.) and complexity (simple/medium/complex) in <1ms
- Route: A performance matrix picks the cheapest model that meets the quality bar
- Quality Gate: If the cheap model's response fails the quality check, the request is transparently retried on a frontier model
- Cache: Identical requests return instantly at $0
- Compress: Requests routed to non-frontier models use concise prompts, which reduces output tokens
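The steps above can be sketched roughly as follows. This is an illustration, not Arbiter's actual implementation: the model names, cost and quality numbers, keyword heuristic, and the empty-reply quality check are all placeholder assumptions. The model call is passed in as a function so the sketch runs without a network, and the Compress step is left out.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost: int     # relative cost tier, 1 = cheapest
    quality: int  # highest complexity handled: 1 simple, 2 medium, 3 complex


# Placeholder catalogue; names and numbers are illustrative assumptions.
MODELS = [
    Model("cheap-simple", cost=1, quality=1),
    Model("cheap-coder", cost=1, quality=2),
    Model("frontier", cost=3, quality=3),
]
FRONTIER = max(MODELS, key=lambda m: m.quality)


def classify(prompt):
    """Toy keyword heuristic returning complexity 1 (simple) to 3 (complex)."""
    text = prompt.lower()
    if any(word in text for word in ("design", "architecture", "prove")):
        return 3
    if any(word in text for word in ("code", "bug", "function", "refactor")):
        return 2
    return 1


def route(complexity):
    """Pick the cheapest model whose quality meets the required complexity."""
    capable = [m for m in MODELS if m.quality >= complexity]
    return min(capable, key=lambda m: (m.cost, m.quality))


def passes_gate(reply):
    """Stand-in quality check: reject empty replies."""
    return bool(reply.strip())


def handle(prompt, call_model, cache):
    """Return (model_name, reply) for a prompt, using cache, routing and gate."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key in cache:  # Cache: identical request, no model call
        return cache[key]
    model = route(classify(prompt))
    reply = call_model(model.name, prompt)
    if not passes_gate(reply) and model != FRONTIER:
        model = FRONTIER  # Quality gate: retry once on the frontier model
        reply = call_model(model.name, prompt)
    cache[key] = (model.name, reply)
    return cache[key]
```

Here `call_model(model_name, prompt)` is any function that returns the reply text, and `cache` is a plain dict. Keying the cache on a hash of the prompt means only byte-identical requests hit it, which matches the "identical requests" wording above.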
Models Available
| Model | Best for | Cost |
|---|---|---|
| Claude Sonnet 4 | Complex reasoning, analysis | $$$ |
| Claude Fable 5 | Autonomous coding agents | $$$$ |
| GPT-4o | Complex code, multi-step | $$$ |
| Gemini 2.5 Flash | Simple Q&A, classification | $ |
| GPT-4o Mini | Simple tasks, extraction | $ |
| Qwen 2.5 72B | Code generation, math | $ |
| Llama 3.3 70B | General tasks | $ |
| Mistral Large | Code review, analysis | $$ |
| Claude 3.5 Haiku | Fast responses | $$ |
Requirements
- Node.js 18+
- An OpenRouter API key (one key, all models)
Set your key:
export OPENROUTER_API_KEY=sk-or-v1-...
# or add to .env in your project directory
Savings Breakdown
From real testing across 90 varied requests:
| Traffic Type | Routed To | Savings |
|---|---|---|
| Simple Q&A (40%) | Gemini Flash | 95% |
| Classification (15%) | Gemini Flash | 95% |
| Code tasks (25%) | Qwen / GPT-4o | 50-93% |
| Complex reasoning (10%) | Claude Sonnet 4 | 0% |
| Analysis (10%) | Claude Sonnet 4 | 0% |
| Average | Mixed | 69% |
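As a sanity check on the table above, the 69% average can be reproduced from the per-category rows. The sketch below assumes the average is weighted by the traffic shares shown and that every request would otherwise cost the same on a frontier model; neither assumption is stated here. Because code tasks are listed only as a 50-93% range, it solves for the code-task saving that the 69% figure implies.

```python
# Traffic shares and savings (percent) copied from the table above.
# The code-task row is excluded because it is given only as a range.
KNOWN_ROWS = {
    "simple_qa": (0.40, 95.0),
    "classification": (0.15, 95.0),
    "complex_reasoning": (0.10, 0.0),
    "analysis": (0.10, 0.0),
}
CODE_SHARE = 0.25
CODE_RANGE = (50.0, 93.0)
STATED_AVERAGE = 69.0

# Contribution of the rows whose savings are stated exactly.
known = sum(share * saving for share, saving in KNOWN_ROWS.values())


def blended(code_saving):
    """Share-weighted average saving for a given code-task saving."""
    return known + CODE_SHARE * code_saving


# Code-task saving that would make the blended average equal 69%.
implied_code_saving = (STATED_AVERAGE - known) / CODE_SHARE

if __name__ == "__main__":
    low, high = (blended(s) for s in CODE_RANGE)
    print(f"implied code-task saving: {implied_code_saving:.2f}%")
    print(f"blended average range: {low:.2f}% to {high:.2f}%")
```

Under these assumptions the implied code-task saving is about 67%, which sits inside the listed 50-93% range, and the blended average could fall anywhere from about 64.75% to 75.5% depending on where code tasks land.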
Links
- Landing page: https://arbiter-ai.com
- API docs: https://app.arbiter-ai.com/docs
- NPM: https://www.npmjs.com/package/arbiter-cli
License
MIT