arbiter-cli

0.12.1 • Published 6d ago • Size: 70 kB • Weekly: 501

Cut LLM API costs by 69% on average with one line of code. Arbiter is a smart routing proxy that sends each request to the cheapest model capable of handling it.

Quick Start

# Interactive chat (like Claude CLI, but 69% cheaper)
npx arbiter-cli chat

# AI coding agent (reads files, writes code, runs commands)
npx arbiter-cli code "add error handling to utils.py"

# Set up in your project (zero code changes to your app)
npx arbiter-cli init

# Check your savings
npx arbiter-cli stats

What it does

Arbiter routes every LLM request to the cheapest model that can handle it:

Simple questions → Gemini Flash / GPT-4o Mini (95% cheaper)
Medium code tasks → Qwen / Mistral (90% cheaper)
Complex reasoning → Claude Sonnet 4 / GPT-4o (full quality)

You get the same quality. You pay 69% less on average.

Setup Options

Option 1: Interactive Chat

npx arbiter-cli chat

Chat like you would in Claude CLI. Each response shows which model was picked and how much you saved.

⚡ Arbiter Chat

› What is the capital of France?
  Paris.
  ↳ gemini-2.5-flash · saved <$0.001 (95%)

› Design a CRDT for collaborative editing
  Here's an approach using operation-based CRDTs...
  ↳ claude-sonnet-4.6 · saved $0.00 (0%) — frontier needed

Option 2: Coding Agent

npx arbiter-cli code "fix the bug in main.py"
npx arbiter-cli code   # interactive mode

The agent reads files, writes code, and runs commands. It routes simple file operations to cheap models and architecture decisions to frontier models.

Option 3: Drop-in Proxy (for your existing code)

npx arbiter-cli init

This adds OPENAI_BASE_URL to your .env. Once that variable is loaded into the environment, your existing OpenAI SDK code routes through Arbiter automatically, with no code changes.

from openai import OpenAI

# Works unchanged: Arbiter routes behind the scenes
# The SDK reads OPENAI_BASE_URL from the process environment, so load .env
# first (for example with python-dotenv) or export the variable in your shell.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # Arbiter overrides intelligently
    messages=[{"role": "user", "content": "What is 2+2?"}]
)
# → Routed to Gemini Flash, saved 95%
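
The OpenAI Python SDK picks up OPENAI_BASE_URL from the process environment and does not parse .env files on its own, so the variable written by init has to be loaded before the client is created. The sketch below shows one way to do that with only the standard library. `load_env_file` is a hypothetical helper, the KEY=VALUE file format is assumed, and the URL is a placeholder rather than Arbiter's real endpoint. In a real project, python-dotenv would usually be the simpler choice.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: str) -> dict[str, str]:
    """Load simple KEY=VALUE lines from a .env file into os.environ.

    Hypothetical helper for illustration only: it skips blank lines and
    comments and does not handle every quoting rule that python-dotenv does.
    """
    loaded = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Variables already exported in the shell take precedence.
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded


# Demo against a throwaway .env file; the URL below is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        "# written by init (assumed format)\n"
        "OPENAI_BASE_URL=http://localhost:8080/v1\n"
    )
    env = load_env_file(str(env_path))
    print(env["OPENAI_BASE_URL"])
```

Calling a loader like this before `OpenAI()` is constructed is what lets the unchanged client code above see the proxy URL.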

CLI Commands

Command	Description
`chat`	Interactive chat with smart routing
`chat --fast`	Prefer low-latency models
`chat --model claude`	Force a specific model
`code`	AI coding agent (interactive)
`code "task"`	One-shot coding task
`init`	Add Arbiter to current project
`status`	Check proxy connection
`stats`	View cost savings

Chat Commands

Command	Description
`/stats`	Session cost breakdown
`/model claude`	Switch model (claude, gpt4o, flash, haiku, fable, auto)
`/good` or `/bad`	Rate response (improves routing)
`/copy`	Copy last response to clipboard
`/save name`	Save conversation
`/load name`	Load conversation
`"""`	Start/end multi-line input
`quit`	Exit

How It Works

1. Classify: Each request is analyzed for task type (code, reasoning, analysis, creative, etc.) and complexity (simple/medium/complex) in <1ms.
2. Route: A performance matrix picks the cheapest model that meets the quality bar.
3. Quality Gate: If the cheap model's response fails the quality check, Arbiter transparently retries on a frontier model.
4. Cache: Identical requests return instantly at $0.
5. Compress: Requests routed to non-frontier models use concise prompts, which reduces output tokens.
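
The classify, route, quality gate, and cache steps above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Arbiter's actual implementation: the model names, relative costs, quality scores, keyword heuristic, and the `Router` class are all hypothetical, and the compress step is left out.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical relative costs and quality scores (0 to 1) per complexity level.
# The real performance matrix is not documented in this README.
MODELS = {
    "gemini-flash": {"cost": 1, "quality": {"simple": 0.9, "medium": 0.6, "complex": 0.3}},
    "qwen-72b": {"cost": 2, "quality": {"simple": 0.9, "medium": 0.85, "complex": 0.5}},
    "claude-sonnet": {"cost": 10, "quality": {"simple": 0.95, "medium": 0.95, "complex": 0.95}},
}
FRONTIER = "claude-sonnet"
QUALITY_BAR = 0.8


def classify(prompt: str) -> str:
    """Toy complexity heuristic based on keywords and length (an assumption)."""
    text = prompt.lower()
    if any(word in text for word in ("design", "prove", "architecture")):
        return "complex"
    if any(word in text for word in ("code", "function", "bug")) or len(text) > 200:
        return "medium"
    return "simple"


def route(complexity: str) -> str:
    """Pick the cheapest model whose score meets the quality bar."""
    eligible = [
        name for name, spec in MODELS.items()
        if spec["quality"][complexity] >= QUALITY_BAR
    ]
    return min(eligible, key=lambda name: MODELS[name]["cost"])


@dataclass
class Router:
    call_model: Callable[[str, str], str]  # (model, prompt) -> response text
    is_acceptable: Callable[[str], bool]   # quality gate predicate
    cache: dict = field(default_factory=dict)

    def complete(self, prompt: str) -> tuple[str, str]:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.cache:
            # Cache: an identical request is answered without a model call.
            return self.cache[key]
        model = route(classify(prompt))
        answer = self.call_model(model, prompt)
        if model != FRONTIER and not self.is_acceptable(answer):
            # Quality gate: retry once on the frontier model.
            model = FRONTIER
            answer = self.call_model(model, prompt)
        self.cache[key] = (model, answer)
        return self.cache[key]


# Demo with a fake model call so the sketch runs without network access.
calls = []


def fake_call(model: str, prompt: str) -> str:
    calls.append(model)
    if model == "gemini-flash" and "empty" in prompt:
        return ""  # simulate an unusable cheap-model response
    return f"{model} answer"


router = Router(call_model=fake_call, is_acceptable=lambda text: bool(text.strip()))
print(router.complete("What is the capital of France?"))
print(router.complete("Design a CRDT for collaborative editing"))
```

Keeping the quality gate as a plain predicate makes it easy to swap in a stricter check later without touching the routing logic.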

Models Available

Model	Best for	Cost
Claude Sonnet 4	Complex reasoning, analysis	$$$
Claude Fable 5	Autonomous coding agents	$$$$
GPT-4o	Complex code, multi-step	$$$
Gemini 2.5 Flash	Simple Q&A, classification	$
GPT-4o Mini	Simple tasks, extraction	$
Qwen 2.5 72B	Code generation, math	$
Llama 3.3 70B	General tasks	$
Mistral Large	Code review, analysis	$$
Claude 3.5 Haiku	Fast responses	$$

Requirements

Node.js 18+
An OpenRouter API key (one key, all models)

Set your key:

export OPENROUTER_API_KEY=sk-or-v1-...
# or add to .env in your project directory

Savings Breakdown

From real testing across 90 varied requests:

Traffic Type	Routed To	Savings
Simple Q&A (40%)	Gemini Flash	95%
Classification (15%)	Gemini Flash	95%
Code tasks (25%)	Qwen / GPT-4o	50-93%
Complex reasoning (10%)	Claude Sonnet 4	0%
Analysis (10%)	Claude Sonnet 4	0%
Average	Mixed	69%
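
As a sanity check on the table above, the reported average can be compared against a traffic-weighted average of the per-category savings. The sketch assumes the average is weighted by traffic share and that requests in every category cost about the same at baseline; neither is stated in the table. Code tasks are listed as a range, so the result is a range too.

```python
# Traffic shares and per-category savings (in percent) taken from the table.
# Code tasks are given as a range, so each entry is stored as (low, high).
breakdown = {
    "Simple Q&A": (0.40, (95, 95)),
    "Classification": (0.15, (95, 95)),
    "Code tasks": (0.25, (50, 93)),
    "Complex reasoning": (0.10, (0, 0)),
    "Analysis": (0.10, (0, 0)),
}

# The traffic shares should cover all requests.
total_share = sum(share for share, _ in breakdown.values())
assert abs(total_share - 1.0) < 1e-9

low = sum(share * savings[0] for share, savings in breakdown.values())
high = sum(share * savings[1] for share, savings in breakdown.values())
print(f"Weighted average savings: {low:.2f}% to {high:.2f}%")

# Code-task savings that would reproduce the reported 69% average.
code_share = breakdown["Code tasks"][0]
other = low - code_share * breakdown["Code tasks"][1][0]
implied_code = (69 - other) / code_share
print(f"Implied code-task savings: {implied_code:.1f}%")
```

Under these assumptions the weighted average falls between 64.75% and 75.5%, and the reported 69% corresponds to code-task savings of about 67%, which sits inside the 50-93% range.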

License

MIT

Keywords

llm proxy openai cost optimization arbiter ai routing compression claude gpt context tokens