arbiter-cli

Licence: MIT | Version: 0.12.1 | Deps: 0 | Size: 70 kB | Vulns: 0 | Weekly downloads: 501

Cut LLM API costs by an average of 69% with one line of code. Arbiter is a smart routing proxy that sends each request to the cheapest model capable of handling it.

Quick Start

# Interactive chat (like Claude CLI, but 69% cheaper)
npx arbiter-cli chat

# AI coding agent (reads files, writes code, runs commands)
npx arbiter-cli code "add error handling to utils.py"

# Set up in your project (zero code changes to your app)
npx arbiter-cli init

# Check your savings
npx arbiter-cli stats

What it does

Arbiter routes every LLM request to the cheapest model that can handle it:

  • Simple questions → Gemini Flash / GPT-4o Mini (95% cheaper)
  • Medium code tasks → Qwen / Mistral (90% cheaper)
  • Complex reasoning → Claude Sonnet 4 / GPT-4o (full quality)

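Hard requests still go to frontier models, so quality holds up where it matters, and you pay 69% less on average across a mixed workload (see Savings Breakdown).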

Setup Options

Option 1: Interactive Chat
npx arbiter-cli chat

Chat like you would in Claude CLI. Each response shows which model was picked and how much you saved.

⚡ Arbiter Chat

› What is the capital of France?
  Paris.
  ↳ gemini-2.5-flash · saved <$0.001 (95%)

› Design a CRDT for collaborative editing
  Here's an approach using operation-based CRDTs...
  ↳ claude-sonnet-4.6 · saved $0.00 (0%) — frontier needed

Option 2: Coding Agent
npx arbiter-cli code "fix the bug in main.py"
npx arbiter-cli code   # interactive mode

The agent reads files, writes code, and runs commands. Simple file operations are routed to cheap models, and architecture decisions to frontier models.

Option 3: Drop-in Proxy (for your existing code)
npx arbiter-cli init

This adds OPENAI_BASE_URL to your .env. Existing OpenAI SDK code then routes through Arbiter automatically, with no code changes, provided your app loads .env into its environment.

from openai import OpenAI

# Works unchanged: Arbiter routes behind the scenes
client = OpenAI()  # Picks up OPENAI_BASE_URL from the environment (populated from .env)
response = client.chat.completions.create(
    model="gpt-4o",  # Arbiter may route to a cheaper model
    messages=[{"role": "user", "content": "What is 2+2?"}]
)
# → Routed to Gemini Flash, saved 95%
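
The OpenAI Python SDK picks up `OPENAI_BASE_URL` from the process environment; it does not parse `.env` itself. If your app does not already load `.env`, something along these lines would need to run before `OpenAI()` is constructed. This is a minimal, stdlib-only sketch that assumes a simple `KEY=VALUE` file. `load_env_file` is a hypothetical helper, not part of Arbiter or the OpenAI SDK.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Hypothetical helper for illustration. Variables that are already
    set in the environment are left untouched. Returns the parsed pairs.
    """
    parsed = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip().strip('"').strip("'")
        parsed[key] = value
        os.environ.setdefault(key, value)
    return parsed
```

Call `load_env_file()` once at startup, before creating the client. A dedicated library such as python-dotenv handles quoting and edge cases more robustly than this sketch.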

CLI Commands

| Command | Description |
|---|---|
| chat | Interactive chat with smart routing |
| chat --fast | Prefer low-latency models |
| chat --model claude | Force a specific model |
| code | AI coding agent (interactive) |
| code "task" | One-shot coding task |
| init | Add Arbiter to current project |
| status | Check proxy connection |
| stats | View cost savings |

Chat Commands

| Command | Description |
|---|---|
| /stats | Session cost breakdown |
| /model claude | Switch model (claude, gpt4o, flash, haiku, fable, auto) |
| /good or /bad | Rate response (improves routing) |
| /copy | Copy last response to clipboard |
| /save name | Save conversation |
| /load name | Load conversation |
| """ | Start/end multi-line input |
| quit | Exit |

How It Works

  1. Classify: Each request is analyzed in under 1 ms for task type (code, reasoning, analysis, creative, etc.) and complexity (simple, medium, or complex).
  2. Route: A performance matrix picks the cheapest model that meets the quality bar for that task type and complexity.
  3. Quality Gate: If the cheap model's response fails the quality check, the request is transparently retried on a frontier model.
  4. Cache: Identical requests are served from cache instantly at $0.
  5. Compress: Requests routed to non-frontier models use concise prompts, which reduces output tokens.
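
The first four steps above can be sketched roughly as follows. This is an illustrative toy, not Arbiter's implementation: the keyword heuristic in `classify`, the non-empty check in `passes_quality_gate`, the placeholder model names, and the in-memory cache are all assumptions made for the example. The Compress step is omitted.

```python
import hashlib
import json

# Placeholder tiers (assumed names, not real model IDs).
TIER_MODEL = {
    "simple": "cheap-model",
    "medium": "mid-model",
    "complex": "frontier-model",
}
FRONTIER = "frontier-model"

_cache = {}  # request hash -> (answer, model)


def classify(prompt):
    """Toy complexity heuristic based on keywords and length."""
    text = prompt.lower()
    if any(word in text for word in ("design", "prove", "architecture")):
        return "complex"
    if any(word in text for word in ("code", "function", "bug")) or len(text) > 200:
        return "medium"
    return "simple"


def passes_quality_gate(answer):
    """Toy gate: only rejects empty or whitespace-only answers."""
    return bool(answer.strip())


def cache_key(messages):
    """Stable hash of the request so identical requests hit the cache."""
    payload = json.dumps(messages, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def handle(messages, call_model):
    """Serve one request via cache lookup, classify, route, and quality gate.

    call_model(model_name, messages) -> str is supplied by the caller.
    Returns (answer, model_used, from_cache).
    """
    key = cache_key(messages)
    if key in _cache:
        answer, model = _cache[key]
        return answer, model, True

    model = TIER_MODEL[classify(messages[-1]["content"])]
    answer = call_model(model, messages)

    # Retry once on the frontier tier if the cheaper answer fails the gate.
    if model != FRONTIER and not passes_quality_gate(answer):
        model = FRONTIER
        answer = call_model(model, messages)

    _cache[key] = (answer, model)
    return answer, model, False
```

Passing `call_model` in as a parameter keeps the routing logic independent of any particular provider SDK, which also makes it easy to exercise with a stub.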

Models Available

| Model | Best for | Cost |
|---|---|---|
| Claude Sonnet 4 | Complex reasoning, analysis | $$$ |
| Claude Fable 5 | Autonomous coding agents | $$$$ |
| GPT-4o | Complex code, multi-step | $$$ |
| Gemini 2.5 Flash | Simple Q&A, classification | $ |
| GPT-4o Mini | Simple tasks, extraction | $ |
| Qwen 2.5 72B | Code generation, math | $ |
| Llama 3.3 70B | General tasks | $ |
| Mistral Large | Code review, analysis | $$ |
| Claude 3.5 Haiku | Fast responses | $$ |

Requirements

  • Node.js 18+
  • An OpenRouter API key (one key, all models)

Set your key:

export OPENROUTER_API_KEY=sk-or-v1-...
# or add to .env in your project directory

Savings Breakdown

From real testing across 90 varied requests:

| Traffic type (share of requests) | Routed to | Savings |
|---|---|---|
| Simple Q&A (40%) | Gemini Flash | 95% |
| Classification (15%) | Gemini Flash | 95% |
| Code tasks (25%) | Qwen / GPT-4o | 50-93% |
| Complex reasoning (10%) | Claude Sonnet 4 | 0% |
| Analysis (10%) | Claude Sonnet 4 | 0% |
| Average | Mixed | 69% |
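
As a sanity check, the per-row figures above can be blended by traffic share. The README does not say how the 69% average was computed, so the sketch below only shows that it is consistent with the rows. It assumes each traffic type has a similar baseline cost per request, and it treats the 50-93% range for code tasks as lower and upper bounds. Under those assumptions the blended savings fall between 64.75% and 75.5%, which brackets the quoted 69%.

```python
# (traffic type, share of requests, low savings, high savings), copied from the table
ROWS = [
    ("Simple Q&A", 0.40, 0.95, 0.95),
    ("Classification", 0.15, 0.95, 0.95),
    ("Code tasks", 0.25, 0.50, 0.93),
    ("Complex reasoning", 0.10, 0.00, 0.00),
    ("Analysis", 0.10, 0.00, 0.00),
]


def blended_savings(rows):
    """Return (low, high) traffic-weighted savings as fractions."""
    low = sum(share * lo for _, share, lo, _ in rows)
    high = sum(share * hi for _, share, _, hi in rows)
    return low, high


low, high = blended_savings(ROWS)
print(f"Blended savings: {low * 100:.2f}% to {high * 100:.2f}%")
```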

License

MIT
