9.6.0 • Published 6d agoCLI

lynkr

Licence

Apache-2.0

Version

9.6.0

Deps

Size

3.4 MB

Vulns

Weekly

157

Install scriptsThis package runs scripts during installation (preinstall/install/postinstall)

Summary Dependency Versions

Lynkr

The AI coding proxy that compresses tokens before they hit the model.

87.6% fewer tokens on JSON tool results. 53% fewer tokens on tool-heavy requests. 171ms semantic cache hits. Zero code changes.

87.6%
JSON Compression

53%
Tool Token Reduction

171ms
Semantic Cache Hits

13+
LLM Providers

0
Code Changes Required

Numbers from a live benchmark against LiteLLM on identical workloads. See full report →

Quick Start (2 Minutes)

1. Install Lynkr

npm install -g lynkr

2. Configure Lynkr

First run creates a .env file. Edit it with your provider settings.

Option A: Free & Local (Ollama) - Recommended for Testing

# Install Ollama first: https://ollama.com
ollama pull qwen2.5-coder:latest

Create/edit .env in your project directory:

# Provider
MODEL_PROVIDER=ollama
FALLBACK_ENABLED=false

# Ollama Configuration
OLLAMA_ENDPOINT=http://localhost:11434
OLLAMA_MODEL=qwen2.5-coder:latest

# Server
PORT=8081

# Optional: Limits (remove for unlimited)
POLICY_MAX_STEPS=50
POLICY_MAX_TOOL_CALLS=100

# Disable overly strict command filtering
POLICY_SAFE_COMMANDS_ENABLED=false

Option B: Cloud (OpenRouter) - Recommended for Production

# Get API key from https://openrouter.ai

Create/edit .env:

# Provider
MODEL_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-your-key-here
FALLBACK_ENABLED=false

# Server
PORT=8081

# Optional: Limits (remove for unlimited)
POLICY_MAX_STEPS=50
POLICY_MAX_TOOL_CALLS=100

# Optional: Enable caching
PROMPT_CACHE_ENABLED=true
SEMANTIC_CACHE_ENABLED=true

Option C: Enterprise (AWS Bedrock)

Create/edit .env:

# Provider
MODEL_PROVIDER=bedrock
AWS_BEDROCK_API_KEY=your-aws-key
AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
FALLBACK_ENABLED=false

# Server
PORT=8081

# Optional: Limits (remove for unlimited)
POLICY_MAX_STEPS=50
POLICY_MAX_TOOL_CALLS=100

Option D: Enterprise (Databricks)

Create/edit .env:

# Provider
MODEL_PROVIDER=databricks
DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com
DATABRICKS_API_KEY=your-token
FALLBACK_ENABLED=false

# Server
PORT=8081

# Optional: Limits (remove for unlimited)
POLICY_MAX_STEPS=50
POLICY_MAX_TOOL_CALLS=100

Then start Lynkr:

lynkr start

3. Connect Your Tool

Claude Code

Windows (Command Prompt):

set ANTHROPIC_BASE_URL=http://localhost:8081
set ANTHROPIC_API_KEY=dummy
claude "write a hello world in python"

Linux/macOS:

export ANTHROPIC_BASE_URL=http://localhost:8081
export ANTHROPIC_API_KEY=dummy
claude "write a hello world in python"

Cursor IDE

Settings → Models → Override Base URL
Set to: http://localhost:8081/v1
API Key: any-value

Codex CLI

Edit ~/.codex/config.toml:

model_provider = "lynkr"

[model_providers.lynkr]
base_url = "http://localhost:8081/v1"
wire_api = "responses"

Done! Your AI tool now uses your chosen provider.

Common Startup Errors

Error: `unable to determine transport target for "pino-pretty"`

Problem: You're running an older version (< 9.3.0).

Solution: Update to the latest version:

npm install -g lynkr@latest

If you must use an older version, set NODE_ENV=production before starting.

Warning: `Missing tier configuration: TIER_SIMPLE, TIER_MEDIUM...`

This is just a warning - you can ignore it. Tier routing is optional.

To remove the warning, add to .env:

TIER_SIMPLE=ollama:qwen2.5-coder:latest
TIER_MEDIUM=ollama:qwen2.5-coder:latest
TIER_COMPLEX=ollama:qwen2.5-coder:latest
TIER_REASONING=ollama:qwen2.5-coder:latest

Warning: `FALLBACK_PROVIDER='databricks' is enabled but missing credentials`

Solution: Add to .env:

FALLBACK_ENABLED=false

Error: `connect ECONNREFUSED ::1:11434` (Ollama)

Problem: Ollama is not running.

Solution:

ollama serve

Keep this terminal open, and start Lynkr in a new terminal.

Error: `Connection refused` or `404 Not Found`

Problem: Lynkr is not running or wrong port.

Solution: Check Lynkr is running on the correct port:

curl http://localhost:8081/

Should return: {"service":"Lynkr","version":"9.x.x","status":"running"}

Why Lynkr?

AI coding tools lock you into one provider and send every token raw. Lynkr breaks both locks.

Claude Code / Cursor / Codex / Cline / Continue
                    ↓
                  Lynkr
          ┌─────────────────────┐
          │  Strip unused tools  │  ← 53% fewer tokens on tool calls
          │  Compress JSON blobs │  ← 87.6% on large tool results
          │  Semantic cache      │  ← 171ms hits, 0 tokens billed
          │  Route by complexity │  ← cheap model for simple, cloud for hard
          └─────────────────────┘
                    ↓
    Ollama | Bedrock | Azure | Moonshot | OpenRouter | OpenAI

What you get:

53% fewer tokens on tool-heavy requests (Claude Code, Cursor sessions)
87.6% compression on large JSON tool results (grep, file reads, test output)
Semantic cache serves repeated queries in 171ms with 0 tokens billed
Automatic tier routing — simple questions go to cheap models, complex ones escalate
Route through your company's infrastructure (Databricks, Azure, Bedrock)
Zero code changes — just change one environment variable

Supported Providers

Provider	Type	Example Models	Cost
Ollama	Local	qwen2.5-coder, deepseek-coder, llama3	Free
llama.cpp	Local	Any GGUF model	Free
LM Studio	Local	Local models with GUI	Free
OpenRouter	Cloud	GPT-4o, Claude 3.5, Llama 3, Gemini	$
AWS Bedrock	Cloud	Claude, Llama, Mistral, Titan	$$
Databricks	Cloud	Claude Sonnet 4.5, Opus 4.6	$$$
Azure OpenAI	Cloud	GPT-4o, o1, o3	$$$
Azure Anthropic	Cloud	Claude Sonnet, Opus	$$$
OpenAI	Cloud	GPT-4o, o3-mini	$$$
DeepSeek	Cloud	DeepSeek R1, Reasoner	$

4 local providers for 100% offline, free usage. 10+ cloud providers for scale.

Advanced: Tier Routing (Save Even More)

Route different request types to different models automatically:

# .env file
MODEL_PROVIDER=ollama
FALLBACK_ENABLED=false

# Use small/fast models for simple tasks
TIER_SIMPLE=ollama:qwen2.5:3b

# Use medium models for normal coding
TIER_MEDIUM=ollama:qwen2.5:7b

# Use powerful models for complex architecture
TIER_COMPLEX=ollama:deepseek-r1:14b
TIER_REASONING=ollama:deepseek-r1:14b

# Optional: Limits (remove for unlimited) for long conversations
POLICY_MAX_STEPS=50
POLICY_MAX_TOOL_CALLS=100

Lynkr analyzes each request and routes it to the appropriate tier. Simple questions use fast models. Complex refactoring uses powerful models.

Result: 70-90% of requests use cheaper/faster models. Only hard problems hit expensive models.

Complete .env Examples

MVP: Minimal Working Setup (Ollama)

Copy-paste ready configuration for immediate use:

# .env - Minimal Ollama Setup

# ============================================
# REQUIRED: Provider Configuration
# ============================================
MODEL_PROVIDER=ollama
FALLBACK_ENABLED=false

# ============================================
# REQUIRED: Ollama Settings
# ============================================
OLLAMA_ENDPOINT=http://localhost:11434
OLLAMA_MODEL=qwen2.5-coder:latest

# ============================================
# REQUIRED: Server Configuration
# ============================================
PORT=8081
HOST=0.0.0.0

# ============================================
# REQUIRED: Claude Code/Cursor Compatibility
# ============================================
POLICY_MAX_STEPS=50
POLICY_MAX_TOOL_CALLS=100
POLICY_SAFE_COMMANDS_ENABLED=false

# ============================================
# OPTIONAL: Performance (Recommended)
# ============================================
LOG_LEVEL=warn
LOAD_SHEDDING_ENABLED=true
LOAD_SHEDDING_HEAP_THRESHOLD=0.85

Steps:

Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
Pull model: ollama pull qwen2.5-coder:latest
Copy above to .env in your project directory
Run: lynkr start

Production: Cloud with Tier Routing (OpenRouter)

Optimized for cost savings with smart routing:

# .env - Production OpenRouter Setup

# ============================================
# REQUIRED: Provider Configuration
# ============================================
MODEL_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-your-key-here
FALLBACK_ENABLED=false

# ============================================
# REQUIRED: Server Configuration
# ============================================
PORT=8081
HOST=0.0.0.0

# ============================================
# TIER ROUTING: Smart Cost Optimization
# ============================================
# Simple queries → Cheap/fast model
TIER_SIMPLE=openrouter:google/gemini-flash-1.5

# Normal coding → Balanced model
TIER_MEDIUM=openrouter:anthropic/claude-3.5-sonnet

# Complex refactoring → Powerful model
TIER_COMPLEX=openrouter:anthropic/claude-opus-4

# Deep reasoning → Most capable model
TIER_REASONING=openrouter:anthropic/claude-opus-4

# ============================================
# REQUIRED: Claude Code/Cursor Compatibility
# ============================================
POLICY_MAX_STEPS=50
POLICY_MAX_TOOL_CALLS=100
POLICY_SAFE_COMMANDS_ENABLED=false

# ============================================
# OPTIONAL: Token Optimization (60-80% savings)
# ============================================
PROMPT_CACHE_ENABLED=true
SEMANTIC_CACHE_ENABLED=true
SEMANTIC_CACHE_THRESHOLD=0.95
TOOL_INJECTION_ENABLED=false

# ============================================
# OPTIONAL: Performance Tuning
# ============================================
LOG_LEVEL=warn
LOAD_SHEDDING_ENABLED=true
LOAD_SHEDDING_HEAP_THRESHOLD=0.85

Expected savings: 70-90% of requests use Gemini Flash ($). Only 10-30% use Claude Opus ($$$).

Enterprise: Databricks Foundation Models

For teams using Databricks Model Serving:

# .env - Enterprise Databricks Setup

# ============================================
# REQUIRED: Provider Configuration
# ============================================
MODEL_PROVIDER=databricks
DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com
DATABRICKS_API_KEY=dapi1234567890abcdef
FALLBACK_ENABLED=false

# ============================================
# REQUIRED: Model Configuration
# ============================================
# Option 1: Single model (no tier routing)
DATABRICKS_MODEL=databricks-meta-llama-3-1-405b-instruct

# Option 2: Tier routing (comment out above, uncomment below)
# TIER_SIMPLE=databricks:databricks-meta-llama-3-1-70b-instruct
# TIER_MEDIUM=databricks:databricks-claude-sonnet-4-5
# TIER_COMPLEX=databricks:databricks-claude-opus-4-6
# TIER_REASONING=databricks:databricks-claude-opus-4-6

# ============================================
# REQUIRED: Server Configuration
# ============================================
PORT=8081
HOST=0.0.0.0

# ============================================
# REQUIRED: Claude Code/Cursor Compatibility
# ============================================
POLICY_MAX_STEPS=50
POLICY_MAX_TOOL_CALLS=100
POLICY_SAFE_COMMANDS_ENABLED=false

# ============================================
# OPTIONAL: Enterprise Features
# ============================================
LOG_LEVEL=info
LOAD_SHEDDING_ENABLED=true
LOAD_SHEDDING_HEAP_THRESHOLD=0.85

# Optional: Metrics for monitoring
# PROMETHEUS_METRICS_ENABLED=true

Hybrid: Local + Cloud Fallback

Use free Ollama, fallback to cloud when needed:

# .env - Hybrid Setup (Advanced)

# ============================================
# PRIMARY: Local Ollama
# ============================================
MODEL_PROVIDER=ollama
OLLAMA_ENDPOINT=http://localhost:11434
OLLAMA_MODEL=qwen2.5-coder:latest

# ============================================
# FALLBACK: Cloud Provider
# ============================================
FALLBACK_ENABLED=true
FALLBACK_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-your-key-here

# ============================================
# TIER ROUTING: Mix Local + Cloud
# ============================================
TIER_SIMPLE=ollama:qwen2.5:3b
TIER_MEDIUM=ollama:qwen2.5:7b
TIER_COMPLEX=openrouter:anthropic/claude-3.5-sonnet
TIER_REASONING=openrouter:anthropic/claude-opus-4

# ============================================
# REQUIRED: Server Configuration
# ============================================
PORT=8081
HOST=0.0.0.0

# ============================================
# REQUIRED: Claude Code/Cursor Compatibility
# ============================================
POLICY_MAX_STEPS=50
POLICY_MAX_TOOL_CALLS=100
POLICY_SAFE_COMMANDS_ENABLED=false

# ============================================
# OPTIONAL: Performance
# ============================================
LOG_LEVEL=warn
LOAD_SHEDDING_ENABLED=true

Best of both worlds: 80% of requests stay local (free). Complex tasks use cloud (paid).

Common Issues & Fixes

Issue	Solution
"Service temporarily overloaded"	Ollama model too large for RAM. Use smaller model or increase `--max-old-space-size`
"Route not found: HEAD /"	Ignore - harmless health check from Claude Code
"Hallucinated tool calls"	Normal - Lynkr automatically filters invalid tools
"Safe Command DSL blocked"	Add `POLICY_SAFE_COMMANDS_ENABLED=false` to `.env`
"spawn graphify ENOENT"	Optional feature. Set `CODE_GRAPH_ENABLED=false` in `.env` (see Advanced Features section for installation)
Slow first request (20+ sec)	Ollama loading model into memory. Add `OLLAMA_KEEP_ALIVE=30m` in Ollama config
No response after N turns	Remove `POLICY_MAX_STEPS` and `POLICY_MAX_TOOL_CALLS` from `.env` (unlimited by default in v9.3.0+)

Advanced Features

Token Optimization (60-80% savings)

# Enable all optimizations
PROMPT_CACHE_ENABLED=true
SEMANTIC_CACHE_ENABLED=true
TOOL_INJECTION_ENABLED=false
CODE_MODE_ENABLED=true

Always-on (no config): smart tool selection (server mode), RTK tool-result compression (test/git/grep/lint/build/JSON output), MCP tool dedup (drops built-in WebSearch/WebFetch when an Exa/Tavily MCP tool is present), and request bypass (Claude CLI Warmup / title-extraction calls are answered locally, never hitting a provider).

Optional terse-output mode to cut output tokens:

CAVEMAN_ENABLED=true        # off by default — nudges the model to be concise
CAVEMAN_LEVEL=lite          # lite | full | ultra

Cost tracking & model pricing

Per-request cost is computed from a model-pricing registry (LiteLLM → models.dev, cached 24h) and recorded in telemetry. Models the registry doesn't know record cost_usd=null (logged once) rather than a fabricated price. Pin prices for unknown models:

# Per-1M-token USD prices, JSON keyed by model name
MODEL_PRICE_OVERRIDES={"my-model":{"input":0.5,"output":1.5}}

Memory System (Titans-inspired)

MEMORY_ENABLED=true
MEMORY_TTL=3600000  # 1 hour

Load Shedding & Resilience

LOAD_SHEDDING_ENABLED=true
LOAD_SHEDDING_HEAP_THRESHOLD=0.85

Admin Hot-Reload (no restart needed)

curl -X POST http://localhost:8081/v1/admin/reload

Code Intelligence (Optional - Graphify)

Graphify provides AST-based code analysis for smarter routing decisions.

Installation (Rust required):

# Install Rust if not already installed
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

# Build and install graphify
git clone https://github.com/safishamsi/graphify
cd graphify
cargo build --release
sudo cp target/release/graphify /usr/local/bin/

# Verify installation
graphify --version

Enable in .env:

CODE_GRAPH_ENABLED=true
CODE_GRAPH_WORKSPACE=/path/to/your/project  # Optional, defaults to cwd

Features:

AST-based complexity scoring
Structural code analysis (19 languages supported)
Enhanced routing decisions based on code structure

Note: Graphify is completely optional. If not installed, Lynkr falls back to simpler complexity analysis.

Installation Methods

NPM (recommended)

npm install -g lynkr

One-line installer

curl -fsSL https://raw.githubusercontent.com/Fast-Editor/Lynkr/main/install.sh | bash

Homebrew (macOS / Linux)

brew tap fast-editor/lynkr
brew install lynkr
lynkr --version

Upgrade later with brew update && brew upgrade lynkr. The formula tracks the latest lynkr npm release automatically.

Docker

git clone https://github.com/Fast-Editor/Lynkr.git
cd Lynkr
docker-compose up -d

From source

git clone https://github.com/Fast-Editor/Lynkr.git
cd Lynkr
npm install
cp .env.example .env
npm start

Documentation

Guide	Description
Installation	All installation methods
Provider Setup	Configuration for all 12+ providers
Claude Code	Claude Code CLI integration
Cursor IDE	Cursor setup + troubleshooting
Codex CLI	Codex configuration
Tier Routing	Smart model routing by complexity
Token Optimization	60-80% cost reduction
Troubleshooting	Common issues and solutions
API Reference	REST API endpoints
Production	Enterprise deployment

Benchmark Results

Head-to-head against LiteLLM on the same backends (Ollama minimax-m2.5, Moonshot, Azure OpenAI), 9 scenarios across 4 feature categories. Apples-to-apples comparison is Lynkr vs LiteLLM billed tokens on the same scenario. Run with node benchmark-tier-routing.js.

Run: June 5, 2026 · Lynkr v9.3.2 · LiteLLM v1.87.1 · macOS, Apple Silicon.

Token reduction (vs LiteLLM, same model & prompt)

Mechanism	Lynkr	LiteLLM	Result
Smart tool selection (14 tools)	959 tokens · $0.0044	2,085 tokens · $0.0091	53% fewer tokens, 52% cheaper
TOON compression (60-item grep JSON)	427 tokens · $0.009	3,458 tokens · $0.018	87.6% fewer tokens, 50% cheaper

Lynkr strips irrelevant tool schemas (smart tool selection) and binary-compresses large JSON tool results (TOON) — both in-process, no added latency.

Semantic cache

	Tokens billed	Response time
First call (cold)	2,857	1,891ms
Second call — paraphrased, cache hit	0 (served from cache)	171ms (11× faster)

Near-identical prompts return cached responses in 171ms. Zero model tokens billed on a cache hit.

Tier routing

Request	Lynkr routes to	LiteLLM routes to
"What does git stash do?"	`minimax-m2.5` (local, free)	Ollama (local)
JWT vs cookies security analysis	`moonshot` (cloud — correct)	Ollama (local — wrong call)

Lynkr scores each request on 15 dimensions (token count, code complexity, reasoning markers, risk signals, agentic patterns) and escalates automatically. LiteLLM's cost-based-routing sends everything to the cheapest model regardless of complexity.

Cost projection (100,000 requests/month, same backend)

	Monthly cost	vs LiteLLM
LiteLLM	~$818	baseline
Lynkr	~$409	~50% cheaper

Based on a tool-heavy agentic session (TOON scenario). On equal footing — same provider, same model — Lynkr is cheaper due to token optimization.

→ Full benchmark report with methodology

Cost Comparison

Scenario	Direct Anthropic	Lynkr + Ollama	Lynkr + OpenRouter
Daily coding (8h)	$10-30/day	$0 (free)	$2-8/day
Monthly (heavy use)	$300-900	$0	$60-240

With tier routing + token optimization: additional 50-87% savings on cloud providers depending on workload.

Why Lynkr vs Alternatives

Feature	Lynkr	LiteLLM	OpenRouter	PortKey
Setup	`npm install -g lynkr`	Python + Docker + Postgres	Account signup	Docker stack
Claude Code native	Drop-in	Requires config		Partial
Cursor native	Drop-in	Partial		Partial
Local models	Ollama, llama.cpp, LM Studio	Ollama only
Automatic tier routing	15-dimension scorer	Cost-only		Manual metadata
TOON JSON compression	up to 87.6%
Smart tool selection	up to 60% token reduction
Semantic cache	171ms hits, 0 tokens			Prompt cache only
Long-term memory	SQLite, per-session
MCP integration	+ Code Mode (96% reduction)
Self-hosted	Node.js only	Python stack	SaaS	Docker
Dependencies	Node.js 20+	Python, Prisma, PostgreSQL	None	Docker, Python

Lynkr's edge: Purpose-built for AI coding tools. Compresses tokens before they reach the model — not just after. Zero-config for Claude Code, Cursor, and Codex. Installs in one command.

Community

GitHub Discussions — Ask questions
Report Issues — Bug reports
NPM Package — Official releases
DeepWiki — AI-powered docs

License

Apache 2.0 — See LICENSE.

Built by Vishal Veera Reddy for developers who want control over their AI tools.