Freeze the model, evolve the harness. Two measured applications: (1) SWE-bench Lite code-repair — 7.7% open-loop -> 58.3% via cheap->frontier tiering (official swebench Docker, verified), ~$0.01-$0.74/instance vs $1-20 for frontier agents; (2) Darwin Shie
Kepler: an AI coding agent with an operating brief, preflight planning, and sub-agents. Evaluated on SWE-bench Lite.
Multi-agent orchestration for GitHub Copilot CLI. 19 agents, 59 skills, parallel execution, HUD, PSM, SWE-bench.
Fine-tune cheap open-source LLMs (GLM, Qwen, DeepSeek) on your AI coding agent's successful runs with LoRA (SFT + DPO) so your model cascade escalates to expensive frontier models (GPT, Claude) less often — cutting cost-per-resolved. Turns run history int
Vexp — Context Engine for AI Coding Agents. Pre-indexes your codebase into a dependency graph and delivers ranked context to any MCP-compatible agent. 58% lower cost per task, 90% fewer tool calls (SWE-bench Verified). Works with Claude Code, Cursor, Copi
Model-independent agentic benchmark harness for WrongStack (Aider polyglot + SWE-bench Verified) with deterministic graders and harness fingerprinting.