MCP server unit testing, end-to-end (e2e) testing, and server evals
Catch every time your AI coding agent touches auth or secrets or skips a test, then turn the correction you made into a local regression eval. Local-first, deterministic, no LLM judge.
Evaluation definitions and runner for ADK-based Botpress agents
Bundled local runner for roleplay.sh social-engineering tests.
Searchable local SQLite sidecar that keeps oversized Pi tool output useful without flooding model context
Golden datasets, scorecards, and cost-quality evaluation contracts for Plasius AI workloads.
Fail pull requests when AI behavior regresses.
Author verifiable eval records through a draft → review → revise → submit loop with server-enforced graders; compile to JSONL/CSV/Inspect/lm-eval via MCP. STDIO or Streamable HTTP.
Open source AI evaluation framework — LLM-as-judge + assertion-based evals for any AI app. CLI + MCP server.
memoturn JS/TS SDK — tracing, OpenAI wrapper, LangChain callback, prompt fetch.
Tracebound CLI: deterministic primitives for the Tracebound agent-improvement loop.
Lightcurve human-grounded voice dataset API client and runtime instrumenter.