Client library to connect to the LangSmith Observability and Evaluation Platform.
Secure, isolated expression evaluation runtime for n8n
A client for the Phoenix API
Automatically trace OpenCode conversations to Braintrust. Captures user messages, assistant responses, and tool calls for observability.
Retrieval latency ladder benchmarks + CI regression gates for @remnic/core
Agent evaluation for Axiom Lattice framework
Sentinel AgentOS: deterministic Guard layer + layered memory + automatic evaluation, making any agent reliable, auditable, and improvable
Evaluation definitions and runner for ADK-based Botpress agents
Agent Evaluation and Observability Framework
autocontext — always-on agent evaluation harness
Evaluation framework for Nimbus agents
CLI to evaluate coding-agent skills against a catalog of test cases.
Official DriftGard Node.js SDK — evaluate LLM interactions against your compliance policy
Softlaunch feature flag evaluation engine
EvalGate SDK - AI quality infrastructure. Capture real failures, promote reviewed eval coverage, and gate regressions.
Academic skill pack: dissertation review, question generation, and document checking for thesis defense before the State Examination Commission (ГЭК). Contains only evaluation criteria and methodology, no student data.
Shared libraries and services for platform builders and agents — CLIs, retrieval, evaluation, and infrastructure published to npm.
promptfoo extension for writing AI evaluations for Twilio AI Assistants
Lock your AI app's behavior. Golden datasets + LLM-as-judge + structural assertions in CI.
Hlido CLI — independent, evidence-backed scorecards for AI agents. Inline scorecard, search, compare, and tier rankings, fetched live from hlido.eu.
Stop hallucinating numbers. MCP server for Ultimath: math on multiple independent engines, in parallel.
Generic word-error-rate evaluation package
Suspendable Evaluation Engine Built with Opus — a typed, suspendable expression/template engine with a pure synchronous core.
Assessments, grades, levels reached, and progression (CEFR/1→N). Database-agnostic; composes questa (quiz). Delegates catalog/levels to @mostajs/training.