Keyword: eval

vm-worker
Version 2.0.0, released 6d ago
Tiny virtual machine for the browser that executes JavaScript modules in a Web Worker
virtual machine vm eval exec execute
swarmkit-eval
Version 0.0.7, released 5d ago
Evaluation infrastructure for the swarmkit ecosystem: agent evals across (harness × model × task × arm × seed), with ground-truth scoring, cost-matched Pareto reporting, and scalable parallel execution.
eval benchmark agent swarmkit
@gemstack/ai-sdk
Version 0.5.0, released 2d ago
AI engine: providers, agents, tools, streaming, middleware. The first GemStack package.
ai llm agent agents ai-agents tools
mira-eval
Version 0.2.0, released 5d ago
TypeScript SDK for authoring Mira eval studies (protocol over stdio, no Rust dependency).
eval evaluation llm agent mira
@jetty/sdk
Version 0.2.0, released 5d ago
Typed TypeScript client for the Jetty AI/ML workflow platform
jetty ai ml workflows sdk trajectory
@collabb/knack
Version 0.2.1, released 6d ago
Skill materialization workspace. Turn expertise into agents that work, in a weekend.
claude skills agent anthropic llm eval
@pro-fa/expreszo
Version 0.6.6, released 6d ago
Mathematical expression evaluator
expression math evaluate eval function parser
lua-redis-wasm
Version 1.4.0, released yesterday
WebAssembly-based Redis Lua 5.1 script engine for Node.js: execute Redis-compatible Lua scripts without a live Redis server
redis lua wasm webassembly lua5.1 script
langdrift
Version 0.3.0, released 3d ago
Locale-aware eval harness for AI agent behavior
ai eval localization testing agents llm
@forwardimpact/libharness
Version 1.0.0, released yesterday
Agent evaluation framework: prove with reproducible evidence whether agent changes improved outcomes.
eval agent trace claude-code supervisor
@twilio-alpha/assistants-eval
Version 0.1.1, released 6d ago
promptfoo extension for writing AI evaluations for Twilio AI Assistants
promptfoo twilio evaluation eval ai Twilio AI Assistants
expr-eval
Version 2.0.2, released 4 years ago
Mathematical expression evaluator
expression math evaluate eval function parser
@intentsolutions/jrig-cli
Version 0.1.1, released 13h ago
J-Rig seven-layer binary eval CLI for Claude Skills, providing the j-rig command: package integrity, trigger/functional/regression/baseline scoring, an optimizer, and rollout-gate evidence. Self-contained (bundles the internal eval engine).
j-rig jrig intent-eval eval skill claude
@intentsolutions/refiner
Version 0.2.0, released 2d ago
Skill Refiner orchestrator + I/O adapters + CLI: content-addressed store, j-rig score() shell-out adapter, tiered Anthropic propose() adapter, and the j-rig refine commands. Wraps the pure @intentsolutions/refiner-core.
skill-refiner j-rig intent-eval eval skill claude
@intentsolutions/refiner-core
Version 0.2.0, released 2d ago
Skill Refiner pure core: bounded-edit apply transform, deterministic synthetic eval-set bootstrap, the Pareto-dominant acceptance gate (DR-028 P0-RATIFY-1), and the swappable RefinerStrategy interface (AC-13).
skill-refiner j-rig intent-eval eval skill claude
@eidentic/eval
Version 0.1.5, released 3d ago
Evaluation harness for Eidentic agents — scorers, LLM-as-judge, dataset management, CI pass-rate gate, and production trace promotion.
ai agents typescript eidentic eval testing
@eidentic/bench
Version 0.1.5, released 3d ago
Memory benchmark harness for Eidentic: run the LongMemEval, LoCoMo, and temporal-reasoning benchmarks with deterministic recall metrics.
ai agents typescript eidentic benchmark memory
@forwardimpact/libsyntheticrender
Version 0.1.33, released 3d ago
Multi-format rendering of synthetic evaluation data — validate fixtures before they enter the eval pipeline.
synthetic render validation agent eval
@forwardimpact/libsyntheticprose
Version 0.1.34, released 3d ago
LLM-generated prose and YAML: realistic evaluation content, so agent improvements are tested against lifelike data.
synthetic prose llm agent eval
@forwardimpact/libsyntheticgen
Version 0.1.31, released 3d ago
DSL parser and deterministic entity-graph generator: repeatable eval fixtures, so results are reproducible.
synthetic dsl entity agent eval
@dpopsuev/alef-eval
Version 0.0.1, released 3d ago
Evaluation harness for Alef agents: OTel span collection, RunMetrics, and Pi-parsable reports
agent eval evaluation harness otel
mini-eval
Version 0.0.2, released 3d ago
A tiny, code-first LLM-eval framework: it owns the loop, the report, and an instrumented model caller; you own the pipeline.
llm eval evaluation scorer judge ai-sdk
routeproof
Version 0.3.3, released 2d ago
Test how an AI host routes real user intents to your MCP server's tools, and catch silent mis-routing before your users do.
mcp model-context-protocol llm eval tool-routing ai-agents
@frida/vm
Version 2.0.0, released 4d ago
Node.js's vm module for Frida
vm browser eval
vigiles
Version 10.0.0, released 1h ago
Lint and test the harness your AI agent runs on: verify the references in your CLAUDE.md / AGENTS.md and test that your hooks and skills actually work.
claude-code codex agents agentic ai llm