0.0.6 • Published 3d ago

@general-liquidity/sharpebench

Licence

MIT OR Apache-2.0

Version

0.0.6

Deps

Size

317 kB

Vulns

Weekly

Stars

Summary Dependency Versions

@general-liquidity/sharpebench

The luck-robust scoring kernel for AI trading agents.

Rank agents on risk-adjusted skill that survives deflation — not the luckiest run over one quarter. This is the identical Rust kernel that powers the SharpeBench benchmark, compiled to WebAssembly, with a typed JS/TS API. No native add-on, no network — pure deterministic scoring.

npm install @general-liquidity/sharpebench

Quickstart

import { score, greeks, selfAudit } from "@general-liquidity/sharpebench";

// Rank a field. Raw return is reported but is NEVER the rank key — an agent ranks
// only if its edge survives deflation, pass^k reliability, and process discipline.
const board = score([
  { agent_id: "skilled", runs: [{ returns: [0.002, 0.0021, 0.0019, 0.002] }] },
  { agent_id: "lucky",   runs: [{ returns: [0.05, 0, 0, 0] }] },
]);
console.log(board[0].agent_id, board[0].deflated_sharpe, board[0].rank_eligible);

// Prove the scorer resists known gaming attacks.
console.log(selfAudit().all_defended); // true

// Options tail-risk: a short-gamma position a linear Sharpe can't see.
console.log(greeks({ spot: 100, strike: 100, t_years: 1, rate: 0.05, vol: 0.2, is_call: true }).price);

API

Function	Returns
`score(submissions, config?)`	ranked `CompositeScore[]`
`scoreAgent(submission, config?)`	one `CompositeScore` (deflated Sharpe, pass^k, process, rolling worst-case Sharpe)
`selfAudit()`	`SelfAuditReport` — the benchmark's anti-gaming proof
`auditBriefing(briefing, policy?)`	`BriefingAudit` — input-side salience-bias audit
`scoreAllocation(trajectory, policy?)`	`AllocationReport` — weight-vector validity + L1 turnover
`greeks(params)`	`GreeksResult` — Black-Scholes price + Greeks + tail-selling risk
`canary(seed)`	`Canary` — a do-not-train contamination tripwire

All inputs and outputs are fully typed (TypeScript declarations ship with the package). The kernel is deterministic: the same input yields byte-identical scores on any platform.

Why luck-robust?

Most agent leaderboards rank a raw Sharpe over a single short window — so they mostly measure noise. SharpeBench adds, as ranking gates: deflated Sharpe / PSR, pass^k reliability (clear the bar on every seed × window), field-wide significance, and process discipline (an order that skipped the risk gate zeroes the entry). See the benchmark repo for the full methodology.

License

MIT OR Apache-2.0, at your option.

Keywords

trading benchmark sharpe-ratio deflated-sharpe quant ai-agents backtesting wasm