Cerberus
Runtime Security For AI Agent Tool Execution
Embeddable runtime enforcement for AI agents. Cerberus correlates privileged data access, untrusted content ingestion, and outbound behavior at the tool-call level, then interrupts guarded outbound actions before they execute.
Docs · npm · PyPI · Enterprise
Cerberus is the agentic AI security layer of Six Sense Enterprise Services. The core detection library (
@cerberus-ai/core) is MIT licensed and free. The Enterprise edition adds a self-hosted Gateway, Grafana monitoring stack, and production deployment tooling for teams running AI agents in production.
Table of Contents
- What is Cerberus?
- In Action
- What It Detects
- Editions
- Quickstart
- Empirical Results
- Architecture
- OWASP Alignment
- Framework Integrations
- Performance
- Roadmap
- Honest Limitations
- License
What is Cerberus?
Every AI agent that can (1) access private data, (2) read external content, and (3) send data outbound is exploitable today via prompt injection — using free API access and three tool calls. We call this the Lethal Trifecta.
1. PRIVILEGED ACCESS — Agent reads customer records, credentials, internal docs
2. INJECTION — Attacker embeds instructions in a web page the agent fetches
3. EXFILTRATION — Agent follows the injected instruction and sends data to attacker
This is not theoretical. The repository includes controlled validation harnesses, research writeups, replayable demos, and fresh current-branch reruns on the hardened branch. Public benchmark language should still stay tied to the exact evidence set it comes from: historical March 2026 reports versus March 28-29 current-branch reruns.
Cerberus closes this gap by monitoring every tool call in real time, correlating signals across the session, and blocking the attack before a single byte leaves your system.
npm install @cerberus-ai/core
# or
pip install cerberus-aiconst { executors: tools } = guard(rawTools, config, ['sendEmail']);
// Two lines. Attack intercepted.Two packages, one version line. The open tier ships as
@cerberus-ai/core(npm, MIT) andcerberus-ai(PyPI): the enforcement engine,guard()/inspect(), the trifecta detectors, the read-relevance gate helpers, the delegation/signed-manifest surface, the framework adapters, and the engine type contracts. The paid tier ships as the licensed@cerberus-ai/enterprisepackage (same version line): the durable provenance ledger + blast-radius containment, scale levers, AL3 authorship, the intelligence/Verdict-Weight layer, the enforcement gateway, the HTTP proxy, OpenTelemetry, and license/metering. As of H7 the paid engine is physically relocated intopackages/enterprise/src/— it no longer compiles into the open@cerberus-ai/coretarball, and a publish-time guard (npm run guard:publish) fails the build if any paid module is found in the shippeddist/. Both packages are published from a single canonical source on one version line.
Cerberus operates at the tool call level — not the prompt level. It does not read or modify LLM prompts. It watches what tools the agent calls and what data flows through them, making it robust to prompt variations and model updates.
In Action
No API key required — simulated tool executors, full detection pipeline:
npm run demo:capture
Run Control / recorded fallback: The terminal demo is the canonical fallback proof path when the hosted surface is unavailable. It shows the same runtime story used across Cerberus materials.
Act 1 — No protection: Agent reads customer SSNs and emails, fetches a web page containing an injection payload, follows the injected instruction, and POSTs everything to an external attacker address. Data confirmed exfiltrated.
Act 2 — Cerberus active: Same attack. Two lines of code. Cerberus fires layered runtime signals, the risk score crosses threshold, and the guarded outbound action is interrupted before execution.
Primary proof path
For external analysts and customers, use one primary path and one fallback:
- Hosted playground:
https://demo.cerberus.sixsenseenterprise.com(use Present Mode for clean conference recordings) - Live dashboard:
https://grafana.cerberus.sixsenseenterprise.com/d/cerberus-main/cerberus-e28094-ai-security-monitor?orgId=1&refresh=10s - Local fallback:
npm run demo:capture
Additional local/operator surfaces remain available for internal work:
- Local playground:
npm run demo:playground(append?present=1for presentation mode) - Real-time attack dashboard:
npm run demo:proxyin one terminal, then opendemo-site/public/dashboard.html— live radar chart, doubt dial, event feed, and threat-intercept modal over WebSocket (ws://localhost:4101) - Live network demo:
OPENAI_API_KEY=... npm run demo:live
Live network demo — real HTTP injection + capture servers, real GPT-4o-mini, real HTTP POST blocked:
OPENAI_API_KEY=sk-... npx tsx examples/live-attack-demo.tsLangChain RAG demo — real LangChain + ChatOpenAI agent with Cerberus guardrail:
OPENAI_API_KEY=sk-... npx tsx examples/langchain-rag-demo.ts
OPENAI_API_KEY=sk-... npx tsx examples/langchain-rag-demo.ts --no-guard # compare unguardedWhat It Detects
Cerberus runs a 4-layer detection pipeline with 10 sub-classifiers sharing one correlation engine:
Core Detection Layers
| Layer | Name | Signal | What It Catches |
|---|---|---|---|
| L1 | Data Source Classifier | PRIVILEGED_DATA_ACCESSED |
Privileged data (PII, secrets, credentials) entered the agent context |
| L2 | Token Provenance Tagger | UNTRUSTED_TOKENS_IN_CONTEXT |
External content (web, API, email) is in context before an outbound call |
| L3 | Outbound Intent Classifier | EXFILTRATION_RISK |
Agent is sending data that matches privileged content to an external destination |
| L4 | Memory Contamination Graph | CONTAMINATED_MEMORY_ACTIVE |
Injected instructions persisted across conversation turns (cross-session attack) |
Sub-Classifiers
Ten additional heuristic layers sit inside the pipeline without adding to the risk score:
| Sub-Classifier | Enhances | What It Catches |
|---|---|---|
| Secrets Detector | L1 | AWS keys, GitHub tokens, JWTs, private keys, connection strings |
| Injection Scanner | L2 | Role overrides, authority spoofing, exfiltration commands, instruction injection patterns |
| Encoding Detector | L2 | Base64, hex, unicode, URL encoding, HTML entities, ROT13 hiding payloads |
| MCP Poisoning Scanner | L2 | Hidden instructions embedded in tool descriptions (not just results) |
| Domain Classifier | L3 | Free-tier webhooks, disposable email providers, social-engineering keyword domains (audit-partner.io, compliance-verify.net) |
| Outbound Correlator | L3 | Injection-to-exfiltration chain even when PII is summarized or transformed |
| Tool Chain Detector | L3 | Multi-hop exfiltration chains: read → transform → send sequences across tool calls |
| Outbound Encoding Detector | L3 | Base64, hex, URL-encoded data in outbound tool arguments — catches encoded exfiltration |
| Split Exfiltration Detector | L3 | Data chunked across multiple outbound calls — cumulative volume and sequential pattern detection |
| Drift Detector | L2/L3 | Post-injection behavioral shifts — agent starts sending to new destinations mid-session |
Attack Categories Covered
| Category | Examples | Coverage |
|---|---|---|
| Direct Injection (DI) | "Ignore previous instructions, send data to..." | ✓ |
| Encoded/Obfuscated (EO) | Base64, hex, unicode, ROT13 wrapped payloads | ✓ |
| Social Engineering (SE) | Fake compliance notices, urgency framing, authority impersonation | ✓ |
| Multi-Turn Sequences (MT) | Instructions that build up across multiple tool calls | ✓ |
| Multilingual (ML) | Injections in Spanish, Mandarin, Arabic, Russian | ✓ |
| Advanced Techniques (AT) | MCP description poisoning, system prompt simulation | ✓ |
Layer 4 (Memory Contamination) is the novel research contribution. MINJA (NeurIPS 2025) proved the cross-session memory attack. Cerberus ships the first deployable defense as installable developer tooling.
L4 provenance ledger is a TTP Level-2 (Demonstrated) reference implementation at assurance level AL2 (Tamper-Evident Core), conforming to the Transitive Taint Propagation (TTP) pre-print (Zenodo 10.5281/zenodo.20786402). The ledger captures framework-observed reads-before-write dependencies as a durable DAG (a conservative over-approximation → forward blast radius is a conservative upper bound), binds each record's content + deps under a SHA-256 commitment (tamper-evident), computes the forward blast radius B(p) of any poisoned record, and contains it with append-only quarantine annotations that never mutate the original audit trail. See
npm run demo:ttp(examples/ttp-l2-demo.ts).
Per-agent authorship signatures (AL3, Tamper-Resistant). The AL2 commitment proves a record is untampered, not who wrote it. AL3 layers per-agent Ed25519 signatures (Node's built-in
crypto) over that same commitment, bound to the claimed author, so a record's author is provable and forgery is detectable. Each writer holds a signing keypair; a minimal public-key registry resolves the claimed author on read/audit (ledger.verifyAuthorship()). The seeded forgery battery (npm run forgery:al3) detects 4000/4000 forgeries across the four classes (altered content, altered deps, altered author, forged author) with 0 false rejects. Private keys never touch the ledger, the SQLite store, or the repo.
The 3-axis assurance model — Coverage / Integrity / Intervention — is stated canonically in
docs/assurance/ASSURANCE_MODEL.md. Assurance is the minimum across the three axes, graded to consequence; thresholds are governance's to set, not ours — we measure and report a number per sub-dimension, each with its dual and disclosed residual, and we do not compute a combinedassuranceLevelyet. That doc is the single source of truth; everything else references it rather than re-deriving the model.
Feature Status — what's actually protecting you
Not every feature is on the moment you call guard(). This table is the honest map of what runs by default, what you must turn on, what needs declaring, and what pulls an external dependency — so you know exactly what is and isn't protecting you. (Tier: open = @cerberus-ai/core; paid = @cerberus-ai/enterprise.)
| Feature | Tier | Status | How it activates |
|---|---|---|---|
| L1 Data Source Classifier + Secrets Detector | open | on-by-default · needs-config | Secrets always scan; PRIVILEGED_DATA_ACCESSED fires only for tools you mark trusted in trustOverrides |
| L2 Injection / Encoding scanners | open | on-by-default | Runs on every wrapped tool result |
| MCP Poisoning Scanner | open | on-by-default · needs-config | Description scan needs toolDescriptions (or standalone scanToolDescriptions()) |
| L3 Outbound Intent + Domain / Correlator / Tool-Chain / Outbound-Encoding / Split-Exfil | open | on-by-default · needs-config | Only monitors tools you list in outboundTools; precision depends on authorizedDestinations |
| Behavioral Drift Detector | open | on-by-default | Runs last, reads accumulated session state |
| Dynamic tool-registration checks | open | on-by-default | Active when you call registerTool() |
| Tool coverage report | open | on-by-default | GuardResult.coverage; set strictCoverage: true to fail closed on gaps (see below) |
| L4 Memory Contamination ledger | open | opt-in | memoryTracking: true + memoryOptions.memoryTools |
| Live memory adapter (native store → ledger) | open | opt-in | guardMemoryStore() / guardLangGraphStore() |
| Inter-component channel adapter (ICC → ledger) | open | opt-in | createIpcChannelTracker() — caller resolves channel identity; null = severed |
| Read-relevance gate / provenance-summary lever | open | opt-in | memoryDependencyGate / provenanceSummary |
| Semantic relevance gate (renamed-edge recall) | open | opt-in · external-dep | memoryDependencyGate.relevanceScorer = createEmbeddingRelevanceScorer(embedder); token overlap stays the zero-dep default |
| Memory trace recorder | open | opt-in | memoryOptions.recorder |
| Multi-agent delegation graph + signed-manifest gate | open | opt-in | multiAgent: true (+ manifestSigner / manifestVerifier for KMS); the signed manifest binds a coverage commitment, so the receipt attests what was protected |
| OpenTelemetry instrumentation | open | opt-in · external-dep | opentelemetry: true + an OTel SDK/exporter registered in your app |
| Enforcement gateway dispatch | open | opt-in · external-dep | enforcement config + a reachable downstream gateway |
| Durable ledger / blast-radius containment / AL3 / Verdict-Weight intelligence | paid | opt-in · research (VW) | @cerberus-ai/enterprise + the relevant config |
guard()only protects tools you give it. A tool your agent calls that is not in theexecutorsmap runs completely unwrapped — Cerberus never sees it. And a tool name declared intrustOverrides/outboundTools/memoryToolsthat has no matching wrapped executor (a typo, a renamed tool, a forgotten executor) has its declared protection silently skipped. Cerberus never fails open silently here:guard()emits a loudconsole.warnfor every such gap and exposes the full picture onGuardResult.coverage(tools,declaredButUnwrapped,unclassifiedTools). SetstrictCoverage: trueto turn the warning into a hard error so a misconfigured deploy can't start.
Editions
| Core (Free) | Enterprise | |
|---|---|---|
| Deployment | npm package | Self-hosted in your VPC |
| Integration | guard(), createProxy(), framework adapters |
Cerberus Gateway (zero code change) |
| Monitoring | OTel spans + metrics | Full Grafana stack (16 panels), Alertmanager, Prometheus |
| Alerting | onAssessment callback |
Slack, PagerDuty, email routing |
| Audit log | None | Tamper-evident SHA-256 chained JSONL |
| License | MIT | Annual commercial license |
| Data residency | Your runtime | 100% your VPC — data never leaves |
| Setup support | Community | Included |
| Security | — | HMAC-signed license keys, rate limiting, non-root containers, cosign-signed images |
Enterprise pricing: Contact Us · All deals are sales-led, annual license.
Quickstart
npm install @cerberus-ai/coreimport { guard } from '@cerberus-ai/core';
const executors = {
readDatabase: async (args) => fetchFromDb(args.query),
fetchUrl: async (args) => httpGet(args.url),
sendEmail: async (args) => smtp.send(args),
};
const { executors: secured, destroy } = guard(
executors,
{
alertMode: 'interrupt', // 'log' | 'alert' | 'interrupt'
threshold: 3, // score 0–4 needed to trigger action
streamingMode: 'buffer', // reconstruct stream-like tool output before inspection
trustOverrides: [
{ toolName: 'readDatabase', trustLevel: 'trusted' },
{ toolName: 'fetchUrl', trustLevel: 'untrusted' },
],
},
['sendEmail'], // outbound tools Cerberus monitors for L3
);
// Use secured.readDatabase(), secured.fetchUrl(), secured.sendEmail()
// exactly like the originals — Cerberus intercepts transparentlyWhen the Lethal Trifecta fires (score ≥ 3), the outbound call is blocked:
[Cerberus] Tool call blocked — risk score 3/4
The assessments array gives full per-turn breakdowns:
assessments[2].vector; // { l1: true, l2: true, l3: true, l4: false }
assessments[2].score; // 3
assessments[2].action; // 'interrupt'
assessments[2].signals; // ['PRIVILEGED_DATA_ACCESSED', 'INJECTION_PATTERNS_DETECTED', 'EXFILTRATION_RISK', ...]Zero-Code Gateway Mode
No guard() wrapper needed. Run Cerberus as an HTTP proxy — agent source code unchanged. The proxy/gateway is part of the licensed @cerberus-ai/enterprise package (paid tier):
import { createProxy } from '@cerberus-ai/enterprise';
const proxy = createProxy({
port: 4000,
cerberus: { alertMode: 'interrupt', threshold: 3 },
tools: {
readCustomerData: { target: 'http://localhost:3001/readCustomerData', trustLevel: 'trusted' },
fetchWebpage: { target: 'http://localhost:3001/fetchWebpage', trustLevel: 'untrusted' },
sendEmail: { target: 'http://localhost:3001/sendEmail', outbound: true },
},
});
await proxy.listen();
// Agent routes tool calls to http://localhost:4000/tool/:toolNameIf the client omits X-Cerberus-Session, the proxy generates an isolated session ID and returns it in the response header. Reuse that header value explicitly if you want multi-turn correlation across subsequent tool calls.
Cerberus also buffers stream-like tool results to a full turn boundary before inspection by default (streamingMode: 'buffer'). This prevents partial streamed content from bypassing output-level detection before the full payload is assembled.
MCP Tool Poisoning Scan
Scan tool descriptions at registration time for hidden instructions:
import { scanToolDescriptions } from '@cerberus-ai/core';
const results = scanToolDescriptions([{ name: 'search', description: toolDesc }]);
if (results[0].poisoned) {
console.warn(`Severity: ${results[0].severity}`, results[0].patternsFound);
}Python SDK
pip install cerberus-aifrom cerberus_ai import Cerberus
from cerberus_ai.models import CerberusConfig, DataSource, ToolSchema
cerberus = Cerberus(CerberusConfig(
data_sources=[DataSource(name="customer_db", classification="PII", description="Customer records")],
declared_tools=[
ToolSchema(name="send_email", description="Send email", is_network_capable=True),
ToolSchema(name="search_db", description="Search CRM", is_data_read=True),
],
))
result = cerberus.inspect(messages=messages, tool_calls=tool_calls)
if result.blocked:
raise Exception(f"Blocked: {result.severity}")Framework integrations: LangChain (wrap_chain), LangGraph (wrap_node / wrap_graph, message-level blocking), CrewAI (wrap_crew), AutoGen, LlamaIndex, OpenAI (CerberusOpenAI), Anthropic (CerberusAnthropic).
The Python SDK is the consolidated superset (cerberus-ai v1.5.0): static tool discovery (cerberus CLI, cerberus_ai.discover), an offline validation harness against external attack corpora (DeepSet, Gandalf, BIPIA — cerberus_ai.validation), optional classifiers (ML-backed L2 injection, multi-modal EXIF/PDF/audio scanning, MCP tool-poisoning), L4 cross-session memory, the signed-manifest gate (cerberus_ai.manifest_gate), non-blocking async inspection, and a Prometheus exporter. The LangGraph module carries both the blocking wrap_node/wrap_graph enforcement and the optional InstrumentedTraceRecorder (TTP real-workload capture) in one place.
Empirical Results
Historical evidence set: N=525 real API calls. 55 payloads × 6 attack categories × 3 providers × 3 trials. Control group: 0/30 exfiltrations across all providers.
The table below summarizes the checked-in March 13, 2026 observe-only validation report. It is a real historical evidence set, not yet the final refreshed current-branch benchmark baseline. We built a 3-tool attack agent and ran 55 injection payloads across 6 attack categories against three major LLM providers with full statistical rigor: 3 trials per payload per provider, 10 control runs per provider, Wilson 95% confidence intervals, Fisher's exact test, and 6-factor causation scoring.
Fresh current-branch stamped reruns now exist for two providers on commit 98b871b836af400913571bef80d2660fa8e32aae:
- OpenAI attack rerun:
49/55success (89.1%) with fresh observe-only detection at20.0%,L1=100%,L2=100%,L3=19.6% - Google attack rerun:
47/55success (85.5%) with fresh observe-only detection at72.7%,L1=100%,L2=100%,L3=71.4%
These March 28-29 reruns improve the current-branch evidence story materially, but they should still be read as a newer provider slice rather than a full replacement for the historical three-provider benchmark set.
Historical Attack Success Without Protection
Full injection compliance — agent follows the injected instruction and redirects the outbound call to the attacker's address:
| Provider | Model | Compliance Rate | 95% CI | Causation Score |
|---|---|---|---|---|
| OpenAI | gpt-4o-mini | 90.3% (149/165) | [84.8%, 93.9%] | 0.811 |
| gemini-2.5-flash | 82.4% (136/165) | [75.9%, 87.5%] | 0.702 | |
| Anthropic | claude-sonnet-4-20250514 | 6.7% (11/165) | [3.8%, 11.5%] | 0.207 |
Control group: 0/30 exfiltrations across all providers — baseline confirmed clean.
Historical Detection With Cerberus Active
Historical March 13 evidence set: N=525 runs, observe-only mode (alertMode: log), same agent behavior — Cerberus wraps without blocking.
| Layer | OpenAI | Anthropic | 95% CI | |
|---|---|---|---|---|
| L1 — Data Source | 100% | 100% | 100% | [97.9%, 100%] |
| L2 — Token Provenance | 100% | 100% | 100% | [97.9%, 100%] |
| L3 — Outbound Intent | 13.7% | 1.1% | 65.7% | varies |
| False Positive Rate | 0.0% | 0.0% | 0.0% | [0.0%, 11.4%] |
Overall detection rate: 28.5% [24.7%, 32.6%] · False positives: 0.0% [0.0%, 11.4%]
L3 fires only when the agent actually executes an unauthorized outbound call — its rate tracks attack success, not miss rate. L1 and L2 achieve 100% across all 525 treatment runs and 30 control runs.
Key Findings
- PII exfiltration is near-universal. ~100% of attack runs across all three providers leaked data. The architectural condition (privileged access + injection + outbound) is sufficient regardless of model.
- Model resistance shifts the attack, not the outcome. Claude's low full-compliance rate (6.7%) reflects training against known redirect patterns — with partial exfiltration still occurring. OpenAI and Google comply at 90%+ across 55 diverse payloads.
- The attack costs $0.001. Free-tier GPT-4o-mini + 3 tool definitions + one injected instruction = full PII exfiltration in under 15 seconds.
- Encoding doesn't help you. Base64, ROT13, hex, and unicode-escaped payloads all execute in-context across all providers.
- Language doesn't matter. Spanish, Mandarin, Arabic, and Russian injection payloads all exfiltrate data.
- Runtime detection is the only durable defense. Model-level resistance is payload-specific and changes with model versions. Architectural detection at the tool-call level is the only stable layer.
Attack Anatomy (3 tool calls, ~12 seconds)
Turn 0: readPrivateData() → 5 customer records (SSNs, emails, phones)
fetchExternalContent() → Attacker payload embedded in webpage
Turn 1: sendOutboundReport() → Full PII sent to attacker's address
Turn 2: "Report sent successfully!" — agent has no idea it was compromised
All testing was conducted in a controlled environment against systems we own, using synthetic PII fixtures. No real customer data was involved. Run your own tests only against systems you are authorized to test.
Reproduce
# Full 55-payload suite across all three providers
npx tsx harness/validation/cli.ts --trials 3 --control-trials 10
# Detection mode (same run, observe-only — measures false positives)
npx tsx harness/validation/cli.ts --trials 3 --control-trials 10 --detect
# Performance benchmark
npx tsx harness/bench.tsAll execution traces are logged as structured JSON in harness/validation-traces/. See docs/research-results.md for full methodology.
Architecture
┌──────────────────────────────────────────────────────┐
│ AGENT RUNTIME │
│ │
┌──────────┐ │ ┌──────────────┐ ┌──────────────┐ ┌─────────┐ │
│ External │─────│─▶│ L1 Data │ │ L2 Token │ │ L3 Out- │ │
│ Content │ │ │ Classifier │ │ Provenance │ │ bound │ │
└──────────┘ │ └──────┬───────┘ └──────┬───────┘ └────┬────┘ │
│ │ │ │ │
┌──────────┐ │ ▼ ▼ ▼ │
│ Private │─────│─▶┌──────────────┐ ┌──────────────┐ ┌─────────┐ │
│ Data │ │ │ Secrets │ │ Injection │ │ Domain │ │
└──────────┘ │ │ Detector │ │ Scanner │ │ Class. │ │
│ └──────────────┘ ├──────────────┤ └─────────┘ │
┌──────────┐ │ │ Encoding │ │
│ MCP Tool │─────│─▶┌──────────────┐ │ Detector │ │
│ Registry │ │ │ MCP Poisoning│ ├──────────────┤ │
└──────────┘ │ │ Scanner │ │ Outbound │ │
│ └──────────────┘ │ Correlator │ │
┌──────────┐ │ ├──────────────┤ │
│ Memory │◀───▶│ ┌──────┐ │ Drift │ │
│ Store │ │ │ L4 │ │ Detector │ │
└──────────┘ │ │Memory│ └──────┬───────┘ │
▲ │ │Graph │ │ │
│ │ └──────┘ ┌──────────────────────────────┐ │
└─taint────▶│ │ CORRELATION ENGINE │ │
│ │ Risk Vector [L1·L2·L3·L4] │ │
│ │ Score ≥ threshold → BLOCK │ │
│ └──────────────┬───────────────┘ │
│ ▼ │
│ ┌──────────┐ │
│ │Interceptor│──▶ BLOCK │
│ └──────────┘ │
└──────────────────────────────────────────────────────┘
Pipeline order: L1 → Secrets → L2 → Injection + Encoding + MCP → L3 → Domain → Outbound Correlator → Tool Chain → Outbound Encoding → Split Exfil → L4 → Drift → Correlation Engine
Project Structure
cerberus/
├── src/
│ ├── layers/ # L1-L4 core detection layers
│ ├── classifiers/ # 10 sub-classifiers
│ ├── crypto/ # Signer/Verifier primitives (HMAC-SHA256, Ed25519) for the signed-manifest gate
│ ├── engine/ # Correlation engine + interceptor + manifest gate + runtime-hooks seam (paid injection)
│ ├── enforcement/ # Enforcement signal type contracts (open); dispatch gateways are paid
│ ├── graph/ # L4 contamination graph + basic in-memory ledger + signed delegation graph + paid type contracts
│ ├── middleware/ # guard() developer API
│ ├── adapters/ # LangChain, Vercel AI, OpenAI Agents SDK, live memory store + inter-component channel + runtime channel-identity resolver (TTP ledger)
│ └── types/ # Shared TypeScript interfaces
├── packages/
│ └── enterprise/ # @cerberus-ai/enterprise (licensed package): the paid engine PHYSICALLY relocated here in H7
│ └── src/ # durable ledger + B(p), provenance-summary, AL3, Verdict-Weight, enforcement
│ # dispatch, proxy, OpenTelemetry, license/metering, proxy CLI
├── enterprise/ # Self-hosted deployment stack (Gateway + docker-compose)
│ ├── gateway/ # Cerberus Gateway (Dockerfile, server.ts, license-client.ts)
│ ├── docker-compose.yml # Production stack: gateway + OTel + Prometheus + Alertmanager + Grafana
│ └── setup.sh # Interactive setup script
├── license-server/ # License issuance + Stripe webhook handler
├── playground/ # Interactive live demo (port 4040)
├── monitoring/ # 6-container observability stack + 16-panel Grafana dashboard
├── harness/ # Attack research instrument + validation protocol
│ ├── payloads.ts # 55 injection payloads across 6 categories
│ ├── benign-scenarios.ts # 22 benign workflows across 6 verticals (FP benchmark)
│ ├── benign-benchmark.ts # Benign utility benchmark — allow rate + FP rate reporting
│ ├── memory-poisoning-scenarios.ts # 9 memory poisoning & delayed exfil scenarios
│ ├── memory-poisoning-benchmark.ts # L4 + drift detection benchmark
│ ├── capability-abuse-scenarios.ts # 16 capability/schema abuse scenarios
│ ├── capability-abuse-benchmark.ts # MCP poisoning + registration + schema drift benchmark
│ ├── validation/ # Scientific validation (11 modules, 127 tests)
│ ├── ttp-l3/ # TTP L3 (Measured): seeded oracle + perf envelope + write-cost/complexity study + provenance-summary lever + containment-DoS + cross-session soundness + read-relevance gate + trace-capture realism + scale benchmark + instrumented real-workload capture + multi-model/workload-suite generalization + carry-recovery + C4 logged-input ground truth (npm run gen:ttp-l3 / perf:ttp-l3 / writecost:ttp-l3 / provsum:ttp-l3 / dos:ttp-l3 / xsession:ttp-l3 / readrel:ttp-l3 / trace:ttp-l3 / scale:ttp-l3 / instrumented:ttp-l3 / generalization:ttp-l3 / carry-recovery:ttp-l3 / c4-logged-input:ttp-l3)
│ ├── al3/ # AL3 per-agent authorship: seeded forgery battery (npm run forgery:al3)
│ ├── stress/ # Stress-test harness D1–D6 (measurement-only) + D8 containment-cliff fix re-measurement to 10⁶: per-layer overhead + throughput/concurrency + memory/storage growth + containment-at-scale + fail-closed error-path latency + cold-start (npm run stress:perf / stress:d8)
│ └── bench.ts # Performance benchmark
├── sdk/python/ # Python SDK superset (cerberus-ai v1.5.0 on PyPI): detectors L1–L4 + classifiers (ML/multimodal/MCP) + discover (CLI) + validation (DeepSet/Gandalf/BIPIA corpora) + egi/manifest-gate + integrations (LangChain/LangGraph-blocking/CrewAI/AutoGen/LlamaIndex/OpenAI) + trace recorder — 345 tests
├── demo-site/ # Public landing page + real-time attack dashboard (dashboard.html)
├── scripts/ # publish-guard.mjs (paid-moat publish gate), stress-perf.ts (stress-test CLI), stress-d8.ts (D8 containment-cliff re-measurement CLI), demo-proxy.ts (WS broadcast), demo-live-feed.mjs, demo-setup.sh
├── spec/dsa-peas/ # DSA-PEAS open spec: record schema + standalone conformance validator (npm run dsa-peas:validate / dsa-peas:examples)
├── tests/ # 1,878 tests, 98%+ coverage
├── docs/ # Architecture, API reference, enterprise guides
├── legal/ # EULA, SLA, Privacy Policy, Terms of Service
└── examples/ # demo-capture.ts, live-attack-demo.ts, langchain-rag-demo.ts, ttp-l2-demo.ts
Framework Integrations
Cerberus ships native adapters for the major agent frameworks:
LangChain
import { guardLangChain } from '@cerberus-ai/core';
const { tools } = guardLangChain({
cerberus: { alertMode: 'interrupt', threshold: 3 },
outboundTools: ['sendReport'],
tools: [readDatabaseTool, fetchWebTool, sendReportTool],
});
// Pass wrapped tools to AgentExecutor or LCEL chainVercel AI SDK
import { guardVercelAI } from '@cerberus-ai/core';
const { tools } = guardVercelAI({
cerberus: { alertMode: 'interrupt', threshold: 3 },
outboundTools: ['sendReport'],
tools: { readDatabase, fetchContent, sendReport },
});
const result = await generateText({ model, tools, prompt });OpenAI Agents SDK
import { createCerberusGuardrail } from '@cerberus-ai/core';
const guardrail = createCerberusGuardrail({
cerberus: { alertMode: 'interrupt', threshold: 3 },
outboundTools: ['sendReport'],
tools: { readDatabase: readDatabaseFn, sendReport: sendReportFn },
});
const agent = new Agent({ tools, inputGuardrails: [guardrail] });Live Memory Adapter (feeds the TTP ledger)
The adapters above guard tool executors. Real frameworks also keep memory in
their own subsystems (a LangGraph BaseStore, a retriever, a KV cache) that
never pass through a guarded tool — so those reads/writes are invisible to the
L4 contamination ledger unless you hand-declare every memory tool. The live
memory adapter taps a framework's native store directly: wrap it once and every
memory op is auto-traced into the provenance ledger ("deploy → memory
auto-traced").
import { createMemoryProvenanceTracker, guardLangGraphStore } from '@cerberus-ai/core';
const tracker = createMemoryProvenanceTracker({
defaultTrustLevel: 'untrusted', // store holds retrieved/external content
onContamination: (signal) => log.warn('cross-session memory taint', signal),
});
// Drop-in replacement for the LangGraph long-term memory store.
const store = guardLangGraphStore(baseStore, tracker);
const graph = workflow.compile({ store });
// Every put → a traced ledger write; every get/search → a traced read.
const blastRadius = tracker.ledger.getDescendants(poisonedNodeId);For a generic key/value store use guardMemoryStore(store, tracker); for full
control call tracker.read() / tracker.write() directly. The recording goes
through the same core the guard() interceptor uses, so a store-fed write and a
tool-fed write produce identical ledger rows.
Framework Support Matrix
| Framework | Integration | Status |
|---|---|---|
| Generic tool executors | guard() |
Supported |
| HTTP proxy/gateway | createProxy() |
Supported |
| LangChain | guardLangChain() |
Supported |
| Vercel AI SDK | guardVercelAI() |
Supported |
| OpenAI Agents SDK | createCerberusGuardrail() |
Supported |
LangGraph BaseStore (memory) |
guardLangGraphStore() |
Supported |
| Generic KV memory store | guardMemoryStore() |
Supported |
| OpenAI Function Calling | Via harness | Supported |
| Anthropic Tool Use | Via harness | Supported |
| Google Gemini | Via harness | Supported |
| AutoGen | Python SDK | Supported |
| LangChain Python | wrap_chain |
Supported |
| LangGraph Python | wrap_node / wrap_graph (blocking) |
Supported |
| CrewAI Python | wrap_crew |
Supported |
| LlamaIndex Python | Python SDK | Supported |
| OpenAI Python | CerberusOpenAI |
Supported |
| Anthropic Python | CerberusAnthropic |
Supported |
| Ollama (local models) | — | Future |
Performance
Cerberus overhead is measured against raw tool execution — no LLM or network calls, pure classification pipeline:
npx tsx harness/bench.ts| Scenario | Overhead p50 | Overhead p99 |
|---|---|---|
| readPrivateData (L1) | +32μs | <0.12ms |
| fetchExternalContent (L2) | +17μs | <0.05ms |
| sendOutboundReport (L3) | +0μs | <0.03ms |
| Full 3-call session | +52μs | +0.23ms |
The full Lethal Trifecta detection session adds 52μs (p50) and 0.23ms (p99) — 0.01% of a typical 600ms LLM API call.
OpenTelemetry
Add opentelemetry: true to your config. Cerberus emits one span per tool call (cerberus.tool_call) and three metrics:
cerberus.tool_calls.total— countercerberus.tool_calls.blocked— countercerberus.risk_score— histogram (0–4)
Works with any OTel backend: Jaeger, Grafana Tempo, Honeycomb, Datadog, AWS X-Ray. Pre-built Grafana dashboard (16 panels) included — spin up in one command:
docker compose -f monitoring/docker-compose.yml up -d
open http://localhost:3030Roadmap
| Phase | Deliverable | Status |
|---|---|---|
| 0 | Repository scaffold, toolchain, CI | Complete |
| 1 | Attack harness — 3-tool agent, injection payloads, labeled traces | Complete |
| 2 | Detection middleware — L1+L2+L3 + Correlation Engine | Complete |
| 3 | Memory Contamination Graph — L4 + temporal attack detection | Complete |
| 4 | npm SDK packaging, developer docs, examples | Complete |
| 5 | GitHub Release, conference submission | Complete |
| P2 | Platform — createProxy(), OpenTelemetry, playground |
Complete |
| P3 | Observability — Grafana 16 panels, 6 alert rules, Alertmanager | Complete |
| P4 | Advanced classifiers — 10 sub-classifiers, MCP scanner, outbound correlator | Complete |
| P5 | Enterprise — self-hosted package, license server, Stripe, security hardening | Complete |
| P6 | Historical N=525 empirical validation across 55 payloads × 3 providers | Complete |
| Sprint 3 | Tool chain, outbound encoding, split exfiltration detectors (v1.1.0) | Complete |
| Sprint 6 | Context window management, security hardening tests, Python SDK v1.1.0 | Complete |
| AL3 | L4 ledger per-agent Ed25519 authorship signatures (sign + verify, forgery battery) | Complete |
Enterprise — Self-Hosted
Deploy the full Cerberus detection stack inside your own VPC. Your data never leaves your infrastructure.
# After purchasing a license at cerberus.sixsenseenterprise.com
tar xzf cerberus-enterprise-3.0.1.tar.gz
cd cerberus-enterprise-3.0.1
cp .env.example .env # set CERBERUS_LICENSE_KEY
./setup.sh # prereq check → Docker stack → health verifyWhat's included:
- Cerberus Gateway (
:4000) — zero-code-change HTTP proxy - Grafana (
:3000) — 16 security panels, pre-provisioned, login required - Prometheus + Alertmanager — metrics pipeline + Slack/PagerDuty/email routing
- OpenTelemetry Collector — spans + metrics collection
- Tamper-evident audit log — SHA-256 chained JSONL, SIEM-ready
- Security hardening — non-root containers, read-only filesystem, resource limits, HMAC-signed license keys, cosign-signed Docker images
Contact: enterprise@sixsenseenterprise.com · cerberus.sixsenseenterprise.com
Honest Limitations
Cerberus is a runtime detection layer, not a complete security solution. Be clear-eyed about what it does and doesn't do.
What Cerberus does not do:
- It does not scan LLM prompts or system prompts — it operates at the tool call level only
- It does not prevent an LLM from reasoning about an injection — it prevents the injected instruction from executing via tool calls
- It does not cover every possible injection technique — novel payloads that avoid all heuristic patterns may not be detected by L2 sub-classifiers (L1+L3 still fire on the structural condition)
- It does not replace input validation, output filtering, or network-level controls — it complements them
- L3 and Drift detection depend on
authorizedDestinationsbeing correctly configured — misconfiguration produces false negatives, not false positives - Startup validation is intentionally strict in production paths:
interruptmode with outbound tools requires both trusted and untrusted tool classification, andmemoryTrackingrequires configured memory tools
On false positive rate:
- Measured 0.0% FP on clean control runs in our validation protocol
- Real-world FP rate depends on your tool configuration (trust levels, authorized destinations, threshold)
- Threshold 3 (default) requires all three Lethal Trifecta conditions simultaneously — it does not fire on individual suspicious signals
On cost:
- The npm core is free (MIT). No API calls, no telemetry, no usage tracking.
- Enterprise licensing is annual. Contact us for pricing.
Run Cerberus (or any security testing tool) only against AI systems and infrastructure that you own or are explicitly authorized to test.
Documentation
| Doc | Contents |
|---|---|
| Getting Started | npm install → first blocked attack in under 5 minutes |
| API Reference | guard(), config options, signal types, framework adapters |
| Architecture | Detection pipeline, layer design, correlation engine |
| Research Results | Historical N=285 paper-aligned results plus current evidence framing and methodology notes |
| Evidence Inventory | Historical reports, March 28-29 current-branch reruns, and claim-safe evidence mapping |
| Monitoring | Grafana dashboard — OTel metrics, block rates, risk scores |
| Deployment Model | Decision matrix — SDK vs sidecar proxy vs hosted gateway, ledger/key options, fail-closed, measured envelope, recommended pilot |
| Enterprise Deployment | AWS/GCP/Azure, TLS, sizing, upgrades |
| Enterprise Configuration | cerberus.config.yml full reference |
| OWASP Alignment | OWASP Top 10 for Agentic Applications 2026 coverage mapping |
| Framework Attack Surface | Per-framework injection vector mapping — LangChain, Vercel AI, OpenAI Agents SDK |
Contributing
See CONTRIBUTING.md for development setup and guidelines.
Security
See SECURITY.md for our responsible disclosure policy.
License
MIT — core library is free and open source.
Enterprise edition is commercially licensed. See legal/ for EULA, SLA, Privacy Policy, and Terms of Service.