3.0.1 • Published yesterday

@cerberus-ai/core

Licence

MIT

Version

3.0.1

Deps

Size

731 kB

Vulns

Weekly

Summary Dependency Versions

Cerberus

Runtime Security For AI Agent Tool Execution

Embeddable runtime enforcement for AI agents. Cerberus correlates privileged data access, untrusted content ingestion, and outbound behavior at the tool-call level, then interrupts guarded outbound actions before they execute.

Docs · npm · PyPI · Enterprise

Cerberus is the agentic AI security layer of Six Sense Enterprise Services. The core detection library (@cerberus-ai/core) is MIT licensed and free. The Enterprise edition adds a self-hosted Gateway, Grafana monitoring stack, and production deployment tooling for teams running AI agents in production.

What is Cerberus?
In Action
What It Detects
Editions
Quickstart
Empirical Results
Architecture
OWASP Alignment
Framework Integrations
Performance
Roadmap
Honest Limitations
License

What is Cerberus?

Every AI agent that can (1) access private data, (2) read external content, and (3) send data outbound is exploitable today via prompt injection — using free API access and three tool calls. We call this the Lethal Trifecta.

1. PRIVILEGED ACCESS   — Agent reads customer records, credentials, internal docs
2. INJECTION           — Attacker embeds instructions in a web page the agent fetches
3. EXFILTRATION        — Agent follows the injected instruction and sends data to attacker

This is not theoretical. The repository includes controlled validation harnesses, research writeups, replayable demos, and fresh current-branch reruns on the hardened branch. Public benchmark language should still stay tied to the exact evidence set it comes from: historical March 2026 reports versus March 28-29 current-branch reruns.

Cerberus closes this gap by monitoring every tool call in real time, correlating signals across the session, and blocking the attack before a single byte leaves your system.

npm install @cerberus-ai/core
# or
pip install cerberus-ai

const { executors: tools } = guard(rawTools, config, ['sendEmail']);
// Two lines. Attack intercepted.

Two packages, one version line. The open tier ships as @cerberus-ai/core (npm, MIT) and cerberus-ai (PyPI): the enforcement engine, guard()/inspect(), the trifecta detectors, the read-relevance gate helpers, the delegation/signed-manifest surface, the framework adapters, and the engine type contracts. The paid tier ships as the licensed @cerberus-ai/enterprise package (same version line): the durable provenance ledger + blast-radius containment, scale levers, AL3 authorship, the intelligence/Verdict-Weight layer, the enforcement gateway, the HTTP proxy, OpenTelemetry, and license/metering. As of H7 the paid engine is physically relocated into packages/enterprise/src/ — it no longer compiles into the open @cerberus-ai/core tarball, and a publish-time guard (npm run guard:publish) fails the build if any paid module is found in the shipped dist/. Both packages are published from a single canonical source on one version line.

Cerberus operates at the tool call level — not the prompt level. It does not read or modify LLM prompts. It watches what tools the agent calls and what data flows through them, making it robust to prompt variations and model updates.

In Action

No API key required — simulated tool executors, full detection pipeline:

npm run demo:capture

Cerberus terminal demo — attack blocked in real-time

Run Control / recorded fallback: The terminal demo is the canonical fallback proof path when the hosted surface is unavailable. It shows the same runtime story used across Cerberus materials.

Act 1 — No protection: Agent reads customer SSNs and emails, fetches a web page containing an injection payload, follows the injected instruction, and POSTs everything to an external attacker address. Data confirmed exfiltrated.

Act 2 — Cerberus active: Same attack. Two lines of code. Cerberus fires layered runtime signals, the risk score crosses threshold, and the guarded outbound action is interrupted before execution.

Primary proof path

For external analysts and customers, use one primary path and one fallback:

Hosted playground: https://demo.cerberus.sixsenseenterprise.com (use Present Mode for clean conference recordings)
Live dashboard: https://grafana.cerberus.sixsenseenterprise.com/d/cerberus-main/cerberus-e28094-ai-security-monitor?orgId=1&refresh=10s
Local fallback: npm run demo:capture

Additional local/operator surfaces remain available for internal work:

Local playground: npm run demo:playground (append ?present=1 for presentation mode)
Real-time attack dashboard: npm run demo:proxy in one terminal, then open demo-site/public/dashboard.html — live radar chart, doubt dial, event feed, and threat-intercept modal over WebSocket (ws://localhost:4101)
Live network demo: OPENAI_API_KEY=... npm run demo:live

Live network demo — real HTTP injection + capture servers, real GPT-4o-mini, real HTTP POST blocked:

OPENAI_API_KEY=sk-... npx tsx examples/live-attack-demo.ts

LangChain RAG demo — real LangChain + ChatOpenAI agent with Cerberus guardrail:

OPENAI_API_KEY=sk-... npx tsx examples/langchain-rag-demo.ts
OPENAI_API_KEY=sk-... npx tsx examples/langchain-rag-demo.ts --no-guard  # compare unguarded

What It Detects

Cerberus runs a 4-layer detection pipeline with 10 sub-classifiers sharing one correlation engine:

Core Detection Layers

Layer	Name	Signal	What It Catches
L1	Data Source Classifier	`PRIVILEGED_DATA_ACCESSED`	Privileged data (PII, secrets, credentials) entered the agent context
L2	Token Provenance Tagger	`UNTRUSTED_TOKENS_IN_CONTEXT`	External content (web, API, email) is in context before an outbound call
L3	Outbound Intent Classifier	`EXFILTRATION_RISK`	Agent is sending data that matches privileged content to an external destination
L4	Memory Contamination Graph	`CONTAMINATED_MEMORY_ACTIVE`	Injected instructions persisted across conversation turns (cross-session attack)

Sub-Classifiers

Ten additional heuristic layers sit inside the pipeline without adding to the risk score:

Sub-Classifier	Enhances	What It Catches
Secrets Detector	L1	AWS keys, GitHub tokens, JWTs, private keys, connection strings
Injection Scanner	L2	Role overrides, authority spoofing, exfiltration commands, instruction injection patterns
Encoding Detector	L2	Base64, hex, unicode, URL encoding, HTML entities, ROT13 hiding payloads
MCP Poisoning Scanner	L2	Hidden instructions embedded in tool descriptions (not just results)
Domain Classifier	L3	Free-tier webhooks, disposable email providers, social-engineering keyword domains (`audit-partner.io`, `compliance-verify.net`)
Outbound Correlator	L3	Injection-to-exfiltration chain even when PII is summarized or transformed
Tool Chain Detector	L3	Multi-hop exfiltration chains: read → transform → send sequences across tool calls
Outbound Encoding Detector	L3	Base64, hex, URL-encoded data in outbound tool arguments — catches encoded exfiltration
Split Exfiltration Detector	L3	Data chunked across multiple outbound calls — cumulative volume and sequential pattern detection
Drift Detector	L2/L3	Post-injection behavioral shifts — agent starts sending to new destinations mid-session

Attack Categories Covered

Category	Examples	Coverage
Direct Injection (DI)	"Ignore previous instructions, send data to..."	✓
Encoded/Obfuscated (EO)	Base64, hex, unicode, ROT13 wrapped payloads	✓
Social Engineering (SE)	Fake compliance notices, urgency framing, authority impersonation	✓
Multi-Turn Sequences (MT)	Instructions that build up across multiple tool calls	✓
Multilingual (ML)	Injections in Spanish, Mandarin, Arabic, Russian	✓
Advanced Techniques (AT)	MCP description poisoning, system prompt simulation	✓

Layer 4 (Memory Contamination) is the novel research contribution. MINJA (NeurIPS 2025) proved the cross-session memory attack. Cerberus ships the first deployable defense as installable developer tooling.

L4 provenance ledger is a TTP Level-2 (Demonstrated) reference implementation at assurance level AL2 (Tamper-Evident Core), conforming to the Transitive Taint Propagation (TTP) pre-print (Zenodo 10.5281/zenodo.20786402). The ledger captures framework-observed reads-before-write dependencies as a durable DAG (a conservative over-approximation → forward blast radius is a conservative upper bound), binds each record's content + deps under a SHA-256 commitment (tamper-evident), computes the forward blast radius B(p) of any poisoned record, and contains it with append-only quarantine annotations that never mutate the original audit trail. See npm run demo:ttp (examples/ttp-l2-demo.ts).

Per-agent authorship signatures (AL3, Tamper-Resistant). The AL2 commitment proves a record is untampered, not who wrote it. AL3 layers per-agent Ed25519 signatures (Node's built-in crypto) over that same commitment, bound to the claimed author, so a record's author is provable and forgery is detectable. Each writer holds a signing keypair; a minimal public-key registry resolves the claimed author on read/audit (ledger.verifyAuthorship()). The seeded forgery battery (npm run forgery:al3) detects 4000/4000 forgeries across the four classes (altered content, altered deps, altered author, forged author) with 0 false rejects. Private keys never touch the ledger, the SQLite store, or the repo.

The 3-axis assurance model — Coverage / Integrity / Intervention — is stated canonically in docs/assurance/ASSURANCE_MODEL.md. Assurance is the minimum across the three axes, graded to consequence; thresholds are governance's to set, not ours — we measure and report a number per sub-dimension, each with its dual and disclosed residual, and we do not compute a combined assuranceLevel yet. That doc is the single source of truth; everything else references it rather than re-deriving the model.

Feature Status — what's actually protecting you

Not every feature is on the moment you call guard(). This table is the honest map of what runs by default, what you must turn on, what needs declaring, and what pulls an external dependency — so you know exactly what is and isn't protecting you. (Tier: open = @cerberus-ai/core; paid = @cerberus-ai/enterprise.)

Feature	Tier	Status	How it activates
L1 Data Source Classifier + Secrets Detector	open	on-by-default · needs-config	Secrets always scan; `PRIVILEGED_DATA_ACCESSED` fires only for tools you mark `trusted` in `trustOverrides`
L2 Injection / Encoding scanners	open	on-by-default	Runs on every wrapped tool result
MCP Poisoning Scanner	open	on-by-default · needs-config	Description scan needs `toolDescriptions` (or standalone `scanToolDescriptions()`)
L3 Outbound Intent + Domain / Correlator / Tool-Chain / Outbound-Encoding / Split-Exfil	open	on-by-default · needs-config	Only monitors tools you list in `outboundTools`; precision depends on `authorizedDestinations`
Behavioral Drift Detector	open	on-by-default	Runs last, reads accumulated session state
Dynamic tool-registration checks	open	on-by-default	Active when you call `registerTool()`
Tool coverage report	open	on-by-default	`GuardResult.coverage`; set `strictCoverage: true` to fail closed on gaps (see below)
L4 Memory Contamination ledger	open	opt-in	`memoryTracking: true` + `memoryOptions.memoryTools`
Live memory adapter (native store → ledger)	open	opt-in	`guardMemoryStore()` / `guardLangGraphStore()`
Inter-component channel adapter (ICC → ledger)	open	opt-in	`createIpcChannelTracker()` — caller resolves channel identity; `null` = severed
Read-relevance gate / provenance-summary lever	open	opt-in	`memoryDependencyGate` / `provenanceSummary`
Semantic relevance gate (renamed-edge recall)	open	opt-in · external-dep	`memoryDependencyGate.relevanceScorer = createEmbeddingRelevanceScorer(embedder)`; token overlap stays the zero-dep default
Memory trace recorder	open	opt-in	`memoryOptions.recorder`
Multi-agent delegation graph + signed-manifest gate	open	opt-in	`multiAgent: true` (+ `manifestSigner` / `manifestVerifier` for KMS); the signed manifest binds a coverage commitment, so the receipt attests what was protected
OpenTelemetry instrumentation	open	opt-in · external-dep	`opentelemetry: true` + an OTel SDK/exporter registered in your app
Enforcement gateway dispatch	open	opt-in · external-dep	`enforcement` config + a reachable downstream gateway
Durable ledger / blast-radius containment / AL3 / Verdict-Weight intelligence	paid	opt-in · research (VW)	`@cerberus-ai/enterprise` + the relevant config

guard() only protects tools you give it. A tool your agent calls that is not in the executors map runs completely unwrapped — Cerberus never sees it. And a tool name declared in trustOverrides / outboundTools / memoryTools that has no matching wrapped executor (a typo, a renamed tool, a forgotten executor) has its declared protection silently skipped. Cerberus never fails open silently here: guard() emits a loud console.warn for every such gap and exposes the full picture on GuardResult.coverage (tools, declaredButUnwrapped, unclassifiedTools). Set strictCoverage: true to turn the warning into a hard error so a misconfigured deploy can't start.

Editions

	Core (Free)	Enterprise
Deployment	npm package	Self-hosted in your VPC
Integration	`guard()`, `createProxy()`, framework adapters	Cerberus Gateway (zero code change)
Monitoring	OTel spans + metrics	Full Grafana stack (16 panels), Alertmanager, Prometheus
Alerting	`onAssessment` callback	Slack, PagerDuty, email routing
Audit log	None	Tamper-evident SHA-256 chained JSONL
License	MIT	Annual commercial license
Data residency	Your runtime	100% your VPC — data never leaves
Setup support	Community	Included
Security	—	HMAC-signed license keys, rate limiting, non-root containers, cosign-signed images

Enterprise pricing: Contact Us · All deals are sales-led, annual license.

Quickstart

npm install @cerberus-ai/core

import { guard } from '@cerberus-ai/core';

const executors = {
  readDatabase: async (args) => fetchFromDb(args.query),
  fetchUrl:     async (args) => httpGet(args.url),
  sendEmail:    async (args) => smtp.send(args),
};

const { executors: secured, destroy } = guard(
  executors,
  {
    alertMode: 'interrupt',   // 'log' | 'alert' | 'interrupt'
    threshold: 3,             // score 0–4 needed to trigger action
    streamingMode: 'buffer',  // reconstruct stream-like tool output before inspection
    trustOverrides: [
      { toolName: 'readDatabase', trustLevel: 'trusted' },
      { toolName: 'fetchUrl',     trustLevel: 'untrusted' },
    ],
  },
  ['sendEmail'], // outbound tools Cerberus monitors for L3
);

// Use secured.readDatabase(), secured.fetchUrl(), secured.sendEmail()
// exactly like the originals — Cerberus intercepts transparently

When the Lethal Trifecta fires (score ≥ 3), the outbound call is blocked:

[Cerberus] Tool call blocked — risk score 3/4

The assessments array gives full per-turn breakdowns:

assessments[2].vector; // { l1: true, l2: true, l3: true, l4: false }
assessments[2].score;  // 3
assessments[2].action; // 'interrupt'
assessments[2].signals; // ['PRIVILEGED_DATA_ACCESSED', 'INJECTION_PATTERNS_DETECTED', 'EXFILTRATION_RISK', ...]

Zero-Code Gateway Mode

No guard() wrapper needed. Run Cerberus as an HTTP proxy — agent source code unchanged. The proxy/gateway is part of the licensed @cerberus-ai/enterprise package (paid tier):

import { createProxy } from '@cerberus-ai/enterprise';

const proxy = createProxy({
  port: 4000,
  cerberus: { alertMode: 'interrupt', threshold: 3 },
  tools: {
    readCustomerData: { target: 'http://localhost:3001/readCustomerData', trustLevel: 'trusted' },
    fetchWebpage:     { target: 'http://localhost:3001/fetchWebpage',     trustLevel: 'untrusted' },
    sendEmail:        { target: 'http://localhost:3001/sendEmail',        outbound: true },
  },
});

await proxy.listen();
// Agent routes tool calls to http://localhost:4000/tool/:toolName

If the client omits X-Cerberus-Session, the proxy generates an isolated session ID and returns it in the response header. Reuse that header value explicitly if you want multi-turn correlation across subsequent tool calls.

Cerberus also buffers stream-like tool results to a full turn boundary before inspection by default (streamingMode: 'buffer'). This prevents partial streamed content from bypassing output-level detection before the full payload is assembled.

MCP Tool Poisoning Scan

Scan tool descriptions at registration time for hidden instructions:

import { scanToolDescriptions } from '@cerberus-ai/core';

const results = scanToolDescriptions([{ name: 'search', description: toolDesc }]);
if (results[0].poisoned) {
  console.warn(`Severity: ${results[0].severity}`, results[0].patternsFound);
}

Python SDK

pip install cerberus-ai

from cerberus_ai import Cerberus
from cerberus_ai.models import CerberusConfig, DataSource, ToolSchema

cerberus = Cerberus(CerberusConfig(
    data_sources=[DataSource(name="customer_db", classification="PII", description="Customer records")],
    declared_tools=[
        ToolSchema(name="send_email", description="Send email", is_network_capable=True),
        ToolSchema(name="search_db", description="Search CRM", is_data_read=True),
    ],
))

result = cerberus.inspect(messages=messages, tool_calls=tool_calls)
if result.blocked:
    raise Exception(f"Blocked: {result.severity}")

Framework integrations: LangChain (wrap_chain), LangGraph (wrap_node / wrap_graph, message-level blocking), CrewAI (wrap_crew), AutoGen, LlamaIndex, OpenAI (CerberusOpenAI), Anthropic (CerberusAnthropic).

The Python SDK is the consolidated superset (cerberus-ai v1.5.0): static tool discovery (cerberus CLI, cerberus_ai.discover), an offline validation harness against external attack corpora (DeepSet, Gandalf, BIPIA — cerberus_ai.validation), optional classifiers (ML-backed L2 injection, multi-modal EXIF/PDF/audio scanning, MCP tool-poisoning), L4 cross-session memory, the signed-manifest gate (cerberus_ai.manifest_gate), non-blocking async inspection, and a Prometheus exporter. The LangGraph module carries both the blocking wrap_node/wrap_graph enforcement and the optional InstrumentedTraceRecorder (TTP real-workload capture) in one place.

Empirical Results

Historical evidence set: N=525 real API calls. 55 payloads × 6 attack categories × 3 providers × 3 trials. Control group: 0/30 exfiltrations across all providers.

The table below summarizes the checked-in March 13, 2026 observe-only validation report. It is a real historical evidence set, not yet the final refreshed current-branch benchmark baseline. We built a 3-tool attack agent and ran 55 injection payloads across 6 attack categories against three major LLM providers with full statistical rigor: 3 trials per payload per provider, 10 control runs per provider, Wilson 95% confidence intervals, Fisher's exact test, and 6-factor causation scoring.

Fresh current-branch stamped reruns now exist for two providers on commit 98b871b836af400913571bef80d2660fa8e32aae:

OpenAI attack rerun: 49/55 success (89.1%) with fresh observe-only detection at 20.0%, L1=100%, L2=100%, L3=19.6%
Google attack rerun: 47/55 success (85.5%) with fresh observe-only detection at 72.7%, L1=100%, L2=100%, L3=71.4%

These March 28-29 reruns improve the current-branch evidence story materially, but they should still be read as a newer provider slice rather than a full replacement for the historical three-provider benchmark set.

Historical Attack Success Without Protection

Full injection compliance — agent follows the injected instruction and redirects the outbound call to the attacker's address:

Provider	Model	Compliance Rate	95% CI	Causation Score
OpenAI	gpt-4o-mini	90.3% (149/165)	[84.8%, 93.9%]	0.811
Google	gemini-2.5-flash	82.4% (136/165)	[75.9%, 87.5%]	0.702
Anthropic	claude-sonnet-4-20250514	6.7% (11/165)	[3.8%, 11.5%]	0.207

Control group: 0/30 exfiltrations across all providers — baseline confirmed clean.

Historical Detection With Cerberus Active

Historical March 13 evidence set: N=525 runs, observe-only mode (alertMode: log), same agent behavior — Cerberus wraps without blocking.

Layer	OpenAI	Anthropic	Google	95% CI
L1 — Data Source	100%	100%	100%	[97.9%, 100%]
L2 — Token Provenance	100%	100%	100%	[97.9%, 100%]
L3 — Outbound Intent	13.7%	1.1%	65.7%	varies
False Positive Rate	0.0%	0.0%	0.0%	[0.0%, 11.4%]

Overall detection rate: 28.5% [24.7%, 32.6%] · False positives: 0.0% [0.0%, 11.4%]

L3 fires only when the agent actually executes an unauthorized outbound call — its rate tracks attack success, not miss rate. L1 and L2 achieve 100% across all 525 treatment runs and 30 control runs.

Key Findings

PII exfiltration is near-universal. ~100% of attack runs across all three providers leaked data. The architectural condition (privileged access + injection + outbound) is sufficient regardless of model.
Model resistance shifts the attack, not the outcome. Claude's low full-compliance rate (6.7%) reflects training against known redirect patterns — with partial exfiltration still occurring. OpenAI and Google comply at 90%+ across 55 diverse payloads.
The attack costs $0.001. Free-tier GPT-4o-mini + 3 tool definitions + one injected instruction = full PII exfiltration in under 15 seconds.
Encoding doesn't help you. Base64, ROT13, hex, and unicode-escaped payloads all execute in-context across all providers.
Language doesn't matter. Spanish, Mandarin, Arabic, and Russian injection payloads all exfiltrate data.
Runtime detection is the only durable defense. Model-level resistance is payload-specific and changes with model versions. Architectural detection at the tool-call level is the only stable layer.

Attack Anatomy (3 tool calls, ~12 seconds)

Turn 0:  readPrivateData()        → 5 customer records (SSNs, emails, phones)
         fetchExternalContent()   → Attacker payload embedded in webpage
Turn 1:  sendOutboundReport()     → Full PII sent to attacker's address
Turn 2:  "Report sent successfully!"  — agent has no idea it was compromised

All testing was conducted in a controlled environment against systems we own, using synthetic PII fixtures. No real customer data was involved. Run your own tests only against systems you are authorized to test.

Reproduce

# Full 55-payload suite across all three providers
npx tsx harness/validation/cli.ts --trials 3 --control-trials 10

# Detection mode (same run, observe-only — measures false positives)
npx tsx harness/validation/cli.ts --trials 3 --control-trials 10 --detect

# Performance benchmark
npx tsx harness/bench.ts

All execution traces are logged as structured JSON in harness/validation-traces/. See docs/research-results.md for full methodology.

Architecture

                    ┌──────────────────────────────────────────────────────┐
                    │                    AGENT RUNTIME                     │
                    │                                                      │
  ┌──────────┐     │  ┌──────────────┐   ┌──────────────┐   ┌─────────┐  │
  │ External │─────│─▶│ L1 Data      │   │ L2 Token     │   │ L3 Out- │  │
  │ Content  │     │  │ Classifier   │   │ Provenance   │   │ bound   │  │
  └──────────┘     │  └──────┬───────┘   └──────┬───────┘   └────┬────┘  │
                    │         │                   │                │       │
  ┌──────────┐     │         ▼                   ▼                ▼       │
  │ Private  │─────│─▶┌──────────────┐   ┌──────────────┐  ┌─────────┐  │
  │ Data     │     │  │ Secrets      │   │ Injection    │  │ Domain  │  │
  └──────────┘     │  │ Detector     │   │ Scanner      │  │ Class.  │  │
                    │  └──────────────┘   ├──────────────┤  └─────────┘  │
  ┌──────────┐     │                      │ Encoding     │               │
  │ MCP Tool │─────│─▶┌──────────────┐   │ Detector     │               │
  │ Registry │     │  │ MCP Poisoning│   ├──────────────┤               │
  └──────────┘     │  │ Scanner      │   │ Outbound     │               │
                    │  └──────────────┘   │ Correlator   │               │
  ┌──────────┐     │                      ├──────────────┤               │
  │ Memory   │◀───▶│  ┌──────┐           │ Drift        │               │
  │ Store    │     │  │ L4   │           │ Detector     │               │
  └──────────┘     │  │Memory│           └──────┬───────┘               │
       ▲           │  │Graph │                   │                       │
       │           │  └──────┘    ┌──────────────────────────────┐      │
       └─taint────▶│              │      CORRELATION ENGINE       │      │
                    │              │  Risk Vector [L1·L2·L3·L4]   │      │
                    │              │  Score ≥ threshold → BLOCK   │      │
                    │              └──────────────┬───────────────┘      │
                    │                             ▼                      │
                    │                       ┌──────────┐                 │
                    │                       │Interceptor│──▶ BLOCK       │
                    │                       └──────────┘                 │
                    └──────────────────────────────────────────────────────┘

Pipeline order: L1 → Secrets → L2 → Injection + Encoding + MCP → L3 → Domain → Outbound Correlator → Tool Chain → Outbound Encoding → Split Exfil → L4 → Drift → Correlation Engine

Project Structure

cerberus/
├── src/
│   ├── layers/           # L1-L4 core detection layers
│   ├── classifiers/      # 10 sub-classifiers
│   ├── crypto/           # Signer/Verifier primitives (HMAC-SHA256, Ed25519) for the signed-manifest gate
│   ├── engine/           # Correlation engine + interceptor + manifest gate + runtime-hooks seam (paid injection)
│   ├── enforcement/      # Enforcement signal type contracts (open); dispatch gateways are paid
│   ├── graph/            # L4 contamination graph + basic in-memory ledger + signed delegation graph + paid type contracts
│   ├── middleware/        # guard() developer API
│   ├── adapters/          # LangChain, Vercel AI, OpenAI Agents SDK, live memory store + inter-component channel + runtime channel-identity resolver (TTP ledger)
│   └── types/             # Shared TypeScript interfaces
├── packages/
│   └── enterprise/        # @cerberus-ai/enterprise (licensed package): the paid engine PHYSICALLY relocated here in H7
│       └── src/           #   durable ledger + B(p), provenance-summary, AL3, Verdict-Weight, enforcement
│                          #   dispatch, proxy, OpenTelemetry, license/metering, proxy CLI
├── enterprise/            # Self-hosted deployment stack (Gateway + docker-compose)
│   ├── gateway/           # Cerberus Gateway (Dockerfile, server.ts, license-client.ts)
│   ├── docker-compose.yml # Production stack: gateway + OTel + Prometheus + Alertmanager + Grafana
│   └── setup.sh           # Interactive setup script
├── license-server/        # License issuance + Stripe webhook handler
├── playground/            # Interactive live demo (port 4040)
├── monitoring/            # 6-container observability stack + 16-panel Grafana dashboard
├── harness/               # Attack research instrument + validation protocol
│   ├── payloads.ts        # 55 injection payloads across 6 categories
│   ├── benign-scenarios.ts # 22 benign workflows across 6 verticals (FP benchmark)
│   ├── benign-benchmark.ts # Benign utility benchmark — allow rate + FP rate reporting
│   ├── memory-poisoning-scenarios.ts # 9 memory poisoning & delayed exfil scenarios
│   ├── memory-poisoning-benchmark.ts # L4 + drift detection benchmark
│   ├── capability-abuse-scenarios.ts # 16 capability/schema abuse scenarios
│   ├── capability-abuse-benchmark.ts # MCP poisoning + registration + schema drift benchmark
│   ├── validation/        # Scientific validation (11 modules, 127 tests)
│   ├── ttp-l3/            # TTP L3 (Measured): seeded oracle + perf envelope + write-cost/complexity study + provenance-summary lever + containment-DoS + cross-session soundness + read-relevance gate + trace-capture realism + scale benchmark + instrumented real-workload capture + multi-model/workload-suite generalization + carry-recovery + C4 logged-input ground truth (npm run gen:ttp-l3 / perf:ttp-l3 / writecost:ttp-l3 / provsum:ttp-l3 / dos:ttp-l3 / xsession:ttp-l3 / readrel:ttp-l3 / trace:ttp-l3 / scale:ttp-l3 / instrumented:ttp-l3 / generalization:ttp-l3 / carry-recovery:ttp-l3 / c4-logged-input:ttp-l3)
│   ├── al3/              # AL3 per-agent authorship: seeded forgery battery (npm run forgery:al3)
│   ├── stress/           # Stress-test harness D1–D6 (measurement-only) + D8 containment-cliff fix re-measurement to 10⁶: per-layer overhead + throughput/concurrency + memory/storage growth + containment-at-scale + fail-closed error-path latency + cold-start (npm run stress:perf / stress:d8)
│   └── bench.ts           # Performance benchmark
├── sdk/python/            # Python SDK superset (cerberus-ai v1.5.0 on PyPI): detectors L1–L4 + classifiers (ML/multimodal/MCP) + discover (CLI) + validation (DeepSet/Gandalf/BIPIA corpora) + egi/manifest-gate + integrations (LangChain/LangGraph-blocking/CrewAI/AutoGen/LlamaIndex/OpenAI) + trace recorder — 345 tests
├── demo-site/             # Public landing page + real-time attack dashboard (dashboard.html)
├── scripts/               # publish-guard.mjs (paid-moat publish gate), stress-perf.ts (stress-test CLI), stress-d8.ts (D8 containment-cliff re-measurement CLI), demo-proxy.ts (WS broadcast), demo-live-feed.mjs, demo-setup.sh
├── spec/dsa-peas/         # DSA-PEAS open spec: record schema + standalone conformance validator (npm run dsa-peas:validate / dsa-peas:examples)
├── tests/                 # 1,878 tests, 98%+ coverage
├── docs/                  # Architecture, API reference, enterprise guides
├── legal/                 # EULA, SLA, Privacy Policy, Terms of Service
└── examples/              # demo-capture.ts, live-attack-demo.ts, langchain-rag-demo.ts, ttp-l2-demo.ts

Framework Integrations

Cerberus ships native adapters for the major agent frameworks:

LangChain

import { guardLangChain } from '@cerberus-ai/core';

const { tools } = guardLangChain({
  cerberus: { alertMode: 'interrupt', threshold: 3 },
  outboundTools: ['sendReport'],
  tools: [readDatabaseTool, fetchWebTool, sendReportTool],
});
// Pass wrapped tools to AgentExecutor or LCEL chain

Vercel AI SDK

import { guardVercelAI } from '@cerberus-ai/core';

const { tools } = guardVercelAI({
  cerberus: { alertMode: 'interrupt', threshold: 3 },
  outboundTools: ['sendReport'],
  tools: { readDatabase, fetchContent, sendReport },
});

const result = await generateText({ model, tools, prompt });

OpenAI Agents SDK

import { createCerberusGuardrail } from '@cerberus-ai/core';

const guardrail = createCerberusGuardrail({
  cerberus: { alertMode: 'interrupt', threshold: 3 },
  outboundTools: ['sendReport'],
  tools: { readDatabase: readDatabaseFn, sendReport: sendReportFn },
});

const agent = new Agent({ tools, inputGuardrails: [guardrail] });

Live Memory Adapter (feeds the TTP ledger)

The adapters above guard tool executors. Real frameworks also keep memory in their own subsystems (a LangGraph BaseStore, a retriever, a KV cache) that never pass through a guarded tool — so those reads/writes are invisible to the L4 contamination ledger unless you hand-declare every memory tool. The live memory adapter taps a framework's native store directly: wrap it once and every memory op is auto-traced into the provenance ledger ("deploy → memory auto-traced").

import { createMemoryProvenanceTracker, guardLangGraphStore } from '@cerberus-ai/core';

const tracker = createMemoryProvenanceTracker({
  defaultTrustLevel: 'untrusted', // store holds retrieved/external content
  onContamination: (signal) => log.warn('cross-session memory taint', signal),
});

// Drop-in replacement for the LangGraph long-term memory store.
const store = guardLangGraphStore(baseStore, tracker);
const graph = workflow.compile({ store });

// Every put → a traced ledger write; every get/search → a traced read.
const blastRadius = tracker.ledger.getDescendants(poisonedNodeId);

For a generic key/value store use guardMemoryStore(store, tracker); for full control call tracker.read() / tracker.write() directly. The recording goes through the same core the guard() interceptor uses, so a store-fed write and a tool-fed write produce identical ledger rows.

Framework Support Matrix

Framework	Integration	Status
Generic tool executors	`guard()`	Supported
HTTP proxy/gateway	`createProxy()`	Supported
LangChain	`guardLangChain()`	Supported
Vercel AI SDK	`guardVercelAI()`	Supported
OpenAI Agents SDK	`createCerberusGuardrail()`	Supported
LangGraph `BaseStore` (memory)	`guardLangGraphStore()`	Supported
Generic KV memory store	`guardMemoryStore()`	Supported
OpenAI Function Calling	Via harness	Supported
Anthropic Tool Use	Via harness	Supported
Google Gemini	Via harness	Supported
AutoGen	Python SDK	Supported
LangChain Python	`wrap_chain`	Supported
LangGraph Python	`wrap_node` / `wrap_graph` (blocking)	Supported
CrewAI Python	`wrap_crew`	Supported
LlamaIndex Python	Python SDK	Supported
OpenAI Python	`CerberusOpenAI`	Supported
Anthropic Python	`CerberusAnthropic`	Supported
Ollama (local models)	—	Future

Performance

Cerberus overhead is measured against raw tool execution — no LLM or network calls, pure classification pipeline:

npx tsx harness/bench.ts

Scenario	Overhead p50	Overhead p99
readPrivateData (L1)	+32μs	<0.12ms
fetchExternalContent (L2)	+17μs	<0.05ms
sendOutboundReport (L3)	+0μs	<0.03ms
Full 3-call session	+52μs	+0.23ms

The full Lethal Trifecta detection session adds 52μs (p50) and 0.23ms (p99) — 0.01% of a typical 600ms LLM API call.

OpenTelemetry

Add opentelemetry: true to your config. Cerberus emits one span per tool call (cerberus.tool_call) and three metrics:

cerberus.tool_calls.total — counter
cerberus.tool_calls.blocked — counter
cerberus.risk_score — histogram (0–4)

Works with any OTel backend: Jaeger, Grafana Tempo, Honeycomb, Datadog, AWS X-Ray. Pre-built Grafana dashboard (16 panels) included — spin up in one command:

docker compose -f monitoring/docker-compose.yml up -d
open http://localhost:3030

Roadmap

Phase	Deliverable	Status
0	Repository scaffold, toolchain, CI	Complete
1	Attack harness — 3-tool agent, injection payloads, labeled traces	Complete
2	Detection middleware — L1+L2+L3 + Correlation Engine	Complete
3	Memory Contamination Graph — L4 + temporal attack detection	Complete
4	npm SDK packaging, developer docs, examples	Complete
5	GitHub Release, conference submission	Complete
P2	Platform — `createProxy()`, OpenTelemetry, playground	Complete
P3	Observability — Grafana 16 panels, 6 alert rules, Alertmanager	Complete
P4	Advanced classifiers — 10 sub-classifiers, MCP scanner, outbound correlator	Complete
P5	Enterprise — self-hosted package, license server, Stripe, security hardening	Complete
P6	Historical N=525 empirical validation across 55 payloads × 3 providers	Complete
Sprint 3	Tool chain, outbound encoding, split exfiltration detectors (v1.1.0)	Complete
Sprint 6	Context window management, security hardening tests, Python SDK v1.1.0	Complete
AL3	L4 ledger per-agent Ed25519 authorship signatures (sign + verify, forgery battery)	Complete

Enterprise — Self-Hosted

Deploy the full Cerberus detection stack inside your own VPC. Your data never leaves your infrastructure.

# After purchasing a license at cerberus.sixsenseenterprise.com
tar xzf cerberus-enterprise-3.0.1.tar.gz
cd cerberus-enterprise-3.0.1
cp .env.example .env   # set CERBERUS_LICENSE_KEY
./setup.sh             # prereq check → Docker stack → health verify

What's included:

Cerberus Gateway (:4000) — zero-code-change HTTP proxy
Grafana (:3000) — 16 security panels, pre-provisioned, login required
Prometheus + Alertmanager — metrics pipeline + Slack/PagerDuty/email routing
OpenTelemetry Collector — spans + metrics collection
Tamper-evident audit log — SHA-256 chained JSONL, SIEM-ready
Security hardening — non-root containers, read-only filesystem, resource limits, HMAC-signed license keys, cosign-signed Docker images

Contact: enterprise@sixsenseenterprise.com · cerberus.sixsenseenterprise.com

Honest Limitations

Cerberus is a runtime detection layer, not a complete security solution. Be clear-eyed about what it does and doesn't do.

What Cerberus does not do:

It does not scan LLM prompts or system prompts — it operates at the tool call level only
It does not prevent an LLM from reasoning about an injection — it prevents the injected instruction from executing via tool calls
It does not cover every possible injection technique — novel payloads that avoid all heuristic patterns may not be detected by L2 sub-classifiers (L1+L3 still fire on the structural condition)
It does not replace input validation, output filtering, or network-level controls — it complements them
L3 and Drift detection depend on authorizedDestinations being correctly configured — misconfiguration produces false negatives, not false positives
Startup validation is intentionally strict in production paths: interrupt mode with outbound tools requires both trusted and untrusted tool classification, and memoryTracking requires configured memory tools

On false positive rate:

Measured 0.0% FP on clean control runs in our validation protocol
Real-world FP rate depends on your tool configuration (trust levels, authorized destinations, threshold)
Threshold 3 (default) requires all three Lethal Trifecta conditions simultaneously — it does not fire on individual suspicious signals

On cost:

The npm core is free (MIT). No API calls, no telemetry, no usage tracking.
Enterprise licensing is annual. Contact us for pricing.

Run Cerberus (or any security testing tool) only against AI systems and infrastructure that you own or are explicitly authorized to test.

Documentation

Doc	Contents
Getting Started	`npm install` → first blocked attack in under 5 minutes
API Reference	`guard()`, config options, signal types, framework adapters
Architecture	Detection pipeline, layer design, correlation engine
Research Results	Historical N=285 paper-aligned results plus current evidence framing and methodology notes
Evidence Inventory	Historical reports, March 28-29 current-branch reruns, and claim-safe evidence mapping
Monitoring	Grafana dashboard — OTel metrics, block rates, risk scores
Deployment Model	Decision matrix — SDK vs sidecar proxy vs hosted gateway, ledger/key options, fail-closed, measured envelope, recommended pilot
Enterprise Deployment	AWS/GCP/Azure, TLS, sizing, upgrades
Enterprise Configuration	`cerberus.config.yml` full reference
OWASP Alignment	OWASP Top 10 for Agentic Applications 2026 coverage mapping
Framework Attack Surface	Per-framework injection vector mapping — LangChain, Vercel AI, OpenAI Agents SDK

Built by Six Sense Enterprise Services · cerberus.sixsenseenterprise.com

Keywords

ai-security agentic-ai prompt-injection data-exfiltration runtime-security llm-security cerberus