cladding 0.6.3 (CLI) · License: MIT · Dependencies: 3 · Size: 3.3 MB · Known vulnerabilities: 0 · Weekly downloads: 13

cladding — Unified Governance for AI-Coupled Engineering

The LLM writes the code — cladding owns what comes before and after.
True to the name (cladding = the protective shell), it's the verification layer wrapped around your host LLM.

The official reference implementation of the Ironclad standard.
Before your host LLM (Claude Code · Codex · Gemini · Cursor) starts work, cladding feeds it the project's intent;
after it finishes, cladding verifies the result with 37 detectors and a 15-stage gate. A division of labor toward the same goal.

[Figure: how cladding wraps the host LLM in a collaborative structure: before (intent injection) · after (verification) · record (feedback loop)]

This loop has one goal:
turning the AI's "it's done" from a claim into a proof.

Intent is preserved as a record · drift is blocked automatically · completion is proven by a verification signature.
So you can ship code an AI wrote with the same trust as code a human wrote.

For you, the developer, that means less time spent reviewing AI code, the why of the code still on record six months later,
and no more judging "is this really done?" by gut feel before you ship.

How it works with your host LLM

cladding doesn't write code. Writing code is always the host LLM's job. What cladding takes on are the two things LLMs are bad at: remembering the intent precisely when they start, and mechanically verifying the result when they finish.

Before — inject the intent

So the LLM starts with the right context.

  • Project map injected — every time a conversation starts, "how many features, what's in progress, the last verification result" is handed to the LLM automatically.
  • Only the intent that matters — just the why of the feature at hand, its related features, and its acceptance criteria are pulled out (it does not dump the whole spec).
  • Project rules applied — the forbidden and preferred patterns the team agreed on go in as standing instructions every time.
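As a rough illustration of "only the intent that matters," the TypeScript sketch below slices a spec down to one feature's why and acceptance criteria, plus the why of its related features. The `Feature` shape and the `sliceIntent` function are hypothetical; cladding's actual spec schema and injection logic are not documented here.

```typescript
// Hypothetical spec shape for illustration; cladding's real schema may differ.
interface Feature {
  id: string;           // e.g. "F-d86375d8"
  title: string;
  why: string;          // the intent behind the feature
  related: string[];    // ids of related features
  acceptance: string[]; // acceptance criteria (ACs)
}

interface IntentSlice {
  focus: Pick<Feature, "id" | "title" | "why" | "acceptance">;
  related: Pick<Feature, "id" | "title" | "why">[];
}

// Build the context for one feature: its why and ACs, plus only the why of
// its directly related features. Every other feature in the spec is left out.
function sliceIntent(spec: Feature[], focusId: string): IntentSlice {
  const byId = new Map<string, Feature>(spec.map((f): [string, Feature] => [f.id, f]));
  const focus = byId.get(focusId);
  if (focus === undefined) {
    throw new Error(`unknown feature: ${focusId}`);
  }
  const related = focus.related
    .map((id) => byId.get(id))
    .filter((f): f is Feature => f !== undefined)
    .map(({ id, title, why }) => ({ id, title, why }));
  const { id, title, why, acceptance } = focus;
  return { focus: { id, title, why, acceptance }, related };
}

// Illustrative data only.
const spec: Feature[] = [
  { id: "F-d86375d8", title: "Refunds", why: "Merchants must reverse charges", related: ["F-00a1b2c3"], acceptance: ["Refund within 30 days"] },
  { id: "F-00a1b2c3", title: "Charges", why: "Collect payment", related: [], acceptance: ["Card charge succeeds"] },
  { id: "F-ffee0011", title: "Invoices", why: "Monthly billing", related: [], acceptance: ["PDF generated"] },
];

const slice = sliceIntent(spec, "F-d86375d8");
console.log(slice.related.map((f) => f.id));
```

The unrelated "Invoices" feature never reaches the context, which is the point of slicing instead of dumping the whole spec.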

After — verify the result

If the LLM's output drifts from the spec, block it.

  • 15-stage verification gate — type · lint · tests · coverage · architecture · secrets · E2E · evidence, all in one pass.
  • 36 drift checks: whether spec, code, and tests still agree, cross-checked automatically in every direction.
  • An implementation-blind grader — a separate agent that cannot read the code grades it with tests written from the spec alone.
  • Run the deliverable for real — the "tests pass but the program doesn't run" situation is blocked by actually running it.

Record — input for the next turn

Verification results flow back into the LLM's context.

  • Verification signature — the code state that cleared every check is saved to the repo as a signature: "this was verified at this point."
  • Audit ledger — every verification run, completion attempt, and block is recorded with who · when · what result.
  • Repair card — try to end a conversation leaving a deterministic check (drift · architecture · secret) failing and it blocks you once, then carries the failure summary forward into the next conversation's opening automatically.
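The repair-card behavior can be modeled as a small state machine. This TypeScript sketch is an illustration under assumptions, not cladding's hook code: `StopGuard`, `CheckResult`, and the decision fields are hypothetical names. It blocks the first exit attempt while a deterministic check fails; on a second attempt with the same failure it lets the session end, records an audit entry, and hands the failure summary forward.

```typescript
// Hypothetical model of "block once, then carry the repair card forward".
type CheckName = "drift" | "architecture" | "secret"; // the deterministic checks named above

interface CheckResult {
  check: CheckName;
  ok: boolean;
  summary: string;
}

interface StopDecision {
  allowExit: boolean;
  blockReason?: string;  // shown when the exit is blocked
  carryForward?: string; // repair card for the next conversation's opening
  auditEntry?: string;   // line appended to the audit ledger
}

class StopGuard {
  // Which set of failing checks caused the last block, if any.
  private lastBlocked: string | null = null;

  onStop(results: CheckResult[]): StopDecision {
    const failing = results.filter((r) => !r.ok);
    if (failing.length === 0) {
      this.lastBlocked = null;
      return { allowExit: true };
    }
    const key = failing.map((r) => r.check).sort().join(",");
    const card = failing.map((r) => `${r.check}: ${r.summary}`).join("\n");
    if (this.lastBlocked !== key) {
      // First attempt with this failure: block the exit once.
      this.lastBlocked = key;
      return { allowExit: false, blockReason: card };
    }
    // Second attempt with the same failure: let it end, but record it and
    // hand the summary to the next conversation instead of dropping it.
    this.lastBlocked = null;
    return { allowExit: true, carryForward: card, auditEntry: `exited with failing: ${key}` };
  }
}
```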

While this loop runs, you just develop in natural language as usual — there are no commands to memorize.

Real-time intervention (map injection · instant block · stop-block) works fully on Claude Code. On Codex, Gemini, and Cursor, the same verification runs through in-conversation tool calls plus the git and CI gates.

"done" is earned, not declared

The chronic disease of AI coding is "it's done" declared with no verification behind it. In cladding, a feature's status: done is not a value you write — it's a value you earn.

[Figure: one scene. A hook blocks the LLM's 'done' declaration, the gate's RED result feeds back as a repair card, and 'done' is earned only when the gate is GREEN]
  1. When the AI tries to write the completion mark itself → it's blocked on the spot ("earn completion by verifying it"). This happens in real time on Claude Code; on other hosts the gate and CI play the same role.
  2. When the AI requests completion → all 9 deterministic stages (type · lint · drift · architecture · secret · tests · coverage · spec conformance · deliverable smoke) run, and the feature is recorded as done only if every one passes; a single failure auto-reverts it. The E2E and evidence stages are left to CI's full 15-stage run.
  3. The moment it passes, a verification signature is left behind — committable proof that "this code was verified at this point."
  4. Try to end a conversation while a check is failing → it blocks you once (end again on the same failure and it lets you go, but records the fact instead of letting it pass silently) and carries the repair card into the next conversation.
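Step 2 is all-or-nothing. A minimal TypeScript sketch, assuming a hypothetical `requestCompletion` function with a caller-supplied `runStage` callback and code digest (none of which are cladding's real API), shows the shape: run every deterministic stage, mark the feature done with an attestation only if all pass, and otherwise revert it.

```typescript
// Hypothetical sketch of the completion request in step 2 (not cladding's API).
const DETERMINISTIC_STAGES = [
  "type", "lint", "drift", "architecture", "secret",
  "tests", "coverage", "spec-conformance", "deliverable-smoke",
] as const;
type Stage = (typeof DETERMINISTIC_STAGES)[number];

interface FeatureState {
  id: string;
  status: "in_progress" | "done";
  attestation: string | null; // digest of the code state that passed
}

function requestCompletion(
  feature: FeatureState,
  runStage: (stage: Stage) => boolean, // caller-supplied stage runner
  codeDigest: string,                  // caller-supplied digest of the current code
): { feature: FeatureState; failed: Stage[] } {
  // Run every stage so the report lists all failures, not just the first.
  const failed = DETERMINISTIC_STAGES.filter((s) => !runStage(s));
  if (failed.length > 0) {
    // In this sketch a failure reverts the status and clears any earlier attestation.
    return { feature: { ...feature, status: "in_progress", attestation: null }, failed };
  }
  // All green: the status is earned, and the verified code state is recorded.
  return { feature: { ...feature, status: "done", attestation: codeDigest }, failed };
}
```

Running all stages rather than stopping at the first failure is a choice of this sketch: it lets the repair summary list every failing stage at once.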

The limits are disclosed plainly too: bypass paths exist that the instant block can't see, and those are caught by after-the-fact verification (the gate · drift checks). The instant block is the first line of defense, after-the-fact verification the second — and neither is a standalone guarantee.

What changes

How a vanilla AI coding environment and a cladding environment behave in the same situation.

Situation | Vanilla AI coding | cladding
Code drifts from the spec | fixed if a reviewer notices | auto-detected right after the edit (alert) · "done" can't pass while it's drifting
The AI says "it's done" | you can only take its word | done earned only when the gate is GREEN
Ending a session in a failing state | exits as-is, forgotten next time | the exit is blocked once, the repair card handed off
Two devs add a feature at the same time | merge conflict | hash-8 IDs · separate files → 0 conflicts
Who verifies the AI-written code? | the AI that wrote it self-certifies (risky) | an implementation-blind grader + the mechanical gate
Switching AI tools | reconfigure per tool | one spec → 4 hosts wired automatically

How it works

Spec → Code → Tests runs as a single cycle — the spec records the why, the gate verifies, and the detectors block drift.

[Figure: the Spec → Code → Tests cycle, guarded by the 15-stage verification and 37 drift detectors]
1. Spec — the single source of intent (SSoT)

The spec records the why (what we're building and why). A 4-tier single source of truth — intent on top, the implementation below.

Tier | Role | Who edits | Authority
A — Spec | intent (what to build) | humans define | sealed · LLMs cannot edit
B — Design | design (how to build it) | humans edit freely | checked against A
C — Derived | implementation (code · tests) + attestation (verification signature) | LLMs · humans | regenerated by reading the code
D — Audit | audit record (what actually happened) | append-only | immutable

A outranks every tier below it — if spec (A) and code (C) disagree, the code is the one that's wrong. If the intent (A) wavers, everything wavers, so it's sealed against LLM edits.
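The two precedence rules above can be encoded in a few lines. In this TypeScript sketch (hypothetical names, not cladding's code), `tierToFix` returns the less authoritative side of a disagreement, and `mayWrite` encodes only the hard rules stated in the table: tier A is sealed against LLM edits and tier D accepts appends only.

```typescript
// Hypothetical encoding of the tier rules (not cladding's code).
type Tier = "A" | "B" | "C" | "D";
type RankedTier = Exclude<Tier, "D">; // A/B/C take part in precedence

const RANK: Record<RankedTier, number> = { A: 0, B: 1, C: 2 }; // lower = more authoritative

// When two tiers disagree, the less authoritative one is the side to fix.
function tierToFix(x: RankedTier, y: RankedTier): RankedTier {
  return RANK[x] <= RANK[y] ? y : x;
}

type Actor = "human" | "llm";

// Only the hard rules stated in the table are encoded here.
function mayWrite(tier: Tier, actor: Actor, isAppend: boolean): boolean {
  if (tier === "A") return actor === "human"; // sealed: LLMs cannot edit intent
  if (tier === "D") return isAppend;          // append-only audit record
  return true;                                // B and C: no hard block in this sketch
}
```

So a spec/code mismatch, `tierToFix("A", "C")`, points at the code, matching the rule that the code is the one that's wrong.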

Sharded and multi-dev safe: each feature gets its own file, named like spec/features/<slug>-<hash>.yaml, plus an 8-char hash ID (e.g. F-d86375d8). Two devs creating new features at the same time land in different files with different IDs, so there are zero merge conflicts. Details: Hash-based feature IDs.
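The README doesn't say how the 8-char hash is computed, so the TypeScript sketch below makes an explicit assumption: 32-bit FNV-1a over the slug plus a per-creation nonce, which happens to produce exactly 8 hex digits. `featureId`, `featurePath`, and the nonce are hypothetical. The point is only that IDs and file names are derived independently on each machine, so two new features never touch the same file; a hash collision, while unlikely, is the kind of case the ID_COLLISION detector exists to flag.

```typescript
// Hypothetical: the README does not specify the hash. This uses 32-bit FNV-1a
// over UTF-16 code units, which yields exactly 8 hex digits.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0;
}

// Each developer derives the ID locally from the slug plus a per-creation nonce
// (e.g. a timestamp or random string; an assumption of this sketch).
function featureId(slug: string, nonce: string): string {
  const hex = ("00000000" + fnv1a32(`${slug}:${nonce}`).toString(16)).slice(-8);
  return `F-${hex}`;
}

// One file per feature, so concurrent additions never edit the same path.
function featurePath(slug: string, id: string): string {
  return `spec/features/${slug}-${id.slice(2)}.yaml`;
}

const idA = featureId("refunds", "dev-a-1718000000");
const idB = featureId("invoices", "dev-b-1718000003");
console.log(featurePath("refunds", idA), featurePath("invoices", idB));
```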

[Figure: 4-tier SSoT, A (Spec) → B (Design) → C (Derived + attestation) → D (Audit); A outranks B]
2. Gate — the 15-stage Iron Law

To be recognized as "done," a change must clear the strict gate (the 9 deterministic stages of the 15); the full 15 stages, including E2E and evidence, run in CI. The same check engine is applied in per-moment bundles: a fast 3 stages at commit time (when the git hook is installed), 9 stages at push and completion time, and all 15 in CI. Only the depth differs; the check logic is identical.

[Figure: 15-stage Iron Law gate: static (6) · test & conformance (4) · E2E (3) · evidence (2), with an attestation signature when GREEN]
Stage | What it checks
1.1 Type | type errors
1.2 Lint | code style
1.3 Drift | spec ↔ code mismatches across 37 detectors
1.4 Commit | clean working tree
1.5 Arch | architecture invariants
1.6 Secret | leaked API keys
2.1 Unit | unit tests pass
2.2 Coverage | coverage drop blocked
2.3 Spec conformance | the implementation-blind grader's tests pass
2.4 Deliverable smoke | the declared deliverable actually runs (blocks the empty-green "tests pass but the deliverable doesn't run")
3.1 Smoke | E2E critical paths
3.2 Perf | performance budgets
3.3 Visual | UI visual regression
4.1 Audit | every AC (acceptance criterion) has at least one piece of evidence
4.2 UAT | every done feature has at least one piece of evidence
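The "same engine, different depth" idea can be sketched as one stage list with three bundles. In this TypeScript sketch the push/completion bundle is the 9 deterministic stages listed in step 2 of the "done" flow; the README does not say which 3 stages run at commit time, so the commit bundle here is a guess. `BUNDLES` and `runGate` are hypothetical names.

```typescript
// Stage ids follow the gate table; the bundling below is a hypothetical sketch.
const ALL_STAGES = [
  "1.1 Type", "1.2 Lint", "1.3 Drift", "1.4 Commit", "1.5 Arch", "1.6 Secret",
  "2.1 Unit", "2.2 Coverage", "2.3 Spec conformance", "2.4 Deliverable smoke",
  "3.1 Smoke", "3.2 Perf", "3.3 Visual",
  "4.1 Audit", "4.2 UAT",
] as const;
type Stage = (typeof ALL_STAGES)[number];

type Moment = "commit" | "push" | "ci";

const BUNDLES: Record<Moment, readonly Stage[]> = {
  commit: ["1.1 Type", "1.2 Lint", "1.6 Secret"], // ASSUMPTION: not specified in the README
  push: [                                        // the 9 deterministic stages
    "1.1 Type", "1.2 Lint", "1.3 Drift", "1.5 Arch", "1.6 Secret",
    "2.1 Unit", "2.2 Coverage", "2.3 Spec conformance", "2.4 Deliverable smoke",
  ],
  ci: ALL_STAGES, // all 15
};

// One check function for every moment; only the selected subset differs.
function runGate(moment: Moment, check: (s: Stage) => boolean): { green: boolean; failed: Stage[] } {
  const failed = BUNDLES[moment].filter((s) => !check(s));
  return { green: failed.length === 0, failed };
}
```

A performance regression, for example, would pass the push bundle and only turn the gate RED in CI, which is why CI is described as the place where true enforcement lives.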
3. Detector — 37 drift detectors

Drift in every direction across spec · code · test is detected automatically. Full catalog: detector catalog.

Direction | What it catches | Count | Representative detectors
spec ↔ code | in the spec but missing from code, or code that strays from the spec | 10 | MISSING_IMPLEMENTATION, AC_DRIFT, DELIVERABLE_INTEGRITY
code ↔ test | code present but no tests · coverage drop · secrets | 6 | MISSING_TESTS, COVERAGE_DROP, HARDCODED_SECRET
spec ↔ test | an AC in the spec not verified by a test · false status | 5 | UNTESTED_AC, STATUS_DRIFT, SPEC_CONFORMANCE
spec hygiene | the spec's own integrity (ID collisions · dependency cycles) | 8 | ID_COLLISION, SLUG_CONFLICT, DEPENDENCY_CYCLE
environment integrity | build environment · meta files | 3 | HARNESS_INTEGRITY, META_INTEGRITY
verification freshness | whether code changed since the verification signature | 1 | STALE_ATTESTATION (new in 0.6.0)
governance · docs | policy violations · doc drift | 3 | ABSENCE_OF_GOVERNANCE, PROJECT_CONTEXT_DRIFT
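To make the direction column concrete, here is a heavily simplified TypeScript model of three detectors from the table. Only the detector names come from this README; the `Snapshot` shape and the rules below are assumptions about what each name means, not cladding's implementation.

```typescript
// Hypothetical, simplified project snapshot.
interface SpecFeature { id: string; acs: string[]; }
interface Snapshot {
  spec: SpecFeature[];
  implemented: Set<string>; // feature ids referenced from code
  tested: Set<string>;      // feature ids that have at least one test
  coveredAcs: Set<string>;  // "featureId#acIndex" keys claimed by some test
}
interface Finding { detector: string; featureId: string; detail: string; }

function detectDrift(s: Snapshot): Finding[] {
  const findings: Finding[] = [];
  for (const f of s.spec) {
    // spec ↔ code: in the spec, nowhere in the code.
    if (!s.implemented.has(f.id)) {
      findings.push({ detector: "MISSING_IMPLEMENTATION", featureId: f.id, detail: "in spec, not in code" });
    } else if (!s.tested.has(f.id)) {
      // code ↔ test: implemented, but no test touches it.
      findings.push({ detector: "MISSING_TESTS", featureId: f.id, detail: "in code, no tests" });
    }
    // spec ↔ test: each AC needs a test that claims it.
    f.acs.forEach((ac, i) => {
      if (!s.coveredAcs.has(`${f.id}#${i}`)) {
        findings.push({ detector: "UNTESTED_AC", featureId: f.id, detail: ac });
      }
    });
  }
  return findings;
}
```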
4. Cycle — one feature's lifecycle

Define → Sync → Implement → Earn. You earn "done" only by passing every check.

[Figure: one feature's lifecycle, Define → Sync → Implement → Earn; completion is earned when all checks pass and auto-reverted on failure]

Multi-Agent — separating the builder from the verifier

The agents that build are kept separate from the agents that verify, so no agent can sign off on its own work. 0.6.0's blind-author goes one step further — the agent that writes the tests has no tool to read the implementation at all (no Read/Grep granted). "Wrote it without looking at the implementation" becomes a structural fact, not a promise. This separation maps directly onto regulatory · audit regimes (EU AI Act · SOX).
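One way to enforce this kind of privilege separation is a per-persona tool allowlist. The TypeScript sketch below is hypothetical: persona names come from the diagram caption, `Read` and `Grep` come from the paragraph above, and the other tool names and the `GRANTS` table are illustrative. It shows blind-author receiving no reading tools at all, so it can only work from the spec it is handed.

```typescript
// Hypothetical persona-to-tool allowlist; only Read/Grep come from the text above.
type Tool = "Read" | "Grep" | "Edit" | "Write" | "Bash" | "Dispatch";

const GRANTS: { [persona: string]: readonly Tool[] } = {
  orchestrator: ["Dispatch"],                           // hands work to other personas
  planner: ["Read", "Grep"],
  developer: ["Read", "Grep", "Edit", "Write", "Bash"], // builds
  reviewer: ["Read", "Grep"],                           // verifies
  "blind-author": ["Write"],  // writes tests from the spec it is given; cannot read code
  observability: ["Read"],
};

function canUse(persona: string, tool: Tool): boolean {
  // Own-property check so names like "toString" are not mistaken for personas.
  const tools = Object.prototype.hasOwnProperty.call(GRANTS, persona) ? GRANTS[persona] : undefined;
  return tools !== undefined && tools.indexOf(tool) !== -1;
}

// No persona signs off on its own work.
function maySignOff(builder: string, verifier: string): boolean {
  return builder !== verifier;
}
```

Because the restriction lives in the grant table rather than in the prompt, "didn't look at the implementation" holds even if the agent were inclined to look.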

[Figure: persona privilege separation. The orchestrator dispatches; planner, developer, and reviewer act; blind-author writes tests without seeing the implementation; observability watches]

Ecosystem

cladding sits at the junction of three existing categories.

[Figure: ecosystem Venn diagram with cladding at the junction of SDD, runners, and multi-agent governance]
How it differs from the neighbors
  • Spec Kit · OpenSpec · Tessl · Kiro — tools that help you write a good spec. cladding builds on that by continuously cross-checking, inside the dev loop, that the spec and the actual code haven't drifted apart: at completion time, at commit, and all the way through CI.
  • BMAD · ChatDev · Claude Code Agent Teams — systems for splitting roles across multiple AI agents. cladding's agent division of labor runs with spec · gate · audit record combined on top.
  • tdd-guard — a tool that forces the AI to write tests first. The Unit · Coverage · oracle stages among cladding's 15 do the same job, more structurally.
  • OpenHands · Cline · Aider · Goose — runners that make the AI write code. cladding is the upper layer that verifies and governs the code those runners produce.

cladding's distinction is the combination — binding the core of the categories above into one verification loop.

Install

Two steps — install the infrastructure → create the project spec.

Step 1 — Install the infrastructure (npm)
npm install -g cladding   # install the cladding CLI
cd <project>              # move into the project
clad setup                # auto-wire your AI tools (Claude / Codex / Gemini / Cursor)

A single clad setup auto-detects the AI tools you have installed and wires them all — no per-tool configuration needed.

Where clad setup connects (4 hosts · 5 wire points)
Host (when detected) | Wired location | Auto-activation
Claude Code (~/.claude/) | ~/.claude/plugins/cladding | claude plugin marketplace add + install
Codex CLI skills (~/.agents/) | ~/.agents/skills/cladding-* | (auto on Codex restart)
Codex CLI MCP server (~/.codex/) | [mcp_servers.cladding] in ~/.codex/config.toml | (TOML entry itself)
Gemini CLI (~/.gemini/) | ~/.gemini/extensions/cladding | gemini extensions link
Cursor (~/.cursor/) | mcpServers.cladding in ~/.cursor/mcp.json | (JSON entry itself)

clad setup invokes each host's activation command automatically when the claude / gemini binaries are on PATH. Safe to re-run after an upgrade or after installing a new AI tool.
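The detection logic described above amounts to a small planning step. This TypeScript sketch is hypothetical (`planSetup` and `WirePoint` are not cladding's API, and detection by home directory is an assumption read off the table): it takes the home directories that exist and the binaries found on PATH, selects wire points from the table, and flags activation only for hosts whose CLI is available.

```typescript
// Hypothetical planner modeled on the wiring table (not cladding's implementation).
interface WirePoint {
  host: string;
  homeDir: string;                  // presence of this directory = host detected (assumption)
  target: string;                   // where cladding gets wired
  activationBinary: string | null;  // CLI whose activation command is run, if any
}

const WIRE_POINTS: WirePoint[] = [
  { host: "Claude Code", homeDir: "~/.claude", target: "~/.claude/plugins/cladding", activationBinary: "claude" },
  { host: "Codex CLI skills", homeDir: "~/.agents", target: "~/.agents/skills/cladding-*", activationBinary: null },
  { host: "Codex CLI MCP server", homeDir: "~/.codex", target: "[mcp_servers.cladding] in ~/.codex/config.toml", activationBinary: null },
  { host: "Gemini CLI", homeDir: "~/.gemini", target: "~/.gemini/extensions/cladding", activationBinary: "gemini" },
  { host: "Cursor", homeDir: "~/.cursor", target: "mcpServers.cladding in ~/.cursor/mcp.json", activationBinary: null },
];

interface PlannedStep { host: string; target: string; activate: boolean; }

function planSetup(existingDirs: Set<string>, binariesOnPath: Set<string>): PlannedStep[] {
  return WIRE_POINTS
    .filter((w) => existingDirs.has(w.homeDir))
    .map((w) => ({
      host: w.host,
      target: w.target,
      // Activate only when the host's CLI is actually on PATH.
      activate: w.activationBinary !== null && binariesOnPath.has(w.activationBinary),
    }));
}
```

Because the plan depends only on what is installed, recomputing it after an upgrade or a new tool install produces the same wiring for already-detected hosts, consistent with the re-run guidance above.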

Verification level (honesty note): Claude Code is fully verified through real-usage campaigns (including real-time intervention). Codex · Gemini CLI have automated wiring + basic behavior confirmed. Cursor wires automatically, but real-usage verification is still pending — to be updated as it lands.

About the MCP server. All 4 hosts wire cladding as an MCP server — only the wire location differs. MCP is not something you invoke directly — no /mcp slash, no manual connect step. The AI in each host calls cladding's tools on its own in response to natural-language requests; you only type /cladding:init once and chat normally.

Step 2 — Init (create the project spec)

From the project directory, call it once inside your AI tool:

[inside your AI tool] /cladding:init "B2B payment SaaS"

The project's spec.yaml and supporting docs are created — once per project.

To raise enforcement: clad init --with-hook (install pre-commit + pre-push git hooks) · clad init --with-ci (scaffold the CI gate — true enforcement lives in CI).

Three init scenarios
Starting point | Command | What happens
An idea, nothing else | /cladding:init "I'm going to build a B2B payment SaaS" | LLM analyzes the domain → spec · docs · policies generated + 2–3 follow-up questions
A planning doc | /cladding:init docs/plan.md | recognizes the file path → loads the contents automatically and uses them as intent
Adopting into an existing project | /cladding:init "apply cladding to this project" | auto-scans the existing code → observed patterns merged with the intent
Init once, that's it

Init once and you're done — after that, just develop as usual. cladding runs the before/after loop in the background, so there are no commands to memorize.

Upgrading
npm update -g cladding     # 1. install the new version
cd <your project>          # 2. once per project
clad update                # 3. bring it in line with the new version

Your code, spec.yaml, and docs are left untouched, so it's safe to run. If the newer version is stricter and has something to flag, it only points it out (it won't block or fix anything).

Status

  • version: v0.6.3 (2026-06)
  • conformance: L4
  • tests: 1500/1500, all pass
  • gate: 15 stages, 37 detectors
  • features: 178 (176 done, self-spec'd)

134 test files · coverage drop blocked by the COVERAGE_DROP detector · single install path via npm (npm install -g cladding)

Road to Ironclad 1.0 — 1.0 locks only when two independent implementations pass the L4 conformance fixtures (GOVERNANCE § 1). cladding is the first.

License

MIT. LICENSE · Related: Ironclad (the standard cladding implements) · harness-boot (the seed).
