@luckydraw/cumulus

Licence: MIT | Version: 0.31.29 | Deps: 15 | Size: 2.6 MB | Vulns: 0 | Weekly downloads: 1.7K

Cumulus

A self-hosted multi-channel AI gateway built around Claude and other LLMs. Runs as a long-lived daemon, speaks to you through a web chat widget, Slack, Discord, iOS push, email webhooks, and more — with unlimited conversation context via the Recursive Language Model (RLM) pattern.

Originally a CLI wrapper for Claude, Cumulus has grown into a full gateway platform that coordinates agents, channels, and models behind one persistent process.

What you get

  • Gateway daemon (cumulus-gateway) — HTTP + WebSocket server with per-thread conversations, streaming responses, and an admin API.
  • Web chat widget — embeddable /chat interface with voice mode, push notifications, file uploads with progress, and rich blex block rendering (tables, forms, charts, kanban, diagrams).
  • Channel adapters — Slack and Discord bots, inbound email webhooks (Resend), and generic HTTP webhooks — all injecting into the same thread model.
  • Inter-agent messaging — threads can talk to each other via send_to_agent, with support for CC/BCC visibility.
  • Per-thread model selection — Claude (via CLI) or any HuggingFace model (GLM-5, Kimi-K2.5, Qwen3, etc.) with tool calling.
  • Scheduled triggers, email, push, media serving — built-in MCP tools so agents can send emails, schedule themselves, notify you, and upload files.
  • Unlimited history — JSONL per thread + vector store + adaptive context budget. Conversations never truncate.
  • Classic CLI (cumulus) — terminal chat for individual threads, backed by the same history store.

Architecture

┌──────────────────────────────────────────────────────────────┐
│  Clients                                                     │
│                                                              │
│   /chat (web)   Slack   Discord   Email   CLI   Push (PWA)   │
│        │         │       │        │       │      │           │
│        └─────────┴───────┴────────┴───────┴──────┘           │
│                          │                                   │
│                          ▼                                   │
│              ┌──────────────────────┐                        │
│              │  cumulus-gateway     │                        │
│              │  (HTTP / WS daemon)  │                        │
│              └──────────┬───────────┘                        │
│                         │                                    │
│           ┌─────────────┴─────────────┐                      │
│           ▼                           ▼                      │
│      ┌─────────┐              ┌──────────────┐               │
│      │ Thread  │              │ Model router │               │
│      │ store   │              │ Claude / HF  │               │
│      │ (JSONL) │              │ MCP tools    │               │
│      └────┬────┘              └──────┬───────┘               │
│           │                          │                       │
│           ▼                          ▼                       │
│       ~/.cumulus/                Claude CLI                  │
│       threads/                   HuggingFace API             │
│       content/                   MCP stdio + in-process      │
│       media/                                                 │
└──────────────────────────────────────────────────────────────┘

Every turn is a fresh model invocation. The gateway assembles a context budget from recent messages + RAG retrieval against the thread's history and content store, then streams the response back to the originating channel.
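The per-turn assembly described above can be pictured with a small sketch. It is an illustration only: the `Message` shape, the four-characters-per-token estimate, the `recentShare` split, and the `assembleContext` function are all assumptions, not the gateway's actual code.

```typescript
// Hypothetical message shape; the real gateway's types are not documented here.
interface Message {
  id: number; // assumed to increase over time
  role: "user" | "assistant";
  text: string;
}

// Rough heuristic (an assumption): about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

/**
 * Fill a token budget with the newest messages first, then with retrieved
 * (RAG) messages in relevance order, skipping duplicates. The selection is
 * returned in chronological order.
 */
function assembleContext(
  recent: Message[], // chronological, newest last
  retrieved: Message[], // ranked, most relevant first
  budgetTokens: number,
  recentShare = 0.5, // fraction of the budget reserved for recent turns
): Message[] {
  const picked = new Map<number, Message>();
  let used = 0;

  const recentBudget = budgetTokens * recentShare;
  for (const msg of [...recent].reverse()) {
    const cost = estimateTokens(msg.text);
    if (used + cost > recentBudget) break; // keep the recent run contiguous
    picked.set(msg.id, msg);
    used += cost;
  }

  for (const msg of retrieved) {
    if (picked.has(msg.id)) continue;
    const cost = estimateTokens(msg.text);
    if (used + cost > budgetTokens) continue; // a smaller hit may still fit
    picked.set(msg.id, msg);
    used += cost;
  }

  return Array.from(picked.values()).sort((a, b) => a.id - b.id);
}
```

Reserving a share for recent turns keeps the immediate conversation intact even when retrieval returns many large hits.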

Installation

Requires Node 20+.

npm install -g @luckydraw/cumulus

This installs three binaries:

Command          Purpose
cumulus          Terminal chat client for a single thread
cumulus-mcp      MCP server exposing history/content tools (stdio)
cumulus-gateway  Long-running daemon (HTTP + WebSocket + adapters)

Quick start — gateway

# Interactive setup: detects project directories, installs a service, generates keys.
cumulus-gateway setup

# Or non-interactive:
cumulus-gateway setup --project-root ~/projects --port 8080

# Start / stop / reload (if you skip the service install):
cumulus-gateway start
cumulus-gateway stop
cumulus-gateway reload     # SIGHUP — drains active streams before restart

Setup writes ~/.cumulus/gateway.config.json, generates VAPID keys for push, scaffolds a systemd (Linux) or LaunchAgent (macOS) unit, and prints the generated API key.

Open http://localhost:8080/chat, paste the API key, and start talking. Messages hit your thread; responses stream back token-by-token.

Configuration

~/.cumulus/gateway.config.json — adjust any field with cumulus-gateway config set <key> <value> or edit directly:

{
  "apiKeys": ["sk-cumulus-…"],
  "port": 8080,
  "projectRoot": "/home/you/projects",
  "model": "claude", // default per-thread model
  "models": [
    // available models for thread picker
    { "id": "claude", "label": "Claude (CLI)", "provider": "claude-cli" },
    { "id": "zai-org/GLM-5", "label": "GLM-5", "provider": "huggingface" },
    { "id": "moonshotai/Kimi-K2.5", "label": "Kimi-K2.5", "provider": "huggingface" },
  ],
  "hfApiKey": "hf_…", // optional, for HuggingFace models
  "channels": {
    "slack": { "token": "xoxb-…", "signingSecret": "", "appToken": "xapp-…" },
    "discord": { "token": "", "clientId": "" },
  },
  "resend": { "apiKey": "re_…", "defaultFrom": "you@example.com" },
  "vapid": { "publicKey": "", "privateKey": "", "subject": "mailto:you@example.com" },
}

Reload the daemon (cumulus-gateway reload) after editing. It waits for active streams to finish before restarting, so in-flight responses aren't dropped.
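Before reloading, it can help to sanity-check an edited config. The sketch below is one way to do that. The field names mirror the example above, but the `GatewayConfig` interface, the `validateConfig` function, and the specific checks are assumptions for illustration; the gateway may validate differently or not at all.

```typescript
// Field names follow the example config; optional sections are omitted.
interface GatewayConfig {
  apiKeys: string[];
  port: number;
  projectRoot: string;
  model: string;
  models: { id: string; label: string; provider: string }[];
}

/** Return human-readable problems; an empty list means nothing obvious is wrong. */
function validateConfig(cfg: GatewayConfig): string[] {
  const problems: string[] = [];
  if (cfg.apiKeys.length === 0) {
    problems.push("apiKeys must contain at least one key");
  }
  if (!Number.isInteger(cfg.port) || cfg.port < 1 || cfg.port > 65535) {
    problems.push("port must be an integer between 1 and 65535");
  }
  if (!cfg.models.some((m) => m.id === cfg.model)) {
    problems.push(`default model "${cfg.model}" is not listed in models`);
  }
  return problems;
}
```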

Gateway features

Per-thread model selection

Each thread can run on a different model. Use the dropdown in the widget header, or the REST API:

curl -X PUT http://localhost:8080/api/thread/my-thread/config \
  -H "X-API-Key: sk-…" \
  -H "Content-Type: application/json" \
  -d '{"model": "zai-org/GLM-5"}'

  • claude — spawns claude --print per turn. Gets the full Claude Code tool surface. The per-thread effort selector (low to max) maps to the CLI's --effort flag.
  • HuggingFace models — routed through an OpenAI-compatible endpoint with a built-in agentic loop that handles tool use, truncation recovery, and error retry.

Web chat widget

At /chat. Features:

  • Streaming responses over WebSocket, with interjection support (type while streaming to interrupt and redirect).
  • Multiple threads, side by side — Cmd/Ctrl+Click any thread in the sidebar to open it in a second panel alongside the current one. Useful for cross-referencing or driving two agents in parallel. Mobile auto-collapses to a single panel.
  • Inline annotations — highlight any chat text, leave a comment via the popover, and send it back as a quoted chip. Chips can be edited or removed before sending. Works like leaving a margin note on what the agent just said.
  • Blex blocks — ~~~blex:TYPE fenced JSON renders as a rich, interactive component. 22 block types, including:
    • Interactive input — poll (multi-question carousels, multi-select, write-in answers), confirm (Yes/No/Cancel), form (typed fields with validation). User responses serialize back into the chat input.
    • Embedded content — embed (sandboxed iframe for hosted apps and webpages, inline in the chat), image/gallery (with upload_media-served URLs), mermaid and svg diagrams.
    • Live data — table (sortable/selectable), chart, kanban, calendar, timeline, status, metric, progress, file-tree, terminal, code, diff (with Apply/Reject buttons), layout (composes other blocks), branch (step-through flowcharts).
  • Voice mode — hands-free conversation using browser STT + server-side Piper TTS, with sentence-by-sentence playback and barge-in.
  • Push notifications — PWA install + VAPID subscriptions. Agents call notify_user to alert you while you're away.
  • File attachments — drag or pick any file type. Non-image files upload via XHR with a per-chip progress bar and cancel; agents receive the absolute disk path and can read_file it directly. Images stay on the inline-base64 path for vision-capable models.
  • Texitool integration — edit Unicode-art diagrams in-place via an embedded canvas.
  • Update banner — auto-detects when a newer version is on npm and offers a one-click update.
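To make the blex block format from the feature list concrete, here is a minimal sketch of pulling such blocks out of a message. It assumes a block opens with ~~~blex:TYPE on its own line and closes with a bare ~~~ line; the `BlexBlock` type and the `extractBlexBlocks` function are hypothetical and are not the widget's actual parser.

```typescript
interface BlexBlock {
  type: string; // e.g. "table", "confirm", "file-tree"
  data: unknown; // parsed JSON payload, schema depends on the block type
}

/** Find every ~~~blex:TYPE fence in `text` and parse its JSON body. */
function extractBlexBlocks(text: string): BlexBlock[] {
  // Built per call so the global regex keeps no state between invocations.
  const fence = /^~~~blex:([\w-]+)[ \t]*\n([\s\S]*?)\n~~~[ \t]*$/gm;
  const blocks: BlexBlock[] = [];
  let m: RegExpExecArray | null;
  while ((m = fence.exec(text)) !== null) {
    try {
      blocks.push({ type: m[1], data: JSON.parse(m[2]) });
    } catch {
      // Malformed JSON: skip it here so a renderer can show the raw text instead.
    }
  }
  return blocks;
}
```

Skipping malformed blocks rather than throwing is a design choice for this sketch: a half-streamed block should not break rendering of the rest of the message.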

Channel adapters

  • Slack (channels.slack) — Socket Mode bot. Thread naming: slack-{userId}-{channelId}.
  • Discord (channels.discord) — Gateway WebSocket. Thread naming: discord-{userId}-{channelId}.
  • Inbound webhooks — POST /api/hooks/:type for email (Resend), forms, and generic events. Config-driven thread routing with HMAC signature verification.

Inter-agent messaging

Any thread can message another thread on the same gateway using the send_to_agent MCP tool:

send_to_agent(target="devops", message="Deploy the new build", visibility="cc")

  • cc (default) — all recipients see each other.
  • blind — each recipient thinks it's a direct message.
  • {hidden: […]} — selective (observer pattern, hidden agents invisible to visible recipients).

If the target is busy, the message is queued and delivered as a batch when that thread is idle ("while you were busy, 3 messages arrived…").
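The queue-then-batch behaviour can be sketched as a small per-thread inbox. The `ThreadInbox` class, its method names, and the `AgentMessage` shape are hypothetical; this only illustrates the idea of holding messages while a thread is busy and flushing them together once it is idle.

```typescript
interface AgentMessage {
  from: string;
  text: string;
}

class ThreadInbox {
  private busy = false;
  private queue: AgentMessage[] = [];
  private deliver: (batch: AgentMessage[]) => void;

  constructor(deliver: (batch: AgentMessage[]) => void) {
    this.deliver = deliver;
  }

  /** Deliver immediately when idle; otherwise hold the message. */
  send(msg: AgentMessage): void {
    if (this.busy) this.queue.push(msg);
    else this.deliver([msg]);
  }

  /** Call when the thread starts a turn. */
  markBusy(): void {
    this.busy = true;
  }

  /** Call when the turn ends: flush everything that arrived as one batch. */
  markIdle(): void {
    this.busy = false;
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.deliver(batch);
  }
}
```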

Scheduled triggers

Agents can schedule themselves:

schedule_trigger(at="2026-05-01T09:00:00Z", message="Follow up with lead")
schedule_trigger(cron="0 9 * * MON", message="Weekly check-in")
cancel_schedule(id="…")

Schedules are per-thread, persisted in {thread}.config.json, and fire as message injections into the thread.
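As a rough picture of how persisted one-shot triggers might be checked, the sketch below splits a list into due and pending entries. The `Trigger` shape and the `takeDue` function are assumptions for illustration; cron-style schedules are left out, and the actual layout of {thread}.config.json is not shown here.

```typescript
// Hypothetical persisted record for a one-shot trigger.
interface Trigger {
  id: string;
  at: string; // ISO 8601 timestamp, as in schedule_trigger(at=…)
  message: string;
}

/** Split triggers into those due at `now` (oldest first) and those still pending. */
function takeDue(
  triggers: Trigger[],
  now: Date,
): { due: Trigger[]; pending: Trigger[] } {
  const due: Trigger[] = [];
  const pending: Trigger[] = [];
  for (const t of triggers) {
    if (Date.parse(t.at) <= now.getTime()) due.push(t);
    else pending.push(t);
  }
  due.sort((a, b) => Date.parse(a.at) - Date.parse(b.at));
  return { due, pending };
}
```

A scheduler loop would inject each due trigger's message into its thread and write the pending list back to disk.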

Email (Resend)

With resend.apiKey configured:

send_email(to="person@example.com", subject="Hello", body="…")
list_emails(limit=10)

Rate-limited per thread (default 10/hour). All sends are logged to thread history. First email from a new thread triggers a notify_user ping.
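A per-thread limit like this is commonly implemented as a sliding window over send timestamps. The sketch below shows that technique; the `HourlyLimiter` class is hypothetical, and whether Cumulus uses a sliding window, a fixed window, or something else is not stated in this README.

```typescript
class HourlyLimiter {
  private sent = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit = 10, windowMs = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Record a send for `thread` at `nowMs` if allowed; return whether it was allowed. */
  tryAcquire(thread: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only timestamps that are still inside the window.
    const recent = (this.sent.get(thread) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.sent.set(thread, recent);
    return allowed;
  }
}
```

Passing the clock in as `nowMs` keeps the limiter deterministic and easy to test.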

Reliability

  • Graceful restart — cumulus-gateway reload (SIGHUP) drains active Claude/HF streams up to 120s before restarting; no truncated responses on deploy.
  • Auto-resume after restart — interrupted threads get a resume nudge on startup so the agent picks back up with full RAG context.
  • Persistent streaming buffer — partial responses are flushed to disk every 5s during streaming and recovered on restart.
  • Truncation continuation — finish_reason: "length" triggers max-token escalation (8k → 16k → 32k) and seamless continuation stitching.
  • WebSocket keepalive — server-side ping/pong every 30s; clients reload history if a stream goes silent for >120s.
  • Policy-error retry — Claude CLI transient "Usage Policy" refusals auto-retry up to 3 times with a visible "Retrying…" indicator.
  • HF transient-error retry — [Error: terminated], connection resets, and similar stream/network errors retry with exponential backoff.
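Two of the mechanisms in the list above, max-token escalation and exponential backoff, reduce to small pure functions. The sketch below is illustrative: the exact ladder values (8k is taken as 8192 and so on), the 500 ms base delay, and the 30 s cap are assumptions, and the function names are hypothetical.

```typescript
// Assumed concrete values for the 8k → 16k → 32k ladder.
const TOKEN_LADDER = [8192, 16384, 32768];

/** Next max-token limit after a "length" truncation, or null when the ladder is exhausted. */
function nextMaxTokens(current: number): number | null {
  const next = TOKEN_LADDER.find((n) => n > current);
  return next ?? null;
}

/** Delay before retry number `attempt` (0-based): doubles each time, capped at `capMs`. */
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

A production retry loop would usually add random jitter to these delays so that many clients do not retry in lockstep.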

Self-update

cumulus-gateway check-update      # compares running version to npm
cumulus-gateway update             # bumps to latest, saves previous for rollback
cumulus-gateway rollback           # restores the previous version

The widget's top bar also shows an "Update available" indicator (with a manual ↻ check button) when a new version lands on npm.

Classic CLI mode

The original RLM chat loop still works. Great for quick terminal work without running the gateway.

cumulus my-project             # open or create a thread
cumulus --list                 # list threads
cumulus --delete old-project

Each turn:

  1. Append your message to ~/.cumulus/threads/my-project.jsonl.
  2. Spawn claude --print with --mcp-config pointing to the cumulus MCP server.
  3. Claude pulls whatever history it needs via search_history, peek_recent, etc.
  4. Append the response to the JSONL.
  5. Next turn starts from a fresh context.
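The append-only JSONL log at the heart of this loop is simple to picture. The sketch below works on strings so it stays self-contained. The `Turn` record shape and the function names are assumptions, and `peekRecent` here only mimics the idea behind the peek_recent tool rather than reproducing it.

```typescript
// Hypothetical record shape; the real thread files may carry more fields.
interface Turn {
  role: "user" | "assistant";
  text: string;
  ts: string; // ISO 8601 timestamp
}

/** One record per line; JSON.stringify escapes embedded newlines. */
function toJsonlLine(turn: Turn): string {
  return JSON.stringify(turn) + "\n";
}

/** Parse a whole JSONL file's content, ignoring blank lines. */
function parseJsonl(content: string): Turn[] {
  return content
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Turn);
}

/** Last `n` turns, oldest first. */
function peekRecent(content: string, n: number): Turn[] {
  return n <= 0 ? [] : parseJsonl(content).slice(-n);
}
```

In practice each line would be appended to the thread's file (for example with Node's fs.appendFile), which is what makes the history cheap to grow and safe to re-read on every turn.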

MCP tools

The cumulus-mcp server exposes history and content tools. Usable from any MCP-compatible client.

History:

Tool               Purpose
search_history     Keyword / semantic / hybrid search over a thread
peek_recent        Last N messages
read_messages      Message range by index
get_history_stats  Count, token estimate, time range
get_summary        Auto-generated summaries (recent chunk, full, or specific range)
sub_query          Recursive sub-LLM call over retrieved messages

Content store (file reads, bash output, web fetches):

Tool                 Purpose
read_file            Read text/PDF, chunk + embed + store
store_content        Store arbitrary text for later retrieval
search_content       Search across stored content
retrieve_content     Get full content by [STORED:xxx] id
read_content_chunk   Read a specific chunk index
list_stored_content  List all stored items
detect_anomalies     Find out-of-place content in a store
forget_content       Remove a stored item

Gateway-only tools (available to agents running inside the daemon):

send_to_agent, list_agents, notify_user, schedule_trigger, cancel_schedule, list_schedules, send_email, list_emails, upload_media.

RAG & context management

  • JSONL history per thread — every message, tool call, and tool result.
  • Content store — chunked file reads, embedded with local HuggingFace transformers, stored as binary Float32.
  • Segment summaries — LLM-generated per topic boundary, separately embedded for vocabulary-gap retrieval.
  • Adaptive context budget — self-tuning per thread based on TTFT. Shrinks when slow, grows when fast + near-capacity. Default 300k, floor 100k, ceiling 1M.
  • Query-type-aware retrieval — classifies queries (recall / synthesis / recent / decision) and adjusts scoring weights accordingly.
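The adaptive context budget in the list above can be sketched as a single adjustment step run after each turn. The default, floor, and ceiling come from the list; the TTFT thresholds, the 0.8 and 1.25 factors, the 90% "near capacity" cutoff, and the names `TurnStats` and `adjustBudget` are assumptions for illustration.

```typescript
const BUDGET_DEFAULT = 300000; // starting point for a new thread
const BUDGET_FLOOR = 100000;
const BUDGET_CEILING = 1000000;

interface TurnStats {
  ttftMs: number; // time to first token for the last turn
  usedTokens: number; // context tokens actually sent
}

/** Shrink when the model was slow; grow when it was fast and the budget was nearly full. */
function adjustBudget(
  budget: number,
  stats: TurnStats,
  slowMs = 8000, // assumed "slow" threshold
  fastMs = 3000, // assumed "fast" threshold
): number {
  let next = budget;
  if (stats.ttftMs > slowMs) {
    next = budget * 0.8;
  } else if (stats.ttftMs < fastMs && stats.usedTokens >= budget * 0.9) {
    next = budget * 1.25;
  }
  return Math.min(BUDGET_CEILING, Math.max(BUDGET_FLOOR, Math.round(next)));
}
```

Requiring both a fast response and a nearly full budget before growing avoids inflating the budget for threads that never use it.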

REST API (gateway)

Method  Path                       Purpose
GET     /health                    Gateway status
POST    /api/thread/:name/message  Send a message (SSE stream in response)
GET     /api/thread/:name/history  Paginated thread history
GET     /api/thread/:name/config   Thread config
PUT     /api/thread/:name/config   Update thread config (model, etc.)
DELETE  /api/thread/:name          Delete a thread
GET     /api/threads               List threads
GET     /api/agents                List threads + streaming status
POST    /api/agents/inject         Inject message into a thread
GET     /api/models                Available models
POST    /api/media/upload          Upload a file
GET     /media/:filename           Serve uploaded file
POST    /api/hooks/:type           Inbound webhook
GET     /api/push/vapid-key        Public VAPID key
POST    /api/push/subscribe        Register a push subscription
GET     /api/version               Running version + update availability
POST    /api/admin/update          Trigger self-update (admin key)

All /api/* routes require X-API-Key: <key> (from apiKeys[]).

WebSocket (/chat/ws) carries the same semantics with streaming, interjection, inject, and voice-mode audio frames.
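For scripting against the REST routes, a small request builder keeps the X-API-Key header and path encoding in one place. This is a sketch under assumptions: the `HttpRequest` shape and the `setThreadModel` helper are hypothetical, and sending a JSON Content-Type header is assumed to be what the gateway expects.

```typescript
interface HttpRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body?: string;
}

/** Build the PUT /api/thread/:name/config request that switches a thread's model. */
function setThreadModel(
  baseUrl: string,
  apiKey: string,
  thread: string,
  model: string,
): HttpRequest {
  return {
    url: `${baseUrl}/api/thread/${encodeURIComponent(thread)}/config`,
    method: "PUT",
    headers: {
      "X-API-Key": apiKey,
      "Content-Type": "application/json", // assumption: JSON bodies are declared explicitly
    },
    body: JSON.stringify({ model }),
  };
}
```

The result can be handed to any HTTP client, for instance by passing its url, method, headers, and body to fetch.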

Development

git clone https://github.com/soapko/cumulus
cd cumulus
npm install
npm run build
npm test            # vitest
npm run lint
npm run type-check

  • Build: dist/ compiled TypeScript plus static assets (widget HTML/CSS/JS, blex bundles).
  • Tests: vitest, 700+ tests covering agentic loop, retriever, adapters, scheduler, push.
  • Deploy workflow: bump version, npm publish, then cumulus-gateway reload on the host (SIGHUP drains active streams — see docs/tasks/050-graceful-restart.md).

Background

Cumulus implements the Recursive Language Model pattern: treat conversation history as an external environment the model queries programmatically, rather than stuffing everything into context. This enables reasoning over contexts 2+ orders of magnitude beyond the model's window, with graceful cost scaling.

See docs/ for task documents, ADRs, and implementation notes.

License

MIT
