0.31.29 • Published yesterday • CLI

@luckydraw/cumulus

Cumulus

A self-hosted multi-channel AI gateway built around Claude and other LLMs. Runs as a long-lived daemon, speaks to you through a web chat widget, Slack, Discord, iOS push, email webhooks, and more — with unlimited conversation context via the Recursive Language Model (RLM) pattern.

Originally a CLI wrapper for Claude, Cumulus has grown into a full gateway platform that coordinates agents, channels, and models behind one persistent process.

What you get

Gateway daemon (cumulus-gateway) — HTTP + WebSocket server with per-thread conversations, streaming responses, and an admin API.
Web chat widget — embeddable /chat interface with voice mode, push notifications, file uploads with progress, and rich blex block rendering (tables, forms, charts, kanban, diagrams).
Channel adapters — Slack and Discord bots, inbound email webhooks (Resend), and generic HTTP webhooks — all injecting into the same thread model.
Inter-agent messaging — threads can talk to each other via send_to_agent, with support for CC/BCC visibility.
Per-thread model selection — Claude (via CLI) or any HuggingFace model (GLM-5, Kimi-K2.5, Qwen3, etc.) with tool calling.
Scheduled triggers, email, push, media serving — built-in MCP tools so agents can send emails, schedule themselves, notify you, and upload files.
Unlimited history — JSONL per thread + vector store + adaptive context budget. Conversations never truncate.
Classic CLI (cumulus) — terminal chat for individual threads, backed by the same history store.

Architecture

┌──────────────────────────────────────────────────────────────┐
│  Clients                                                     │
│                                                              │
│   /chat (web)   Slack   Discord   Email   CLI   Push (PWA)   │
│        │         │       │        │       │      │           │
│        └─────────┴───────┴────────┴───────┴──────┘           │
│                          │                                   │
│                          ▼                                   │
│              ┌──────────────────────┐                        │
│              │  cumulus-gateway     │                        │
│              │  (HTTP / WS daemon)  │                        │
│              └──────────┬───────────┘                        │
│                         │                                    │
│           ┌─────────────┴─────────────┐                      │
│           ▼                           ▼                      │
│      ┌─────────┐              ┌──────────────┐               │
│      │ Thread  │              │ Model router │               │
│      │ store   │              │ Claude / HF  │               │
│      │ (JSONL) │              │ MCP tools    │               │
│      └────┬────┘              └──────┬───────┘               │
│           │                          │                       │
│           ▼                          ▼                       │
│       ~/.cumulus/                Claude CLI                  │
│       threads/                   HuggingFace API             │
│       content/                   MCP stdio + in-process      │
│       media/                                                 │
└──────────────────────────────────────────────────────────────┘

Every turn is a fresh model invocation. The gateway assembles a context budget from recent messages + RAG retrieval against the thread's history and content store, then streams the response back to the originating channel.
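The budget assembly described above can be pictured as a greedy fill. This is a minimal sketch, not the gateway's actual algorithm: the `Message` shape, the four-characters-per-token estimate, and the order of filling (recent turns first, retrieved snippets second) are all assumptions made for illustration.

```typescript
// Hypothetical message shape; the real JSONL schema is not documented here.
interface Message {
  role: "user" | "assistant";
  text: string;
}

// Rough token estimate (about 4 characters per token). This is an assumption,
// not the tokenizer the gateway uses.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Fill the budget with the newest messages first, then spend what is left on
// retrieved snippets, which are assumed to arrive sorted by relevance.
function assembleContext(
  recent: Message[],
  retrieved: string[],
  budgetTokens: number,
): { messages: Message[]; snippets: string[]; usedTokens: number } {
  let used = 0;
  const messages: Message[] = [];
  // Walk newest-to-oldest so the most recent turns survive when space runs out.
  for (const msg of recent.slice().reverse()) {
    const cost = estimateTokens(msg.text);
    if (used + cost > budgetTokens) break;
    messages.unshift(msg); // restore chronological order
    used += cost;
  }
  const snippets: string[] = [];
  for (const snippet of retrieved) {
    const cost = estimateTokens(snippet);
    if (used + cost > budgetTokens) continue; // skip snippets that do not fit
    snippets.push(snippet);
    used += cost;
  }
  return { messages, snippets, usedTokens: used };
}
```

Because each turn is a fresh invocation, a function like this would run once per incoming message, with `budgetTokens` supplied by the adaptive budget for that thread.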

Installation

Requires Node 20+.

npm install -g @luckydraw/cumulus

This installs three binaries:

Command	Purpose
`cumulus`	Terminal chat client for a single thread
`cumulus-mcp`	MCP server exposing history/content tools (stdio)
`cumulus-gateway`	Long-running daemon (HTTP + WebSocket + adapters)

Quick start — gateway

# Interactive setup: detects project directories, installs a service, generates keys.
cumulus-gateway setup

# Or non-interactive:
cumulus-gateway setup --project-root ~/projects --port 8080

# Start / stop / reload (if you skip the service install):
cumulus-gateway start
cumulus-gateway stop
cumulus-gateway reload     # SIGHUP — drains active streams before restart

Setup writes ~/.cumulus/gateway.config.json, generates VAPID keys for push, scaffolds a systemd (Linux) or LaunchAgent (macOS) unit, and prints the generated API key.

Open http://localhost:8080/chat, paste the API key, and start talking. Messages hit your thread; responses stream back token-by-token.

Configuration

~/.cumulus/gateway.config.json — adjust any field with cumulus-gateway config set <key> <value> or edit directly:

{
  "apiKeys": ["sk-cumulus-…"],
  "port": 8080,
  "projectRoot": "/home/you/projects",
  "model": "claude", // default per-thread model
  "models": [
    // available models for thread picker
    { "id": "claude", "label": "Claude (CLI)", "provider": "claude-cli" },
    { "id": "zai-org/GLM-5", "label": "GLM-5", "provider": "huggingface" },
    { "id": "moonshotai/Kimi-K2.5", "label": "Kimi-K2.5", "provider": "huggingface" },
  ],
  "hfApiKey": "hf_…", // optional, for HuggingFace models
  "channels": {
    "slack": { "token": "xoxb-…", "signingSecret": "…", "appToken": "xapp-…" },
    "discord": { "token": "…", "clientId": "…" },
  },
  "resend": { "apiKey": "re_…", "defaultFrom": "you@example.com" },
  "vapid": { "publicKey": "…", "privateKey": "…", "subject": "mailto:you@example.com" },
}

Reload the daemon (cumulus-gateway reload) after editing. It waits for active streams to finish before restarting, so in-flight responses aren't dropped.
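Before reloading, it can help to sanity-check an edited config. The sketch below types a subset of the fields from the sample above and runs a few checks. The interface names, the optionality of fields, and the checks themselves are assumptions for illustration; Cumulus may validate differently or not at all.

```typescript
// Field names mirror the sample config; optionality is an assumption.
interface ModelEntry {
  id: string;
  label: string;
  provider: string; // "claude-cli" and "huggingface" appear in the sample
}

interface GatewayConfig {
  apiKeys: string[];
  port: number;
  projectRoot: string;
  model: string;
  models: ModelEntry[];
  hfApiKey?: string;
}

// Returns a list of human-readable problems; an empty list means the
// (hypothetical) checks passed.
function checkConfig(cfg: GatewayConfig): string[] {
  const problems: string[] = [];
  if (cfg.apiKeys.length === 0) problems.push("apiKeys is empty");
  if (!Number.isInteger(cfg.port) || cfg.port < 1 || cfg.port > 65535) {
    problems.push("port must be an integer between 1 and 65535");
  }
  if (!cfg.models.some((m) => m.id === cfg.model)) {
    problems.push("default model is not listed in models");
  }
  // Assumption: a huggingface entry is only usable when hfApiKey is set.
  const listsHf = cfg.models.some((m) => m.provider === "huggingface");
  if (listsHf && !cfg.hfApiKey) {
    problems.push("hfApiKey is missing but a huggingface model is listed");
  }
  return problems;
}
```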

Gateway features

Per-thread model selection

Each thread can run on a different model. Use the dropdown in the widget header, or the REST API:

curl -X PUT http://localhost:8080/api/thread/my-thread/config \
  -H "X-API-Key: sk-…" \
  -H "Content-Type: application/json" \
  -d '{"model": "zai-org/GLM-5"}'

claude — spawns claude --print per turn. Gets the full Claude Code tool surface. Per-thread effort selector (low → max) maps to the CLI's --effort flag.
HuggingFace models — routed through an OpenAI-compatible endpoint with a built-in agentic loop that handles tool use, truncation recovery, and error retry.
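The same model switch can be issued from code. This sketch only builds the request so that it stays testable offline. `buildSetModelRequest` and `PreparedRequest` are hypothetical names, and the JSON `Content-Type` header is an assumption about what the gateway expects; the path, method, and `X-API-Key` header come from the example above.

```typescript
interface PreparedRequest {
  url: string;
  method: "PUT";
  headers: Record<string, string>;
  body: string;
}

// Build the PUT /api/thread/:name/config request for a model change.
function buildSetModelRequest(
  baseUrl: string,
  thread: string,
  apiKey: string,
  model: string,
): PreparedRequest {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${root}/api/thread/${encodeURIComponent(thread)}/config`,
    method: "PUT",
    // Content-Type is an assumption; X-API-Key is documented.
    headers: { "X-API-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ model }),
  };
}
```

The result can be passed straight to `fetch(req.url, req)` in Node 20+ or a browser.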

Web chat widget

Served at /chat. Features:

Streaming responses over WebSocket, with interjection support (type while streaming to interrupt and redirect).
Multiple threads, side by side — Cmd/Ctrl+Click any thread in the sidebar to open it in a second panel alongside the current one. Useful for cross-referencing or driving two agents in parallel. Mobile auto-collapses to a single panel.
Inline annotations — highlight any chat text, leave a comment via the popover, and send it back as a quoted chip. Chips can be edited or removed before sending. Works like leaving a margin note on what the agent just said.
Blex blocks — ~~~blex:TYPE fenced JSON renders as a rich, interactive component. 22 block types including:
- Interactive input — poll (multi-question carousels, multi-select, write-in answers), confirm (Yes/No/Cancel), form (typed fields with validation). User responses serialize back into the chat input.
- Embedded content — embed (sandboxed iframe for hosted apps and webpages, inline in the chat), image/gallery (with upload_media-served URLs), mermaid and svg diagrams.
- Live data — table (sortable/selectable), chart, kanban, calendar, timeline, status, metric, progress, file-tree, terminal, code, diff (with Apply/Reject buttons), layout (composes other blocks), branch (step-through flowcharts).
Voice mode — hands-free conversation using browser STT + server-side Piper TTS, with sentence-by-sentence playback and barge-in.
Push notifications — PWA install + VAPID subscriptions. Agents call notify_user to alert you while you're away.
File attachments — drag or pick any file type. Non-image files upload via XHR with a per-chip progress bar and cancel; agents receive the absolute disk path and can read_file it directly. Images stay on the inline-base64 path for vision-capable models.
Texitool integration — edit Unicode-art diagrams in-place via an embedded canvas.
Update banner — auto-detects when a newer version is on npm and offers a one-click update.

Channel adapters

Slack (channels.slack) — Socket Mode bot. Thread naming: slack-{userId}-{channelId}.
Discord (channels.discord) — Gateway WebSocket. Thread naming: discord-{userId}-{channelId}.
Inbound webhooks — POST /api/hooks/:type for email (Resend), forms, and generic events. Config-driven thread routing with HMAC signature verification.

Inter-agent messaging

Any thread can message another thread on the same gateway using the send_to_agent MCP tool:

send_to_agent(target="devops", message="Deploy the new build", visibility="cc")

cc (default) — all recipients see each other.
blind — each recipient thinks it's a direct message.
{hidden: […]} — selective (observer pattern, hidden agents invisible to visible recipients).

If the target is busy, the message is queued and delivered as a batch when that thread is idle ("while you were busy, 3 messages arrived…").
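One way to read the three visibility modes is as a function from the recipient list to what each recipient is told about the others. The sketch below is an interpretation, not the gateway's code: `visibleRecipients` is a hypothetical helper, and it assumes hidden observers can still see the visible recipients, which the text above does not state.

```typescript
type Visibility = "cc" | "blind" | { hidden: string[] };

// For each recipient, list the other recipients it is told about.
function visibleRecipients(
  recipients: string[],
  visibility: Visibility,
): Map<string, string[]> {
  const view = new Map<string, string[]>();
  for (const me of recipients) {
    let others: string[];
    if (visibility === "cc") {
      // Everyone sees everyone else.
      others = recipients.filter((r) => r !== me);
    } else if (visibility === "blind") {
      // Each recipient believes the message is direct.
      others = [];
    } else {
      // Nobody is told about hidden agents. Assumption: hidden observers
      // are still told about the visible recipients.
      const hidden = new Set(visibility.hidden);
      others = recipients.filter((r) => r !== me && !hidden.has(r));
    }
    view.set(me, others);
  }
  return view;
}
```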

Scheduled triggers

Agents can schedule themselves:

schedule_trigger(at="2026-05-01T09:00:00Z", message="Follow up with lead")
schedule_trigger(cron="0 9 * * MON", message="Weekly check-in")
cancel_schedule(id="…")

Schedules are per-thread, persisted in {thread}.config.json, and fire as message injections into the thread.
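To make the persistence model concrete, here is a sketch of how one-shot entries might be checked against the clock. The `ScheduleEntry` shape and `splitDue` are hypothetical; the actual layout of {thread}.config.json is not documented here, and cron entries are deliberately left unevaluated.

```typescript
// Hypothetical entry shape for a persisted schedule.
interface ScheduleEntry {
  id: string;
  message: string;
  at?: string; // ISO 8601 timestamp for a one-shot trigger
  cron?: string; // cron expression for a recurring trigger (not evaluated here)
}

// Split entries into one-shots that should fire now and everything else.
function splitDue(
  entries: ScheduleEntry[],
  now: Date,
): { due: ScheduleEntry[]; pending: ScheduleEntry[] } {
  const due: ScheduleEntry[] = [];
  const pending: ScheduleEntry[] = [];
  for (const entry of entries) {
    const fireAt = entry.at !== undefined ? Date.parse(entry.at) : NaN;
    // NaN (missing or unparseable timestamp) never compares as due.
    if (fireAt <= now.getTime()) due.push(entry);
    else pending.push(entry);
  }
  return { due, pending };
}
```

Each entry in `due` would then be injected into its thread as a message and removed from the persisted list.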

Email (Resend)

With resend.apiKey configured:

send_email(to="person@example.com", subject="Hello", body="…")
list_emails(limit=10)

Rate-limited per thread (default 10/hour). All sends are logged to thread history. First email from a new thread triggers a notify_user ping.
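A per-thread limit such as 10/hour is commonly implemented as a sliding window. The class below is a sketch of that general technique, with a caller-supplied clock so it can be tested; it is not taken from the Cumulus source, and `SlidingWindowLimiter` is a hypothetical name.

```typescript
class SlidingWindowLimiter {
  private readonly sent = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number = 10, windowMs: number = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the send if the thread is under its limit.
  tryAcquire(thread: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Keep only timestamps that are still inside the window.
    const recent = (this.sent.get(thread) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.sent.set(thread, recent);
    return allowed;
  }
}
```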

Reliability

Graceful restart — cumulus-gateway reload (SIGHUP) drains active Claude/HF streams up to 120s before restarting; no truncated responses on deploy.
Auto-resume after restart — interrupted threads get a resume nudge on startup so the agent picks back up with full RAG context.
Persistent streaming buffer — partial responses are flushed to disk every 5s during streaming and recovered on restart.
Truncation continuation — finish_reason: "length" triggers max-token escalation (8k → 16k → 32k) and seamless continuation stitching.
WebSocket keepalive — server-side ping/pong every 30s; clients reload history if a stream goes silent for >120s.
Policy-error retry — Claude CLI transient "Usage Policy" refusals auto-retry up to 3 times with a visible "Retrying…" indicator.
HF transient-error retry — [Error: terminated], connection resets, and similar stream/network errors retry with exponential backoff.
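Two of the mechanisms above, max-token escalation and exponential backoff, are easy to sketch in isolation. The step values, the `Generate` callback, the continuation strategy (re-prompting with the text produced so far), and the backoff base and cap are all assumptions for illustration. The model call is kept synchronous for brevity; a real call would be asynchronous.

```typescript
// Assumed concrete values for the "8k, 16k, 32k" steps.
const TOKEN_STEPS: number[] = [8192, 16384, 32768];

interface Completion {
  text: string;
  finishReason: "stop" | "length";
}

// Hypothetical model call: prompt in, completion out.
type Generate = (prompt: string, maxTokens: number) => Completion;

// Re-issue the request with a larger limit while the model reports
// truncation, stitching the partial outputs together.
function generateWithEscalation(generate: Generate, prompt: string): string {
  let stitched = "";
  for (const maxTokens of TOKEN_STEPS) {
    const result = generate(prompt + stitched, maxTokens);
    stitched += result.text;
    if (result.finishReason !== "length") break;
  }
  return stitched;
}

// Delay before retry number `attempt` (0-based): doubles each time, capped.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 500,
  capMs: number = 30000,
): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```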

Self-update

cumulus-gateway check-update      # compares running version to npm
cumulus-gateway update             # bumps to latest, saves previous for rollback
cumulus-gateway rollback           # restores the previous version

The widget's top bar also shows an "Update available" indicator (with a manual ↻ check button) when a new version lands on npm.

Classic CLI mode

The original RLM chat loop still works. Great for quick terminal work without running the gateway.

cumulus my-project             # open or create a thread
cumulus --list                 # list threads
cumulus --delete old-project

Each turn:

1. Append your message to ~/.cumulus/threads/my-project.jsonl.
2. Spawn claude --print with --mcp-config pointing to the cumulus MCP server.
3. Claude pulls whatever history it needs via search_history, peek_recent, etc.
4. Append the response to the JSONL.
5. Next turn starts from a fresh context.
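The append steps in this loop rely on JSONL: one JSON object per line, so a new turn is a single appended line. The sketch below shows the serialization side only. The `Entry` fields are hypothetical, since the real record schema is not documented here.

```typescript
// Hypothetical record shape for one line of a thread file.
interface Entry {
  role: "user" | "assistant";
  text: string;
  ts: string; // ISO 8601 timestamp
}

// Serialize one entry as a single line. JSON.stringify escapes embedded
// newlines, so each entry always occupies exactly one line.
function toJsonlLine(entry: Entry): string {
  return JSON.stringify(entry) + "\n";
}

// Parse a whole file's content back into entries, ignoring blank lines.
function parseJsonl(content: string): Entry[] {
  return content
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Entry);
}
```

Appending `toJsonlLine(entry)` to the thread file never requires rewriting earlier lines, which keeps writes cheap however long the history grows.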

MCP tools

The cumulus-mcp server exposes history and content tools. Usable from any MCP-compatible client.

History:

Tool	Purpose
`search_history`	Keyword / semantic / hybrid search over a thread
`peek_recent`	Last N messages
`read_messages`	Message range by index
`get_history_stats`	Count, token estimate, time range
`get_summary`	Auto-generated summaries (recent chunk, full, or specific range)
`sub_query`	Recursive sub-LLM call over retrieved messages

Content store (file reads, bash output, web fetches):

Tool	Purpose
`read_file`	Read text/PDF, chunk + embed + store
`store_content`	Store arbitrary text for later retrieval
`search_content`	Search across stored content
`retrieve_content`	Get full content by `[STORED:xxx]` id
`read_content_chunk`	Read a specific chunk index
`list_stored_content`	List all stored items
`detect_anomalies`	Find out-of-place content in a store
`forget_content`	Remove a stored item

Gateway-only tools (available to agents running inside the daemon):

send_to_agent, list_agents, notify_user, schedule_trigger, cancel_schedule, list_schedules, send_email, list_emails, upload_media.

RAG & context management

JSONL history per thread — every message, tool call, and tool result.
Content store — chunked file reads, embedded with local HuggingFace transformers, stored as binary Float32.
Segment summaries — LLM-generated per topic boundary, separately embedded for vocabulary-gap retrieval.
Adaptive context budget — self-tuning per thread based on TTFT. Shrinks when slow, grows when fast + near-capacity. Default 300k, floor 100k, ceiling 1M.
Query-type-aware retrieval — classifies queries (recall / synthesis / recent / decision) and adjusts scoring weights accordingly.
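The adaptive budget rule (shrink when slow, grow when fast and near capacity, clamp to floor and ceiling) can be sketched as a single update function. The floor, default, and ceiling come from the list above and are assumed to be token counts. The TTFT thresholds, the 90% capacity cut-off, and the step factors are invented for illustration.

```typescript
const BUDGET_FLOOR = 100000;
const BUDGET_DEFAULT = 300000;
const BUDGET_CEILING = 1000000;

interface TurnStats {
  ttftMs: number; // time to first token for the last turn
  usedTokens: number; // how much of the budget the last turn consumed
}

// Compute the next budget for a thread from the last turn's measurements.
function nextBudget(current: number, stats: TurnStats): number {
  const SLOW_MS = 15000; // assumed threshold for "slow"
  const FAST_MS = 5000; // assumed threshold for "fast"
  const NEAR_CAPACITY = 0.9; // assumed meaning of "near-capacity"
  let next = current;
  if (stats.ttftMs > SLOW_MS) {
    next = current * 0.8; // shrink when slow
  } else if (stats.ttftMs < FAST_MS && stats.usedTokens >= current * NEAR_CAPACITY) {
    next = current * 1.25; // grow when fast and nearly full
  }
  return Math.min(BUDGET_CEILING, Math.max(BUDGET_FLOOR, Math.round(next)));
}
```

A new thread would start at `BUDGET_DEFAULT` and feed each turn's measurements back through `nextBudget`.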

REST API (gateway)

Method	Path	Purpose
GET	`/health`	Gateway status
POST	`/api/thread/:name/message`	Send a message (SSE stream in response)
GET	`/api/thread/:name/history`	Paginated thread history
GET	`/api/thread/:name/config`	Thread config
PUT	`/api/thread/:name/config`	Update thread config (model, etc.)
DELETE	`/api/thread/:name`	Delete a thread
GET	`/api/threads`	List threads
GET	`/api/agents`	List threads + streaming status
POST	`/api/agents/inject`	Inject message into a thread
GET	`/api/models`	Available models
POST	`/api/media/upload`	Upload a file
GET	`/media/:filename`	Serve uploaded file
POST	`/api/hooks/:type`	Inbound webhook
GET	`/api/push/vapid-key`	Public VAPID key
POST	`/api/push/subscribe`	Register a push subscription
GET	`/api/version`	Running version + update availability
POST	`/api/admin/update`	Trigger self-update (admin key)

All /api/* routes require X-API-Key: <key> (from apiKeys[]).

WebSocket (/chat/ws) carries the same semantics with streaming, interjection, inject, and voice-mode audio frames.
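Since POST /api/thread/:name/message answers with an SSE stream, a client has to split the incoming text into events. The parser below follows the general Server-Sent Events framing (`data:` lines, blank line between events) in simplified form: it handles only `\n` line endings and ignores `event:` and `id:` fields. What the gateway puts inside each `data:` payload is not documented here, so payloads are returned as raw strings.

```typescript
// Split buffered SSE text into complete event payloads plus the unparsed
// remainder, which the caller prepends to the next network chunk.
function parseSse(buffer: string): { events: string[]; rest: string } {
  const events: string[] = [];
  const blocks = buffer.split("\n\n");
  // The last block is incomplete until its terminating blank line arrives.
  const rest = blocks.pop() ?? "";
  for (const block of blocks) {
    const data = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")) // drop one leading space
      .join("\n");
    if (data !== "") events.push(data); // comment-only blocks yield no event
  }
  return { events, rest };
}
```

A reader loop would append each decoded chunk to `rest`, call `parseSse`, and hand the returned events to the UI as they arrive.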

Development

git clone https://github.com/soapko/cumulus
cd cumulus
npm install
npm run build
npm test            # vitest
npm run lint
npm run type-check

Build: dist/ compiled TypeScript plus static assets (widget HTML/CSS/JS, blex bundles).
Tests: vitest, 700+ tests covering agentic loop, retriever, adapters, scheduler, push.
Deploy workflow: bump version, npm publish, then cumulus-gateway reload on the host (SIGHUP drains active streams — see docs/tasks/050-graceful-restart.md).

Background

Cumulus implements the Recursive Language Model pattern: treat conversation history as an external environment the model queries programmatically, rather than stuffing everything into context. This enables reasoning over contexts 2+ orders of magnitude beyond the model's window, with graceful cost scaling.

See docs/ for task documents, ADRs, and implementation notes.

License

MIT
