visionclaw

Version: 0.1.200 (CLI), published 2d ago
Licence: Apache-2.0
Size: 66.1 MB
Weekly: 9.9K
Install scripts: This package runs scripts during installation (preinstall/install/postinstall).

VisionClaw

A personal assistant agent that runs on your desktop (macOS). It receives command messages from pre-configured channels (Gmail, Telegram, Discord) and executes tasks autonomously using desktop control and browser automation. Results are sent back through the same channel.

Platform note: Desktop automation (mouse, keyboard, screenshots) requires macOS with cliclick installed. Browser automation and channel integrations work cross-platform, but full computer-use features are macOS-only.

Features

Autonomous Desktop Agent: Runs continuously as a long-running process on your computer
Gmail Identity: The agent has its own Gmail account for email and Google Calendar
Multi-Channel Support: Receives commands via Gmail, Telegram, Discord
Desktop Control: Takes screenshots, controls mouse/keyboard, runs terminal commands
Browser Automation: Navigates and interacts with web pages via Playwright
Google Calendar: Manages its own schedule for recurring tasks and reminders
Fast Responder: Auto-acknowledges messages while the agent is busy working on a task
Self-Improving: Can add new skills to itself and upgrade to new versions
Runtime Observability: Built-in HTTP observability page that shows live logs while the agent is running

Architecture

VisionClaw is built on the Claude Agent SDK V2. It runs as a single-threaded agent with a wake/sleep loop triggered by incoming messages or a periodic heartbeat.
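
The wake/sleep behaviour described above can be pictured with a small decision function. This is a minimal sketch under stated assumptions, not VisionClaw's actual loop: `WakeDecision`, `decideWake`, and the heartbeat interval parameter are hypothetical names introduced only for illustration.

```typescript
// Hypothetical sketch: all names here are assumptions, not VisionClaw's API.
type WakeDecision =
  | { kind: "message"; text: string } // wake to handle one queued message
  | { kind: "heartbeat" }             // wake for the periodic check
  | { kind: "sleep"; forMs: number }; // nothing to do yet

function decideWake(
  inbox: string[],
  nowMs: number,
  lastHeartbeatMs: number,
  heartbeatMs: number,
): WakeDecision {
  // Incoming messages take priority; only one is taken per cycle,
  // which mirrors a single-threaded agent working on one task at a time.
  const text = inbox.shift();
  if (text !== undefined) return { kind: "message", text };

  // With no messages waiting, wake only once the heartbeat interval has elapsed.
  const elapsed = nowMs - lastHeartbeatMs;
  if (elapsed >= heartbeatMs) return { kind: "heartbeat" };

  // Otherwise stay asleep until the next heartbeat is due.
  return { kind: "sleep", forMs: heartbeatMs - elapsed };
}
```

A real loop would await new messages instead of polling, but the priority order (message first, then heartbeat, else sleep) is the part this sketch is meant to show.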

src/
  index.ts              # CLI entry point
  logger.ts             # Structured logger
  onboarding/           # Interactive setup wizard (Gmail, OAuth, channels)
  agent/                # Core agent loop, session, context, fast-responder
  tools/                # Custom tools (notify, browser, calendar, screenshot, etc.)
  channels/             # Channel adapters (Gmail, Telegram, Discord)
  email/                # Gmail email tool implementation
  calendar/             # Google Calendar integration
  memory/               # Persistent memory store
  skills/               # Skill installation logic
  config/               # Configuration management
  obs/                  # Runtime observability HTTP server

Prerequisites

Node.js >= 24.12.0
An Anthropic API key
A dedicated Gmail account for the agent
Google OAuth access, either through the built-in VisionClaw Google app or your own Google Cloud OAuth2 credentials

Setup

# Install dependencies
pnpm install

# Build
pnpm run build

# Run (starts onboarding if not configured)
pnpm start

# Or run in development mode
pnpm run dev

# Reconfigure an existing profile (add/remove channels, rotate keys)
visionclaw reconfigure --profile default

Utility scripts

Upload a file to Volcengine TOS

This repo includes a helper script that uploads a local file to TOS using environment variables for credentials/config.

export TOS_ACCESS_KEY_ID="..."
export TOS_ACCESS_KEY_SECRET="..."
export TOS_REGION="cn-beijing"
export TOS_BUCKET="your-bucket"
export TOS_ENDPOINT="tos-cn-beijing.volces.com" # optional

npx tsx scripts/upload-to-tos.ts --file ./local.bin --key uploads/local.bin
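
As an illustration of how a script like this might validate its environment before uploading, here is a minimal sketch. `TosConfig` and `loadTosConfig` are hypothetical names; the real scripts/upload-to-tos.ts may read its configuration differently, and no TOS SDK calls are shown.

```typescript
// Hypothetical helper: not taken from scripts/upload-to-tos.ts.
interface TosConfig {
  accessKeyId: string;
  accessKeySecret: string;
  region: string;
  bucket: string;
  endpoint?: string; // optional, matching the exports above
}

const REQUIRED_TOS_VARS = [
  "TOS_ACCESS_KEY_ID",
  "TOS_ACCESS_KEY_SECRET",
  "TOS_REGION",
  "TOS_BUCKET",
] as const;

function loadTosConfig(env: Record<string, string | undefined>): TosConfig {
  // Report every missing variable at once instead of failing on the first.
  const missing = REQUIRED_TOS_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const get = (name: string): string => env[name] ?? "";
  const endpoint = env["TOS_ENDPOINT"];
  return {
    accessKeyId: get("TOS_ACCESS_KEY_ID"),
    accessKeySecret: get("TOS_ACCESS_KEY_SECRET"),
    region: get("TOS_REGION"),
    bucket: get("TOS_BUCKET"),
    // Only set the endpoint when one was provided.
    ...(endpoint ? { endpoint } : {}),
  };
}
```

In a Node.js script this would typically be called as `loadTosConfig(process.env)`.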

The first run of the agent (pnpm start) triggers an interactive onboarding wizard that will:

Ask for your Anthropic API key
Ask for a dedicated Gmail address for the agent
Walk through Google OAuth2 authorization (Gmail + Calendar scopes), using the default VisionClaw Google app when available or your own Google Cloud credentials
Optionally configure Telegram and Discord

If you want to bundle a default Google app in your own build, configure it in src/google/default-oauth-app.ts.

Configuration is stored per profile at ~/.visionclaw/profiles/<profile>/config.json.
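
The per-profile path above can be resolved in code roughly as follows. This is a sketch only: `profileConfigPath` is a hypothetical helper, not part of the VisionClaw CLI, and it assumes `~` means the current user's home directory.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: builds ~/.visionclaw/profiles/<profile>/config.json.
// `home` defaults to the current user's home directory and can be overridden.
function profileConfigPath(profile: string, home: string = os.homedir()): string {
  return path.join(home, ".visionclaw", "profiles", profile, "config.json");
}
```

For example, `profileConfigPath("default")` points at the config file of the profile used in the reconfigure command above.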

Observability (HTTP)

When the agent is running, it serves a local observability page showing live logs.

URL: http://127.0.0.1:3101/obs
SSE stream: GET /obs/events
Snapshot: GET /obs/snapshot
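
To consume the event stream from a script, the response body has to be split into SSE frames. The sketch below assumes /obs/events follows the standard text/event-stream framing; the payload format is not documented here, so `data` is treated as an opaque string. `SseEvent` and `parseSse` are hypothetical names.

```typescript
interface SseEvent {
  event: string; // event type; "message" when the frame has no event field
  data: string;  // raw payload, left unparsed
}

// Parses complete SSE frames out of a text buffer and returns the parsed
// events plus the unconsumed remainder (a possibly incomplete last frame).
// Simplification: bare "\r" line endings are not handled.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = frames.pop() ?? "";
  const events: SseEvent[] = [];

  for (const frame of frames) {
    let event = "message";
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // comment or keep-alive
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Frames without any data field are not dispatched.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```

When reading the stream in chunks, prepend `rest` to the next chunk before calling `parseSse` again so that frames split across chunks are not lost.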

The page is controlled through the advanced obs config block, which the onboarding wizard does not ask about:

{
  "obs": {
    "enabled": true,
    "host": "127.0.0.1",
    "port": 3101,
    "bufferSize": 1000
  }
}
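
One way to model this block in code is a typed config with defaults and basic validation. This is a sketch under assumptions: `ObsConfig`, `resolveObsConfig`, and `obsUrl` are hypothetical names, and the defaults are simply copied from the example above, so the real defaults may differ.

```typescript
interface ObsConfig {
  enabled: boolean;
  host: string;
  port: number;
  bufferSize: number;
}

// Assumed defaults, taken from the example config rather than from the source code.
const OBS_DEFAULTS: ObsConfig = {
  enabled: true,
  host: "127.0.0.1",
  port: 3101,
  bufferSize: 1000,
};

// Merges user-supplied values over the defaults and rejects obviously bad input.
function resolveObsConfig(partial: Partial<ObsConfig> = {}): ObsConfig {
  const config: ObsConfig = { ...OBS_DEFAULTS, ...partial };
  if (!Number.isInteger(config.port) || config.port < 1 || config.port > 65535) {
    throw new Error(`Invalid obs.port: ${config.port}`);
  }
  if (!Number.isInteger(config.bufferSize) || config.bufferSize < 1) {
    throw new Error(`Invalid obs.bufferSize: ${config.bufferSize}`);
  }
  return config;
}

// Builds the page URL in the same shape as the one listed above.
function obsUrl(config: ObsConfig): string {
  return `http://${config.host}:${config.port}/obs`;
}
```

With no overrides, `obsUrl(resolveObsConfig())` yields the URL listed above.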

Channels

Channel	Requirements	Status
Gmail	Gmail account (required)	Always on
Telegram	Bot token from @BotFather	Optional
Discord	Bot token + channel allowlist	Optional

Custom Tools

Tool	Description
`wait`	Pause execution for a specified duration
`notify_user`	Send a message back through a channel (text + optional attachments)
`finish`	Signal task completion, return to sleep
`computer_use_screenshot`	Capture desktop screenshot
`browser`	Open a Chrome instance with CDP for Playwright automation
`manage_email`	List, search, read, send, reply, and manage Gmail messages
`manage_calendar`	Manage Google Calendar events
`manage_skills`	Install, list, create, and delete skills
`memory`	Persistent memory storage across wake cycles
`upgrade`	Check for and install updates
`computer_use_click`	Click on a UI element described in natural language
`computer_use_type`	Type text into the focused field
`computer_use_key`	Press a key or key combination
`computer_use_scroll`	Scroll at a target location
`computer_use_drag`	Drag from one element to another