0.1.200 • Published 2d ago • CLI

visionclaw

License: Apache-2.0
Version: 0.1.200
Deps: 34
Size: 66.1 MB
Vulns: 0
Weekly downloads: 9.9K
Install scripts: this package runs scripts during installation (preinstall/install/postinstall)

VisionClaw

A personal assistant agent that runs on your desktop (macOS). It receives command messages from pre-configured channels (Gmail, Telegram, Discord) and executes tasks autonomously using desktop control and browser automation. Results are sent back through the same channel.

Platform note: Desktop automation (mouse, keyboard, screenshots) requires macOS with cliclick installed. Browser automation and channel integrations work cross-platform, but full computer-use features are macOS-only.

Features

  • Autonomous Desktop Agent: Runs as a long-running process on your computer
  • Gmail Identity: The agent has its own Gmail account for email and Google Calendar
  • Multi-Channel Support: Receives commands via Gmail, Telegram, Discord
  • Desktop Control: Takes screenshots, controls mouse/keyboard, runs terminal commands
  • Browser Automation: Navigate and interact with web pages via Playwright
  • Google Calendar: Manages its own schedule for recurring tasks and reminders
  • Fast Responder: Auto-acknowledges messages while the agent is busy working on a task
  • Self-Improving: Can add new skills to itself and upgrade to new versions
  • Runtime Observability: Built-in local HTTP observability ("obs") page that shows live logs while the agent is running
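The Fast Responder behaviour can be pictured with a small sketch. It assumes a hypothetical `FastResponder` class that holds a busy flag, a message queue and an injected `sendAck` callback. None of these names come from VisionClaw's source, and the acknowledgement text is made up.

```typescript
// Hypothetical sketch of a fast responder: while the agent is busy, incoming
// messages are acknowledged immediately and queued for later.
// Incoming, FastResponder and sendAck are illustrative names only.
type Incoming = { channel: string; from: string; text: string };

class FastResponder {
  private busy = false;
  private readonly queue: Incoming[] = [];
  private readonly sendAck: (msg: Incoming, reply: string) => void;

  constructor(sendAck: (msg: Incoming, reply: string) => void) {
    this.sendAck = sendAck;
  }

  // Called by a channel adapter for every incoming message.
  receive(msg: Incoming): void {
    if (this.busy) {
      // The agent is mid-task: reply right away, handle the request later.
      this.sendAck(msg, "Got it. I'm finishing another task and will get to this next.");
    }
    this.queue.push(msg);
  }

  // The agent takes the next queued message and marks itself busy.
  startTask(): Incoming | undefined {
    const next = this.queue.shift();
    if (next) this.busy = true;
    return next;
  }

  // Called when the current task is done; later messages are no longer auto-acked.
  finishTask(): void {
    this.busy = false;
  }
}
```

Because the acknowledgement path never waits on the agent, the user gets feedback even during a long desktop or browser task.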

Architecture

VisionClaw is built on the Claude Agent SDK V2. It runs as a single-threaded agent with a wake/sleep loop triggered by incoming messages or a periodic heartbeat.

src/
  index.ts              # CLI entry point
  logger.ts             # Structured logger
  onboarding/           # Interactive setup wizard (Gmail, OAuth, channels)
  agent/                # Core agent loop, session, context, fast-responder
  tools/                # Custom tools (notify, browser, calendar, screenshot, etc.)
  channels/             # Channel adapters (Gmail, Telegram, Discord)
  email/                # Gmail email tool implementation
  calendar/             # Google Calendar integration
  memory/               # Persistent memory store
  skills/               # Skill installation logic
  config/               # Configuration management
  obs/                  # Runtime observability HTTP server
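The wake/sleep loop described in the Architecture paragraph can be sketched as follows. The sketch assumes an in-memory inbox and a fixed heartbeat interval. `WakeReason`, `Inbox` and `runAgentLoop` are hypothetical names rather than code from `src/agent/`, and the real loop drives a Claude Agent SDK session instead of a plain callback.

```typescript
// Hypothetical sketch of a single-threaded wake/sleep loop: the agent sleeps
// until a message arrives or a heartbeat timer fires, then handles one wake
// reason at a time. All names here are illustrative.
type WakeReason = { kind: "message"; text: string } | { kind: "heartbeat" };

class Inbox {
  private pending: string[] = [];
  private waiter: ((reason: WakeReason) => void) | null = null;

  // Channel adapters call this when a command message arrives.
  push(text: string): void {
    const wake = this.waiter;
    if (wake) {
      this.waiter = null;
      wake({ kind: "message", text });
    } else {
      this.pending.push(text); // agent is awake and busy: keep it for later
    }
  }

  // Sleep until a message arrives or heartbeatMs elapses (single consumer only).
  next(heartbeatMs: number): Promise<WakeReason> {
    const queued = this.pending.shift();
    if (queued !== undefined) {
      return Promise.resolve<WakeReason>({ kind: "message", text: queued });
    }
    return new Promise<WakeReason>((resolve) => {
      const timer = setTimeout(() => {
        this.waiter = null;
        resolve({ kind: "heartbeat" });
      }, heartbeatMs);
      this.waiter = (reason) => {
        clearTimeout(timer);
        resolve(reason);
      };
    });
  }
}

// Handle wake reasons strictly one after another, then go back to sleep.
async function runAgentLoop(
  inbox: Inbox,
  handle: (reason: WakeReason) => Promise<void>,
  heartbeatMs: number,
  maxCycles = Infinity,
): Promise<void> {
  for (let cycle = 0; cycle < maxCycles; cycle++) {
    const reason = await inbox.next(heartbeatMs);
    await handle(reason); // work on the task until it finishes
  }
}
```

Messages that arrive while `handle` is still running are queued rather than starting a second task. This is what keeps the agent single-threaded.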

Prerequisites

  • Node.js >= 24.12.0
  • An Anthropic API key
  • A dedicated Gmail account for the agent
  • Google OAuth access, either through the built-in VisionClaw Google app or your own Google Cloud OAuth2 credentials

Setup

# Install dependencies
pnpm install

# Build
pnpm run build

# Run (starts onboarding if not configured)
pnpm start

# Or run in development mode
pnpm run dev

# Reconfigure an existing profile (add/remove channels, rotate keys)
visionclaw reconfigure --profile default

Utility scripts

Upload a file to Volcengine TOS

This repo includes a helper script, scripts/upload-to-tos.ts, that uploads a local file to Volcengine TOS. It reads credentials and configuration from the following environment variables (TOS_ENDPOINT is optional):

export TOS_ACCESS_KEY_ID="..."
export TOS_ACCESS_KEY_SECRET="..."
export TOS_REGION="cn-beijing"
export TOS_BUCKET="your-bucket"
export TOS_ENDPOINT="tos-cn-beijing.volces.com" # optional

npx tsx scripts/upload-to-tos.ts --file ./local.bin --key uploads/local.bin
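Before an upload, you can validate these environment variables up front so that every missing value is reported in one pass. The sketch below does not reproduce scripts/upload-to-tos.ts. `loadTosConfig` and `TosConfig` are hypothetical names, and the only behaviour taken from this README is that TOS_ENDPOINT is optional.

```typescript
// Hypothetical sketch: collect and validate the TOS settings listed above.
// This is not the actual upload script; names are illustrative.
interface TosConfig {
  accessKeyId: string;
  accessKeySecret: string;
  region: string;
  bucket: string;
  endpoint?: string; // optional, per the README
}

const REQUIRED_TOS_VARS = [
  "TOS_ACCESS_KEY_ID",
  "TOS_ACCESS_KEY_SECRET",
  "TOS_REGION",
  "TOS_BUCKET",
] as const;

function loadTosConfig(env: Record<string, string | undefined>): TosConfig {
  // Report every missing variable at once instead of failing on the first.
  const missing = REQUIRED_TOS_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing TOS environment variables: ${missing.join(", ")}`);
  }
  return {
    accessKeyId: env.TOS_ACCESS_KEY_ID!,
    accessKeySecret: env.TOS_ACCESS_KEY_SECRET!,
    region: env.TOS_REGION!,
    bucket: env.TOS_BUCKET!,
    endpoint: env.TOS_ENDPOINT || undefined,
  };
}
```

In a Node script, you would call this as `loadTosConfig(process.env)`.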

Onboarding

The first run of pnpm start triggers an interactive onboarding wizard that will:

  1. Ask for your Anthropic API key
  2. Ask for a dedicated Gmail address for the agent
  3. Walk through Google OAuth2 authorization (Gmail + Calendar scopes), using the default VisionClaw Google app when available or your own Google Cloud credentials
  4. Optionally configure Telegram and Discord

To ship a bundled default Google OAuth app with your build, configure it in src/google/default-oauth-app.ts.

Configuration is stored per profile at ~/.visionclaw/profiles/<profile>/config.json.
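You can locate and read a profile's config using only Node's standard library. The directory layout comes from the line above. The helper names and the profile-name check are hypothetical, and the sketch treats the file as opaque JSON because its schema is not documented here.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: resolves ~/.visionclaw/profiles/<profile>/config.json.
function profileConfigPath(profile = "default", home = os.homedir()): string {
  // Assumption: restrict names to simple tokens so they cannot escape the profiles dir.
  if (!/^[A-Za-z0-9_-]+$/.test(profile)) {
    throw new Error(`Invalid profile name: ${profile}`);
  }
  return path.join(home, ".visionclaw", "profiles", profile, "config.json");
}

// Returns the parsed JSON, or null when no config file exists for the profile.
function readProfileConfig(profile = "default", home = os.homedir()): unknown {
  const file = profileConfigPath(profile, home);
  if (!fs.existsSync(file)) return null;
  return JSON.parse(fs.readFileSync(file, "utf8"));
}
```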

Observability (HTTP)

When the agent is running, it serves a local observability page showing live logs.

  • URL: http://127.0.0.1:3101/obs
  • SSE stream: GET /obs/events
  • Snapshot: GET /obs/snapshot

These settings are part of the advanced configuration; onboarding does not prompt for them:

{
  "obs": {
    "enabled": true,
    "host": "127.0.0.1",
    "port": 3101,
    "bufferSize": 1000
  }
}
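You can also follow the SSE stream at GET /obs/events from a script instead of the browser page. The sketch assumes the host and port shown in the config example above and relies on Node's built-in `fetch`. The per-event payload format is not documented here, so the sketch only extracts each event's raw `data:` field. `parseSseBlocks` and `followObsEvents` are hypothetical names.

```typescript
// Hypothetical sketch of an SSE client for the obs stream.

// Split buffered SSE text into complete events; return the unfinished tail.
function parseSseBlocks(buffer: string): { data: string[]; rest: string } {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // last piece may be an incomplete event
  const data: string[] = [];
  for (const block of blocks) {
    const lines = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")); // drop one leading space, per SSE
    if (lines.length > 0) data.push(lines.join("\n"));
  }
  return { data, rest };
}

async function followObsEvents(
  baseUrl = "http://127.0.0.1:3101",
  onData: (data: string) => void = (d) => console.log(d),
): Promise<void> {
  const res = await fetch(`${baseUrl}/obs/events`, {
    headers: { Accept: "text/event-stream" },
  });
  if (!res.ok || !res.body) throw new Error(`obs stream failed: HTTP ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break; // server closed the stream
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSseBlocks(buffer);
    buffer = parsed.rest;
    for (const d of parsed.data) onData(d);
  }
}
```

Calling `followObsEvents()` while the agent is running would print each event's data until the server closes the connection.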

Channels

| Channel  | Requirements                  | Status    |
|----------|-------------------------------|-----------|
| Gmail    | Gmail account (required)      | Always on |
| Telegram | Bot token from @BotFather     | Optional  |
| Discord  | Bot token + channel allowlist | Optional  |
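The Discord channel allowlist works as a gate on which messages count as commands. A minimal sketch of that check follows. The `DiscordMessage` shape and `shouldAccept` are hypothetical, not VisionClaw's config schema or the discord.js API. Ignoring bot authors is an extra assumption added to avoid reply loops.

```typescript
// Hypothetical sketch of a channel allowlist check for incoming Discord messages.
interface DiscordMessage {
  channelId: string;
  authorIsBot: boolean;
  content: string;
}

function shouldAccept(msg: DiscordMessage, allowedChannelIds: ReadonlySet<string>): boolean {
  // Assumption: ignore bot authors (including the agent itself) to avoid loops.
  if (msg.authorIsBot) return false;
  // Only channels on the allowlist may issue commands to the agent.
  return allowedChannelIds.has(msg.channelId);
}
```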

Custom Tools

| Tool                    | Description                                                          |
|-------------------------|----------------------------------------------------------------------|
| wait                    | Pause execution for a specified duration                             |
| notify_user             | Send a message back through a channel (text + optional attachments) |
| finish                  | Signal task completion, return to sleep                              |
| computer_use_screenshot | Capture desktop screenshot                                           |
| browser                 | Open a Chrome instance with CDP for Playwright automation            |
| manage_email            | List, search, read, send, reply, and manage Gmail messages           |
| manage_calendar         | Manage Google Calendar events                                        |
| manage_skills           | Install, list, create, and delete skills                             |
| memory                  | Persistent memory storage across wake cycles                         |
| upgrade                 | Check for and install updates                                        |
| computer_use_click      | Click on a UI element described in natural language                  |
| computer_use_type       | Type text into the focused field                                     |
| computer_use_key        | Press a key or key combination                                       |
| computer_use_scroll     | Scroll at a target location                                          |
| computer_use_drag       | Drag from one element to another                                     |
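One way to picture how `wait` and `finish` shape a wake cycle is a dispatch loop that processes tool calls until `finish` sends the agent back to sleep. This is a sketch only. The `ToolCall` shape, the `seconds` unit for `wait` and `runToolCalls` are hypothetical and are not the Claude Agent SDK's tool API.

```typescript
// Hypothetical sketch: dispatch tool calls within one wake cycle until `finish`.
// Only `wait` and `finish` are modelled; names and input shapes are assumptions.
type ToolCall =
  | { name: "wait"; input: { seconds: number } }
  | { name: "finish"; input: { summary: string } };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Returns the finish summary, or null if the task has not finished yet.
async function runToolCalls(calls: ToolCall[]): Promise<string | null> {
  for (const call of calls) {
    switch (call.name) {
      case "wait":
        // Pause execution for the requested duration.
        await sleep(call.input.seconds * 1000);
        break;
      case "finish":
        // Signal task completion: stop here and let the agent return to sleep.
        return call.input.summary;
    }
  }
  return null;
}
```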

Development

See the Development Environment Setup Guide for a comprehensive walkthrough of setting up a local dev environment, including system dependencies, external accounts, environment variables, testing, and macOS permissions.

License

Apache License 2.0
