VisionClaw
A personal assistant agent that runs on your desktop (macOS). It receives command messages from pre-configured channels (Gmail, Telegram, Discord) and executes tasks autonomously using desktop control and browser automation. Results are sent back through the same channel.
Platform note: Desktop automation (mouse, keyboard, screenshots) requires macOS with cliclick installed. Browser automation and channel integrations work cross-platform, but full computer-use features are macOS-only.
Features
- Autonomous Desktop Agent: Runs continuously as a long-running process on your computer
- Gmail Identity: The agent has its own Gmail account for email and Google Calendar
- Multi-Channel Support: Receives commands via Gmail, Telegram, Discord
- Desktop Control: Takes screenshots, controls mouse/keyboard, runs terminal commands
- Browser Automation: Navigate and interact with web pages via Playwright
- Google Calendar: Manages its own schedule for recurring tasks and reminders
- Fast Responder: Auto-acknowledges messages while the agent is busy working on a task
- Self-Improving: Can add new skills to itself and upgrade to new versions
- Runtime Observability: Built-in HTTP observability page that shows live logs while the agent is running
Architecture
VisionClaw is built on the Claude Agent SDK V2. It runs as a single-threaded agent with a wake/sleep loop triggered by incoming messages or a periodic heartbeat.
src/
  index.ts      # CLI entry point
  logger.ts     # Structured logger
  onboarding/   # Interactive setup wizard (Gmail, OAuth, channels)
  agent/        # Core agent loop, session, context, fast-responder
  tools/        # Custom tools (notify, browser, calendar, screenshot, etc.)
  channels/     # Channel adapters (Gmail, Telegram, Discord)
  email/        # Gmail email tool implementation
  calendar/     # Google Calendar integration
  memory/       # Persistent memory store
  skills/       # Skill installation logic
  config/       # Configuration management
  obs/          # Runtime observability HTTP server
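The wake/sleep loop described above can be sketched as a small queue that the agent polls: pending messages wake it first, and a heartbeat wakes it when nothing has arrived for a while. This is a minimal illustration only. `WakeQueue`, `WakeReason`, `InboundMessage`, and the `heartbeatMs` parameter are hypothetical names, not VisionClaw's actual types.

```typescript
// Hypothetical types for illustration; not VisionClaw's real API.
type WakeReason =
  | { kind: "message"; channel: string; text: string }
  | { kind: "heartbeat" };

interface InboundMessage {
  channel: string; // e.g. "gmail", "telegram", "discord"
  text: string;
}

class WakeQueue {
  private pending: InboundMessage[] = [];
  private lastHeartbeat: number;
  private heartbeatMs: number;

  constructor(heartbeatMs: number, now: number) {
    this.heartbeatMs = heartbeatMs;
    this.lastHeartbeat = now;
  }

  // A channel adapter would call this when a command arrives.
  push(msg: InboundMessage): void {
    this.pending.push(msg);
  }

  // Returns the next reason to wake, or null if the agent should stay asleep.
  // Messages take priority over the heartbeat, and only one is handed out
  // per call, which matches a single-threaded agent handling one task at a time.
  poll(now: number): WakeReason | null {
    const msg = this.pending.shift();
    if (msg) {
      return { kind: "message", channel: msg.channel, text: msg.text };
    }
    if (now - this.lastHeartbeat >= this.heartbeatMs) {
      this.lastHeartbeat = now;
      return { kind: "heartbeat" };
    }
    return null;
  }
}
```

Passing `now` in explicitly, rather than reading the clock inside `poll`, keeps the sketch deterministic and easy to test; a real loop would likely call it with `Date.now()`.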
Prerequisites
- Node.js >= 24.12.0
- An Anthropic API key
- A dedicated Gmail account for the agent
- Google OAuth access, either through the built-in VisionClaw Google app or your own Google Cloud OAuth2 credentials
Setup
# Install dependencies
pnpm install
# Build
pnpm run build
# Run (starts onboarding if not configured)
pnpm start
# Or run in development mode
pnpm run dev
# Reconfigure an existing profile (add/remove channels, rotate keys)
visionclaw reconfigure --profile default
Utility scripts
Upload a file to Volcengine TOS
This repo includes a helper script that uploads a local file to TOS using environment variables for credentials/config.
export TOS_ACCESS_KEY_ID="..."
export TOS_ACCESS_KEY_SECRET="..."
export TOS_REGION="cn-beijing"
export TOS_BUCKET="your-bucket"
export TOS_ENDPOINT="tos-cn-beijing.volces.com" # optional
npx tsx scripts/upload-to-tos.ts --file ./local.bin --key uploads/local.bin
The first run of pnpm start triggers an interactive onboarding wizard that will:
- Ask for your Anthropic API key
- Ask for a dedicated Gmail address for the agent
- Walk through Google OAuth2 authorization (Gmail + Calendar scopes), using the default VisionClaw Google app when available or your own Google Cloud credentials
- Optionally configure Telegram and Discord
If you want your build to bundle a default Google app, configure it in src/google/default-oauth-app.ts.
Configuration is stored per profile at ~/.visionclaw/profiles/<profile>/config.json.
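As a rough sketch of how the per-profile path above resolves, the helper below builds the config location from a profile name and reads the file. The function names `profileConfigPath` and `readProfileConfig` and the profile-name check are assumptions for illustration; only the path layout comes from this README.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Builds <home>/.visionclaw/profiles/<profile>/config.json.
// The home directory is a parameter so the function can be tested
// without touching the real home folder.
function profileConfigPath(profile: string, home: string = os.homedir()): string {
  // Assumed safeguard: reject names that could escape the profiles directory.
  if (!/^[A-Za-z0-9_-]+$/.test(profile)) {
    throw new Error(`invalid profile name: ${profile}`);
  }
  return path.join(home, ".visionclaw", "profiles", profile, "config.json");
}

// Reads and parses the profile's config file. The result is left as
// `unknown` because the full config schema is not documented here.
function readProfileConfig(profile: string): unknown {
  return JSON.parse(fs.readFileSync(profileConfigPath(profile), "utf8"));
}
```

For the default profile this resolves to ~/.visionclaw/profiles/default/config.json, the same profile name used by the reconfigure command.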
Observability (HTTP)
When the agent is running, it serves a local observability page showing live logs.
- URL: http://127.0.0.1:3101/obs
- SSE stream: GET /obs/events
- Snapshot: GET /obs/snapshot
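The endpoints above follow from the host and port in the obs settings. A minimal sketch of deriving them is shown below; the `ObsConfig` interface mirrors the obs config fields documented in this section, while the `obsEndpoints` helper itself is a hypothetical name, not part of VisionClaw.

```typescript
// Field names follow the "obs" config block in this README.
interface ObsConfig {
  enabled: boolean;
  host: string;
  port: number;
  bufferSize: number;
}

interface ObsEndpoints {
  page: string;     // HTML page with live logs
  events: string;   // Server-Sent Events stream
  snapshot: string; // point-in-time snapshot
}

// Returns the three observability URLs, or null when obs is disabled.
function obsEndpoints(obs: ObsConfig): ObsEndpoints | null {
  if (!obs.enabled) return null;
  const base = `http://${obs.host}:${obs.port}/obs`;
  return {
    page: base,
    events: `${base}/events`,
    snapshot: `${base}/snapshot`,
  };
}
```

In a browser, the `events` URL could be passed to the standard `EventSource` constructor to follow the stream; the payload format of each event is not documented here, so any parsing would be an assumption.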
The observability server is controlled via advanced config (not asked during onboarding):
{
"obs": {
"enabled": true,
"host": "127.0.0.1",
"port": 3101,
"bufferSize": 1000
}
}
Channels
| Channel | Requirements | Status |
|---|---|---|
| Gmail | Gmail account (required) | Always on |
| Telegram | Bot token from @BotFather | Optional |
| Discord | Bot token + channel allowlist | Optional |
Custom Tools
| Tool | Description |
|---|---|
| wait | Pause execution for a specified duration |
| notify_user | Send a message back through a channel (text + optional attachments) |
| finish | Signal task completion, return to sleep |
| computer_use_screenshot | Capture desktop screenshot |
| browser | Open a Chrome instance with CDP for Playwright automation |
| manage_email | List, search, read, send, reply, and manage Gmail messages |
| manage_calendar | Manage Google Calendar events |
| manage_skills | Install, list, create, and delete skills |
| memory | Persistent memory storage across wake cycles |
| upgrade | Check for and install updates |
| computer_use_click | Click on a UI element described in natural language |
| computer_use_type | Type text into the focused field |
| computer_use_key | Press a key or key combination |
| computer_use_scroll | Scroll at a target location |
| computer_use_drag | Drag from one element to another |
Development
See the Development Environment Setup Guide for a comprehensive walkthrough of setting up a local dev environment, including system dependencies, external accounts, environment variables, testing, and macOS permissions.