4.8.1 • Published 10h agoCLI

crawlforge-mcp-server

Licence

MIT

Version

4.8.1

Deps

Size

1.7 MB

Vulns

Weekly

419

Install scriptsThis package runs scripts during installation (preinstall/install/postinstall)

Summary Dependency Versions

26 web scraping, crawling, deep-research & autonomous-extraction tools for Claude, Cursor & any MCP client.
Clean Markdown & structured JSON from any site. Get started with 1,000 free credits — no credit card required.

Star us on GitHub to follow along — it genuinely helps others discover the project.

Why CrawlForge?
CrawlForge vs. alternatives
Quick Start (2 Minutes)
Available Tools
Pricing
Advanced Configuration
Usage Examples
Security & Privacy
Support
Contributing

Why CrawlForge?

26 MCP-native tools — scraping, crawling, search, deep research, an autonomous agent, a unified multi-format scrape, document processing, stealth browsing, and more, callable directly from your AI assistant.
Generous free tier — 1,000 credits to start instantly, no credit card. Credits never expire and roll over month-to-month.
Local-LLM by default — extract_with_llm runs against a local Ollama model out of the box: no LLM API key, no per-token cost, and your data never leaves your machine. Cloud (OpenAI/Anthropic) is opt-in.
LLM-ready output — clean Markdown, structured JSON (schema-driven), screenshots, links, and metadata from a single fetch.
Autonomous agent — describe what you need in natural language; it plans, gathers, and shapes an answer under orchestrator-enforced hard stops (max steps/URLs/wall-clock) — no URLs required.
Security-hardened — SSRF protection on every request, a fail-closed backend allow-list, a vetted action allowlist for browser automation, and per-tool credit gating.
Works everywhere MCP does — Claude Desktop, Claude Code, Cursor, and any other MCP-enabled client, configured in one command.

CrawlForge vs. alternatives

	CrawlForge MCP	Firecrawl	Raw scraping API
Native MCP server	26 tools
Free tier	1,000 credits, rollover	Limited	Varies
Self-hosted / local LLM extraction (Ollama)	default, $0/token
Autonomous agent (no URLs needed)	`agent`
Deep research with source verification	`deep_research`	Partial
Browser automation / actions	`scrape_with_actions`		Varies
Stealth / anti-detection engines	Chromium + Camoufox		Add-on
Pre-built site templates	10 sites
License	MIT	AGPL-3.0	Proprietary

Comparison reflects publicly documented capabilities at time of writing. CrawlForge is MIT-licensed and MCP-first — built to plug straight into AI coding assistants.

Quick Start (2 Minutes)

1. Install from NPM

npm install -g crawlforge-mcp-server

2. Setup Your API Key (required)

Every tool requires a CrawlForge API key — new accounts get 1,000 free trial credits to start:

npx crawlforge-setup

This will:

Guide you through getting your free API key
Configure your credentials securely
Auto-configure Claude Code and Cursor (if installed)
Verify your setup is working

Don't have an API key? Get one free at https://www.crawlforge.dev/signup

One-step setup (v4.6.0+): crawlforge init detects your API key, installs the agent skill, and idempotently merges the MCP config stanza into Claude Code, Claude Desktop, and Cursor. Use crawlforge init --all --yes to configure every detected client non-interactively.

3. Configure Your IDE (if not auto-configured)

For Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "crawlforge": {
      "command": "npx",
      "args": ["-y", "crawlforge-mcp-server"]
    }
  }
}

Location:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

Restart Claude Desktop to activate.

For Claude Code CLI (Auto-configured)

The setup wizard automatically configures Claude Code by adding to ~/.claude.json:

{
  "mcpServers": {
    "crawlforge": {
      "type": "stdio",
      "command": "crawlforge-mcp"
    }
  }
}

After setup, restart Claude Code to activate.

For Cursor IDE (Auto-configured)

The setup wizard automatically configures Cursor by adding to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "crawlforge": {
      "type": "stdio",
      "command": "crawlforge-mcp"
    }
  }
}

Restart Cursor to activate.

Which launch command? npx -y crawlforge-mcp-server needs no global install and always runs the published version (recommended for Claude Desktop). For a global install (npm i -g crawlforge-mcp-server), use the dedicated crawlforge-mcp bin — it resolves on your PATH, so it survives Node/nvm version switches. The bare crawlforge command still launches the server when an MCP client spawns it over stdio (backward compatibility for configs created before v4.2.5); interactively it's the CLI — run crawlforge mcp to start the server by hand.

Available Tools

CrawlForge requires a CrawlForge API key — every tool is metered and consumes credits. New accounts get 1,000 free trial credits to start. Get a key at crawlforge.dev/signup.

All Tools (API key required)

Tool	Credits	What it does
`fetch_url`	1	Fetch content from any URL
`extract_text`	1	Extract clean text from web pages
`extract_links`	1	Get all links from a page
`extract_metadata`	1	Extract page metadata (title, OG tags, schema.org)
`scrape_template`	1	Structured data from well-known sites (Amazon, GitHub, LinkedIn, YouTube, Reddit, Hacker News, npm, and more) without writing selectors
`list_ollama_models`	1	List the Ollama models installed locally (helps you pick a `model` for `extract_with_llm`)
`get_batch_results`	1	Retrieve paginated results for a `batch_scrape` job by `batchId`
`scrape`	2	Unified single-fetch, multi-format extraction. Pass a `formats` array (markdown/html/rawHtml/text/links/metadata/screenshot/json-schema) plus `onlyMainContent`; one fetch serves every requested format with per-format partial-success warnings
`scrape_structured`	2	Extract structured data with CSS selectors
`extract_content`	2	Enhanced content extraction
`map_site`	2	Discover and map website structure (optional `search=` ranks the discovered URLs)
`process_document`	2	Multi-format document processing
`localization`	2	Multi-language and geo-location management
`track_changes`	3	Monitor content changes over time
`analyze_content`	3	Comprehensive content analysis
`extract_structured`	3	LLM-powered schema-driven extraction (your own LLM key or local Ollama)
`extract_with_llm`	3	Natural-language extraction. Defaults to a local Ollama model; pass `provider: "openai" \| "anthropic"` with the matching key for cloud models (external LLM billed by your provider)
`summarize_content`	4	Generate intelligent summaries
`crawl_deep`	4	Deep crawl entire websites
`search_web`	5	Search the web using Google Search API
`batch_scrape`	5	Process multiple URLs simultaneously
`scrape_with_actions`	5	Browser automation chains
`generate_llms_txt`	5	Generate AI interaction guidelines
`stealth_mode`	5	Anti-detection browser management
`agent`	8	Autonomous research/extraction from a natural-language prompt — no URLs required. Plans, gathers, and shapes an answer under hard safety stops (max steps/URLs/wall-clock enforced by the orchestrator, never the LLM)
`deep_research`	10	Multi-stage research with source verification

For the full canonical capabilities reference (all tools, CLI commands, stealth engines, research workflow), see SKILL.md.

↑ Back to top

Pricing

Every tool is metered and requires an API key. New accounts get 1,000 free trial credits — no credit card required to start.

Plan	Credits/Month	Best For
Free	1,000	Testing & personal projects
Hobby ($19)	5,000	Small projects & development
Professional ($99)	50,000	Professional use & production
Business ($399)	250,000	Large scale operations

All plans include:

Access to all 26 tools
Credits never expire and roll over month-to-month
API access and webhook notifications

View full pricing

Advanced Configuration

Environment Variables

# Optional: Set API key via environment
export CRAWLFORGE_API_KEY="cf_live_your_api_key_here"

# Optional: Custom API endpoint (for enterprise)
export CRAWLFORGE_API_URL="https://api.crawlforge.dev"
# As of v3.0.18, this variable is validated against an allow-list of CrawlForge backend hosts.

# Optional: Local LLM (Ollama) overrides — extract_with_llm defaults to Ollama
export OLLAMA_BASE_URL="http://localhost:11434"   # default
export OLLAMA_DEFAULT_MODEL="llama3.2"             # default; any locally-pulled model name works

# Optional: Cloud LLM keys — only needed when you pass provider: "openai" or "anthropic"
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Optional: deep_research stealth extraction fallback (v4.6.6) — see below
export RESEARCH_STEALTH_ENGINE="auto"      # auto (default) | camoufox | chromium
export RESEARCH_STEALTH_FALLBACK="true"    # set to "false" to disable entirely
export RESEARCH_MAX_STEALTH_RETRIES="8"    # cap on stealth retries per research run

Local-LLM quickstart (`extract_with_llm` with Ollama)

extract_with_llm defaults to a local Ollama model — no LLM-provider key, no per-token LLM costs, and no data leaving your machine (the CrawlForge credit cost still applies).

# 1. Install Ollama:  https://ollama.com
# 2. Pull any model from https://ollama.com/library
ollama pull llama3.2

# 3. Discover what's installed (from your MCP client)
#    list_ollama_models()

# 4. Extract — defaults to Ollama with the model from step 2
#    extract_with_llm({ url: "https://example.com", prompt: "…", model: "llama3.2" })

Stealth extraction for `deep_research` (Camoufox)

deep_research automatically retries sources that block the normal fetch path (Reddit, Quora, forums, and Cloudflare/DataDome-protected pages return HTTP 403) through a real fingerprinted browser, then re-extracts from the rendered HTML. It's bounded (RESEARCH_MAX_STEALTH_RETRIES, default 8, plus a per-page timeout) and lazy — the browser stack only loads when a source is actually blocked.

Engine selection (RESEARCH_STEALTH_ENGINE):

auto (default) — prefer Camoufox (Firefox anti-detect), fall back to Chromium stealth, then plain fetch.
camoufox — force Camoufox.
chromium — force the Chromium stealth engine.

Headless Chromium cannot clear modern challenges (Cloudflare Turnstile, DataDome) — Camoufox can. In testing it recovered Quora and Trustpilot pages that were otherwise fully blocked. To enable it, install the optional dependency and run its one-time binary fetch:

# Camoufox is declared as an optional dependency, so a normal install already pulls it.
# If you installed with --no-optional, add it explicitly:
npm install camoufox

# One-time download of the Camoufox Firefox binary (~130 MB):
npx camoufox fetch

Without the Camoufox binary, deep_research silently falls back to Chromium stealth and then to plain fetch — no errors, just lower recovery on heavily-protected sites. Disable the whole fallback with RESEARCH_STEALTH_FALLBACK=false.

Note: Hard IP-reputation blocks (e.g. Reddit's edge 403) resist headless stealth from any IP and require residential/mobile proxies, which CrawlForge does not provide. See docs/stealth-engines.md for details.

Manual Configuration

Your configuration is stored at ~/.crawlforge/config.json:

{
  "apiKey": "cf_live_...",
  "userId": "user_...",
  "email": "you@example.com"
}

Usage Examples

Once configured, use these tools in your AI assistant:

"Search for the latest AI news"
"Extract all links from example.com"
"Crawl the documentation site and summarize it"
"Monitor this page for changes"
"Extract product prices from this e-commerce site"

Security & Privacy

Secure Authentication: API keys required for all metered tools
Local Storage: API keys stored securely at ~/.crawlforge/config.json
HTTPS Only: All connections use encrypted HTTPS
No Data Retention: We don't store scraped data, only usage logs
Rate Limiting: Built-in protection against abuse
Compliance: Respects robots.txt and GDPR requirements

Security & Approvals

SSRF enforcement: Every scraped URL is validated before the request is sent — http/https only; blocks loopback, RFC1918, IPv6 private/link-local ranges, cloud metadata endpoints (GCP, Azure), and dangerous ports (SSH, SMTP, DNS, MySQL, Postgres, Redis, MongoDB, etc.). Redirects are re-validated each hop, capped at 5.
Backend endpoint guard (v3.0.18): The server's own calls to CrawlForge.dev use a separate fail-closed allow-list ({crawlforge.dev, www.crawlforge.dev, api.crawlforge.dev}, HTTPS required). Setting CRAWLFORGE_API_URL to an arbitrary host is blocked at parse time.
Action allowlist: scrape_with_actions accepts only 7 action types (wait, click, type, press, scroll, screenshot, executeJavaScript). No download, file-write, or arbitrary cross-page navigation primitives exist.
JavaScript gate: The executeJavaScript action throws by default. Set ALLOW_JAVASCRIPT_EXECUTION=true at deploy time to enable (not recommended in production).
MCP Elicitation (v3.6.0): Four tools request user confirmation before executing expensive operations — deep_research (>50 URLs), batch_scrape (sync mode, >25 URLs), crawl_deep (projected >500 pages), extract_structured (schema has >3 required fields with no LLM configured). Credit-low situations also elicit. Confirmation is best-effort: if the MCP client does not support elicitation the tool proceeds (fail-open).
Per-tool credit gating: Every tool is wrapped with withAuth() and is metered — credits are checked and deducted before execution, and a valid API key is required for every tool (fail-closed since v3.0.18).