Rawfy
Universal web perception skill for AI agents — converts any URL into structured, agent-readable content with described images, transcribed video, and interactive element maps.
Rawfy is not an AI agent. Rawfy is a tool that AI agents call to see the web clearly.
The Problem
AI Agents struggle to read the web.
- Passing raw HTML to an LLM destroys its context window.
- Simple scraping tools miss client-side rendered Single Page Apps (SPAs).
- Important information hidden in images, YouTube embeds, or PDFs is completely lost.
- Interactive elements (forms, buttons, inputs) are stripped away, making it impossible for agents to navigate.
The Solution
Rawfy acts as the perception layer between the wild web and your AI agent. It fetches a page, automatically decides whether to use static HTTP or headless Playwright, extracts the true content using Readability, and serializes it into an agent-friendly format like JSON or WSM (Web Semantic Markdown).
Key Features
Smart Fetching Strategy Rawfy analyzes the target URL. If the page is a standard static document, it uses lightning-fast HTTP fetching. If it detects a JS-heavy framework (React, Vue, Next.js, Angular), it automatically boots up a headless Chromium browser via Playwright to ensure the content is fully rendered before extraction.
Pristine Markdown Output Rawfy strips away navigation bars, footers, ads, and boilerplate, leaving only the core content formatted perfectly in Markdown.
Media Perception
- Images: Extracts
alttext and<figcaption>automatically. (Vision API support can be enabled to describe images without alt text). - Videos: Automatically detects YouTube and Vimeo embeds and fetches their text transcripts or closed-captions to pass to the agent.
- Audio: Grabs metadata and podcast transcript links.
- PDFs: If a link points directly to a PDF, Rawfy uses
pdfjs-distto extract the text seamlessly.
- Images: Extracts
Interactive Element Mapping Rawfy maps the UI of the page. It returns a structured list of buttons, forms, text inputs, and important links. This allows autonomous agents to know exactly what elements they can interact with.
Built for Agents (MCP & REST) Rawfy runs locally as an MCP (Model Context Protocol) server or a REST API, allowing immediate plug-and-play with frameworks like LangChain, Claude Code, Ollama, and Google AntiGravity.
Installation
Node.js (Primary CLI & Server)
# Install globally
npm install -g @rishiicreates/rawfy
# Install Playwright browsers (Required for rendering SPAs)
rawfy installPython Wrapper
If you're building agents in Python, you can install the wrapper. It automatically finds the Node.js CLI under the hood.
pip install .Quick Start & Usage
1. Command Line Interface (CLI)
The CLI is the easiest way to test Rawfy.
# Fetch a page and return Markdown (default)
rawfy fetch https://en.wikipedia.org/wiki/Neural_network
# Fetch and return structured JSON
rawfy fetch https://example.com --format json
# Fast mode (Skip Playwright, only use static fetch)
rawfy fetch https://example.com --no-playwright
# Save output to a file
rawfy fetch https://example.com --out result.md2. Model Context Protocol (MCP) Server
Rawfy can run as an MCP server, communicating over stdio. This allows Claude Desktop or AntiGravity to use Rawfy natively.
rawfy serveAdding to Claude Desktop Config:
{
"mcpServers": {
"rawfy": {
"command": "npx",
"args": ["-y", "@rishiicreates/rawfy", "serve"]
}
}
}3. REST API
If your agent framework only supports HTTP tool calls (like Ollama), run the REST API.
# Starts API on port 3847
rawfy api --port 3847# Example API Call
curl "http://localhost:3847/fetch?url=https://example.com&format=json"4. Node.js / TypeScript Library
import { rawfyFetch, rawfyMetadata } from 'rawfy'
// Fetch full page as JSON
const jsonOutput = await rawfyFetch('https://example.com', {
format: 'json',
noPlaywright: false
})
// Fetch lightweight metadata only (No page body)
const metadata = await rawfyMetadata('https://example.com')
console.log(metadata.title, metadata.type)5. Python Integration
from rawfy import fetch, fetch_json, metadata
# Get Markdown string
markdown_text = fetch("https://example.com")
# Get parsed Python Dictionary
data = fetch_json("https://example.com")
print(data["content"]["markdown"])
# Get fast metadata
meta = metadata("https://example.com")Output Formats
Agents consume data differently. Rawfy supports three formats using the --format flag.
1. Web Semantic Markdown (WSM) [--format markdown]
The best format for LLMs. Includes YAML metadata, pure Markdown body, and media/interactive lists appended to the bottom.
---
url: https://example.com
type: article
title: "Example Domain"
word_count: 50
---
# Example Domain
This domain is for use in illustrative examples...
## Interactive Elements
### Links (1)
- [More information...](https://www.iana.org/domains/example)2. Structured JSON [--format json]
Best for strict programmatic parsing or function-calling agents.
{
"metadata": {
"url": "https://example.com",
"title": "Example Domain",
"type": "website"
},
"content": {
"markdown": "# Example Domain\n...",
"text": "Example Domain\n..."
},
"media": [],
"interactive_elements": [
{ "type": "link", "label": "More information...", "href": "..." }
]
}3. Plain Text [--format text]
Best for agents with extremely small context windows. Strips all Markdown formatting and UPPERCASES headings for simple visual structure.
Detailed Integration Guides
Check out our specific guides for your favorite framework:
- Claude Code (MCP)
- LangChain & LangGraph (Python)
- Ollama (REST API)
- Google AntiGravity SDK
- Subprocess / Generic
Architecture Under the Hood
Rawfy operates through a linear, 7-stage pipeline:
- Routing Layer: Determines if the URL is a PDF, YouTube video, or HTML page.
- Fetch Layer: Attempts static HTTP (
node-fetch). If a JS framework signature is found, it transparently falls back to Playwright headless. - Extraction Layer: Uses Mozilla Readability to clean DOM trees.
- Markdown Conversion: Uses Turndown with custom rules to preserve tables and code blocks.
- Media Parsing: Isolates audio, video, and images. Grabs YouTube captions if available.
- Interaction Mapping: Scans the DOM for
<button>,<a>,<input>,<select>. - Serialization & Truncation: Packs everything into the requested format and strictly truncates to token limits (default 50k tokens) to prevent agent crashes.
Contributing
Contributions are welcome! If you want to add new extractors (e.g., specific Twitter or Reddit parsing) or optimize the fetch layer:
- Fork the repo.
- Run
npm installandnpm run devto watch files. - Write tests in the
tests/directory (Run withnpm test). - Submit a Pull Request!