npm.io
1.2.0 • Published 5h ago

@waqashanifkhan/crawler

Licence
MIT
Version
1.2.0
Deps
1
Size
113 kB
Vulns
0
Weekly
152

@waqashanifkhan/crawler

A TypeScript crawler and actionable SEO/GEO audit utility for public websites.

It can crawl internal HTML pages, inspect technical and on-page signals, then return a JSON-safe report with raw page data, prioritized issues, evidence, scores, and implementation steps.

Installation

npm install @waqashanifkhan/crawler

Crawl pages only

import { SiteCrawler } from "@waqashanifkhan/crawler";

const crawler = new SiteCrawler("https://example.com", {
  maxPages: 20,
  concurrency: 4,
  timeoutMs: 8_000,
});

const pages = await crawler.crawl();
const analysis = crawler.analyze();

console.log(pages);
console.log(analysis);

crawl() returns a CrawlPage[]. Each page includes its URL, status code, title, description, headings, links, canonical state, robots state, schema types, image accessibility data, delivery headers, and crawl timing.

Run a full actionable audit

import { auditWebsite } from "@waqashanifkhan/crawler";

const report = await auditWebsite("example.com", {
  maxPages: 20,
  concurrency: 4,
  timeoutMs: 8_000,
  maxAffectedUrls: 20,
});

console.log(report.scores);
console.log(report.summary.topPriorities);
console.log(report.recommendations);

The audit report is JSON-safe and contains:

  • scores: overall, technical, SEO, content, performance, accessibility, and security scores.
  • summary: crawl totals, issue counts, and the top five priorities.
  • pages: all crawled page-level data.
  • crawlAnalysis: robots.txt, sitemap, internal-link graph, duplicate metadata risk, canonical issues, orphan pages, and crawl-based performance estimates.
  • issues: grouped findings with severity, affected URLs, evidence, why the finding matters, and clear remediation steps.
  • recommendations: the same issues sorted into an implementation priority order.
  • pageAudits: each crawled URL with the related issues attached.

Recommendation format

for (const recommendation of report.recommendations) {
  console.log(`#${recommendation.priority}: ${recommendation.title}`);
  console.log(recommendation.whatToFix);
  console.log(recommendation.howToFix);
}

Notes and limitations

  • Performance numbers are crawl-based estimates. Validate them with Lighthouse and real-user Core Web Vitals before making performance claims.
  • The package reads the HTML returned to the crawler. Fully client-rendered, login-protected, or blocked pages may need browser-based testing too.
  • Respect website terms, robots.txt, rate limits, and applicable law when crawling websites.

License

MIT

Keywords