2.1.0 • Published 3 months ago

sitemap2array

Licence: MIT
Version: 2.1.0
Deps: 1
Size: 11 kB
Vulns: 0
Weekly downloads: 0

Fetch a sitemap.xml URL and return its URLs as an array. Automatically resolves sitemap index files.

Install

npm install sitemap2array

Usage

const sitemap2array = require('sitemap2array');

// Top-level await is not available in a CommonJS module, so wrap the calls
async function main() {
  // Regular sitemap: returns page URLs
  const urls = await sitemap2array('https://example.com/sitemap.xml');
  // ['https://example.com/page1', 'https://example.com/page2', ...]

  // Sitemap index: automatically fetches all child sitemaps and returns all page URLs
  const allUrls = await sitemap2array('https://example.com/sitemap-index.xml');
  // ['https://example.com/page1', ..., 'https://example.com/page500']
}

main().catch(console.error);

Options

followIndex

When true (default), sitemap index files are resolved recursively — each child sitemap is fetched in parallel and all page URLs are flattened into a single array.

Set to false to get just the child sitemap URLs without following them:

const sitemapUrls = await sitemap2array('https://example.com/sitemap-index.xml', {
  followIndex: false,
});
// ['https://example.com/sitemap-1.xml', 'https://example.com/sitemap-2.xml']
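
One use for followIndex: false is to resolve only some of the child sitemaps. The sketch below is a hypothetical helper, not part of the package: urlsFromSelectedSitemaps and its keep parameter are names invented here, and the sitemap2array function is passed in as an argument so the helper can be tried with a stub.

```javascript
// Hypothetical helper (not part of sitemap2array): resolves only the child
// sitemaps for which `keep(childUrl)` returns true.
// `sitemap2array` is injected so the helper can be exercised with a stub.
async function urlsFromSelectedSitemaps(sitemap2array, indexUrl, keep) {
  // Step 1: list the child sitemap URLs without following them.
  const children = await sitemap2array(indexUrl, { followIndex: false });
  // Step 2: fetch only the selected child sitemaps, in parallel.
  const selected = children.filter(keep);
  const perSitemap = await Promise.all(
    selected.map((child) => sitemap2array(child))
  );
  // Step 3: flatten the per-sitemap arrays into one list of page URLs.
  return perSitemap.flat();
}
```

With the real package this might be called as urlsFromSelectedSitemaps(require('sitemap2array'), 'https://example.com/sitemap-index.xml', (u) => u.includes('blog')), where the 'blog' filter is only an illustration.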

API

sitemap2array(url, [options])

Returns a Promise<string[]>.

Parameter            Type     Default  Description
url                  string   -        Full URL to a sitemap.xml (must include http:// or https://)
options.followIndex  boolean  true     Recursively fetch child sitemaps from sitemap index files
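
Since url must carry an explicit http:// or https:// scheme, it can help to validate input before calling the function. The following is a minimal sketch using the standard URL class; isHttpUrl is a hypothetical name, and how the package itself reacts to a malformed URL is not documented here.

```javascript
// Hypothetical pre-check (not part of sitemap2array): returns true only for
// absolute URLs whose scheme is http or https.
function isHttpUrl(value) {
  try {
    const { protocol } = new URL(value);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    // new URL() throws on strings that are not absolute URLs,
    // e.g. 'example.com/sitemap.xml' with no scheme.
    return false;
  }
}
```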

Supports both <urlset> (standard sitemaps) and <sitemapindex> (sitemap index files) per the sitemaps.org protocol.
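
To illustrate the difference between the two document types, here is a rough sketch that classifies a sitemap by its root element and collects its <loc> values. It is not the package's parser: parseSitemap is a hypothetical name, and the regular expressions are a simplification that does not decode XML entities or handle CDATA sections.

```javascript
// Hypothetical helper (not the package's parser): classifies a sitemap
// document by its root element and collects the <loc> values.
function parseSitemap(xml) {
  let kind = 'unknown';
  if (/<sitemapindex[\s>]/.test(xml)) {
    kind = 'sitemapindex'; // <loc> values point at child sitemaps
  } else if (/<urlset[\s>]/.test(xml)) {
    kind = 'urlset'; // <loc> values are page URLs
  }
  const locs = [];
  // Capture the text inside each <loc>, trimming surrounding whitespace.
  for (const match of xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)) {
    locs.push(match[1]);
  }
  return { kind, locs };
}
```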

Recursion into nested sitemap indexes is capped at 3 levels, so an index that references itself, directly or through a cycle, cannot cause an infinite loop.
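
A depth-capped resolver of this kind might look roughly like the sketch below. It is an illustration under stated assumptions, not the package's source: resolveSitemap, extractLocs and MAX_DEPTH are hypothetical names, the way levels are counted is a guess, and returning an empty array once the cap is reached is this sketch's choice. The fetchXml argument is a caller-supplied function so the logic can be run without network access.

```javascript
const MAX_DEPTH = 3; // mirrors the documented cap; how levels are counted is an assumption

// Pulls every <loc> value out of a sitemap document (regex sketch, not a full XML parser).
function extractLocs(xml) {
  return [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1]);
}

// fetchXml is a caller-supplied async function: (url) => XML string.
async function resolveSitemap(url, fetchXml, depth = 0) {
  const xml = await fetchXml(url);
  const locs = extractLocs(xml);
  if (!/<sitemapindex[\s>]/.test(xml)) {
    return locs; // plain <urlset>: the locs are page URLs
  }
  if (depth + 1 >= MAX_DEPTH) {
    return []; // assumption: stop descending once the cap is reached
  }
  // Fetch all child sitemaps in parallel, then flatten the results.
  const nested = await Promise.all(
    locs.map((child) => resolveSitemap(child, fetchXml, depth + 1))
  );
  return nested.flat();
}
```

With this counting, the starting document is level 1, so a self-referencing index is fetched three times and then abandoned.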

Requirements

Node.js >= 18 (uses native fetch).
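
A script that depends on the package could check this requirement up front. The sketch below is one possible check; nodeMajorVersion and meetsRequirements are hypothetical names, and whether the package performs a similar check itself is not stated.

```javascript
// Hypothetical runtime check (not part of sitemap2array).
function nodeMajorVersion() {
  // process.versions.node looks like '18.19.0'; take the major part.
  return Number(process.versions.node.split('.')[0]);
}

function meetsRequirements() {
  // Require Node.js 18 or newer and a global fetch function.
  return nodeMajorVersion() >= 18 && typeof globalThis.fetch === 'function';
}
```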

License

MIT