A modular, open-source library for converting HTML content into professional document formats. Initially focused on HTML-to-DOCX conversion, with planned support for PDF and XLSX. Built with TypeScript, it features a core HTML parsing engine and separate
Core HTML validation logic for html-eslint
Semantic release monorepo configuration used by the various HTML-validate packages
Escape special characters to HTML entities in JavaScript
PDF adapter for html-to-document-core — converts a DocumentElement tree into .pdf using the html2pdf.js library.
Core engine that parses HTML into an intermediate DocumentElement tree and exposes a plugin registry so external adapters can convert that tree into DOCX, PDF, XLSX, Markdown and more.
DOCX adapter for html-to-document-core — converts a DocumentElement tree into a .docx Buffer using the docx library.
PDF deconverter for html-to-document-core — converts PDF files to DocumentElement[] using pdf-parse.
CSS parser plugin for html-to-document-core that harvests <style> tags and appends parsed statements to the per-parse stylesheet.
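
Several of the descriptions above share one architecture: a core engine produces an intermediate DocumentElement tree, and adapters registered by format name turn that tree into an output format. The following is a minimal sketch of that idea only. Every name in it (the DocumentElement fields, Adapter, AdapterRegistry, markdownAdapter) is a hypothetical stand-in chosen for illustration and is not taken from the actual html-to-document-core API. The tree is built by hand here, so no HTML parsing is shown.

```typescript
// Hypothetical shapes for illustration; NOT the real html-to-document-core types.
interface DocumentElement {
  type: "heading" | "paragraph" | "text";
  text?: string;
  level?: number; // heading level, used only when type is "heading"
  content?: DocumentElement[];
}

// An adapter converts a tree of elements into some output representation.
interface Adapter<T> {
  convert(elements: DocumentElement[]): T;
}

// A registry that maps a format name to the adapter responsible for it.
class AdapterRegistry {
  private adapters = new Map<string, Adapter<unknown>>();

  register(format: string, adapter: Adapter<unknown>): void {
    this.adapters.set(format, adapter);
  }

  convert(format: string, elements: DocumentElement[]): unknown {
    const adapter = this.adapters.get(format);
    if (!adapter) {
      throw new Error(`No adapter registered for format: ${format}`);
    }
    return adapter.convert(elements);
  }
}

// Collect an element's own text followed by the text of all its descendants.
function flattenText(el: DocumentElement): string {
  const own = el.text ?? "";
  const children = (el.content ?? []).map(flattenText).join("");
  return own + children;
}

// A toy Markdown adapter: headings get "#" prefixes, everything else is plain text.
const markdownAdapter: Adapter<string> = {
  convert(elements: DocumentElement[]): string {
    return elements
      .map((el) => {
        const text = flattenText(el);
        return el.type === "heading"
          ? `${"#".repeat(el.level ?? 1)} ${text}`
          : text;
      })
      .join("\n\n");
  },
};

const registry = new AdapterRegistry();
registry.register("markdown", markdownAdapter);

// A hand-built tree standing in for the output of an HTML parser.
const tree: DocumentElement[] = [
  { type: "heading", level: 2, text: "Title" },
  {
    type: "paragraph",
    content: [
      { type: "text", text: "Hello, " },
      { type: "text", text: "world." },
    ],
  },
];

const markdown = registry.convert("markdown", tree) as string;
```

Keeping the registry keyed by format name means a DOCX or PDF adapter could be added without touching the code that builds the tree, which is presumably the motivation for shipping the adapters as separate packages.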