@imgly/pdf-importer

Licence
SEE LICENSE IN LICENSE.md
Version
0.2.1

PDF Importer for the CE.SDK

Overview

The PDF Importer for the CE.SDK converts PDF files into editable CE.SDK scenes while preserving key design attributes such as positioning, text, images, vector paths, and colors.

Here's an overview of the main features:

  • File Format Translation: The importer converts PDF files into the CE.SDK scene file format using pdfjs-dist (Mozilla's pdf.js), in its legacy/CJS-safe build.
  • Bulk Importing: The importer can be driven from a script (for example under Node.js) to import many PDF files in one run, which suits large-scale projects.
  • Color Translation: RGB, CMYK, and Separation spot colors from PDFs are translated into CE.SDK's native RGBAColor, CMYKColor, and SpotColor variants. CMYK values are preserved end-to-end instead of collapsed to sRGB; Separation inks are registered on the document's spot-color registry (via engine.editor.setSpotColorCMYK) with their declared alternate CMYK values. DeviceN inks degrade to their alternate-space solid for now.
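
As a rough illustration of the color translation described above, the sketch below maps a parsed PDF color onto objects shaped like CE.SDK's RGBAColor, CMYKColor, and SpotColor. The `pdfColor` input shape and the `toCesdkColor` helper are assumptions made for this example and are not part of the importer's API; all components are assumed to be normalized to the range 0 to 1.

```javascript
// Hypothetical helper: maps a parsed PDF color to a CE.SDK-style color object.
// `pdfColor.space`, `components`, `inkName`, `tint`, and `alpha` are assumed
// field names for this sketch, not fields exposed by @imgly/pdf-importer.
function toCesdkColor(pdfColor) {
  switch (pdfColor.space) {
    case "DeviceRGB": {
      const [r, g, b] = pdfColor.components;
      return { r, g, b, a: pdfColor.alpha ?? 1 };
    }
    case "DeviceCMYK": {
      const [c, m, y, k] = pdfColor.components;
      // CMYK stays CMYK; there is no round trip through sRGB.
      return { c, m, y, k, tint: 1 };
    }
    case "Separation":
      // A spot color references its ink by name; the ink itself is
      // registered separately on the document's spot-color registry.
      return {
        name: pdfColor.inkName,
        tint: pdfColor.tint ?? 1,
        externalReference: "",
      };
    default:
      throw new Error(`Unsupported color space: ${pdfColor.space}`);
  }
}
```

With a running engine, the alternate CMYK values of a Separation ink would additionally be registered through engine.editor.setSpotColorCMYK, as noted in the feature list.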

The following PDF design elements will be preserved by the import:

  • Positioning and Rotation: Elements' positioning and rotation are accurately transferred.
  • Image Elements: Embedded images (JPEG, PNG) are extracted and placed as graphic blocks. Only image formats that CE.SDK supports are rendered.
  • Text Elements: Font family continuity is maintained, with options to supply font URIs or use Google fonts. Bold, italic, and weight styles are supported.
  • Vector Paths: SVG path data from the PDF is imported as vector path blocks.
  • Colors and Gradients: Solid colors, linear gradients, and radial gradients are faithfully reproduced.
  • Spot Color Detection: Cut/fold marks using spot colors (CutContour, Thru-cut, etc.) are detected and can be handled separately. Brand spot inks (Separation / DeviceN entries in the page's /ColorSpace resource dictionary) are preserved as SpotColor fills and registered on the CE.SDK document-level spot registry.
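
To make the split between cut/fold marks and brand inks more concrete, here is a minimal sketch of how spot ink names could be classified. The helper names and the name list are hypothetical; the list contains only the two names mentioned above, and the importer's actual detection rules may differ.

```javascript
// Assumed list of finishing-ink names. Only the two names mentioned in this
// README are included; extend it for your own print workflow.
const CUT_MARK_INKS = ["CutContour", "Thru-cut"];

// Normalize so that "Thru-cut", "ThruCut", and "thru cut" compare equal.
function normalizeInkName(name) {
  return name.toLowerCase().replace(/[\s_-]+/g, "");
}

// Hypothetical classifier: true when the ink name denotes a cut/fold mark.
function isCutMarkInk(name, knownInks = CUT_MARK_INKS) {
  const wanted = new Set(knownInks.map(normalizeInkName));
  return wanted.has(normalizeInkName(name));
}

// Split a list of spot ink names into cut marks and brand inks.
function partitionSpotInks(inkNames) {
  const cutMarks = [];
  const brandInks = [];
  for (const name of inkNames) {
    (isCutMarkInk(name) ? cutMarks : brandInks).push(name);
  }
  return { cutMarks, brandInks };
}
```

Matching on a normalized name is a design choice for this sketch: PDF producers spell these ink names inconsistently, so a case- and punctuation-insensitive comparison avoids missing obvious variants.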

How It Works

The importer runs a three-stage pipeline on each PDF page:

  1. Extract — a pdfjs-dist operator walker emits drawable blocks (images, vector paths, text outlines) in paint order. page.getTextContent() produces one editable text run per line.
  2. Post-process — adjacent text runs with the same font/size and horizontal overlap are merged into multi-line paragraph blocks.
  3. Emit — the intermediate representation is written as CE.SDK blocks: text, image, vector, outline.
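
The post-process stage can be pictured with the following sketch, which merges consecutive text runs into paragraphs when they share font and size and overlap horizontally. The run shape (`text`, `font`, `size`, `x`, `y`, `width`), the top-to-bottom ordering, and the line-gap threshold are assumptions for illustration; this is not the importer's internal code.

```javascript
// Two runs overlap horizontally when their [x, x + width] intervals intersect.
function overlapsHorizontally(a, b) {
  return a.x < b.x + b.width && b.x < a.x + a.width;
}

// Hypothetical merge pass. Runs are assumed to arrive sorted top to bottom,
// with `y` growing downward and measured in the same unit as `size`.
function mergeTextRuns(runs, maxLineGapFactor = 1.5) {
  const paragraphs = [];
  for (const run of runs) {
    const last = paragraphs[paragraphs.length - 1];
    const lastLine = last ? last.lines[last.lines.length - 1] : undefined;
    const lineGap = lastLine ? run.y - lastLine.y : 0;
    const mergeable =
      lastLine !== undefined &&
      lastLine.font === run.font &&
      lastLine.size === run.size &&
      overlapsHorizontally(lastLine, run) &&
      lineGap > 0 &&
      lineGap <= run.size * maxLineGapFactor;
    if (mergeable) {
      last.lines.push(run);
    } else {
      paragraphs.push({ lines: [run] });
    }
  }
  // Join the lines of each paragraph into one multi-line text block.
  return paragraphs.map((paragraph) => ({
    font: paragraph.lines[0].font,
    size: paragraph.lines[0].size,
    text: paragraph.lines.map((line) => line.text).join("\n"),
  }));
}
```

The vertical gap check stands in for "adjacent": without it, two runs far apart on the page that happen to share a font would be merged.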

PDF points are converted to inches (1pt = 1/72 inch) for CE.SDK design units. Embedded images become buffer:// URIs; engine-provided fonts use bundle:// URIs.
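
The unit conversion is simple enough to show directly. The helper names below are hypothetical and exist only for this example:

```javascript
// PDF user space uses points; one point is 1/72 of an inch.
const POINTS_PER_INCH = 72;

function pointsToInches(points) {
  return points / POINTS_PER_INCH;
}

// Convert a page size given in points to inches.
function pageSizeInInches({ widthPt, heightPt }) {
  return {
    width: pointsToInches(widthPt),
    height: pointsToInches(heightPt),
  };
}

// A US Letter page is 612 x 792 pt, which is 8.5 x 11 inches.
const letter = pageSizeInInches({ widthPt: 612, heightPt: 792 });
```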

Installation

Install @imgly/pdf-importer with npm or yarn, using whichever command matches your package manager:

npm install @imgly/pdf-importer
yarn add @imgly/pdf-importer

Browser Quick-Start Example

import CreativeEngine from "@cesdk/engine";
import { PDFParser, addGfontsAssetLibrary } from "@imgly/pdf-importer";

const blob = await fetch("https://example.com/document.pdf").then((res) =>
  res.blob()
);
const engine = await CreativeEngine.init({
  license: "YOUR_LICENSE",
});
// Use Google Fonts to replace well-known fonts in the default font resolver.
await addGfontsAssetLibrary(engine);
const parser = await PDFParser.fromFile(engine, blob);

await parser.parse();

const image = await engine.block.export(
  engine.block.findByType("//ly.img.ubq/page")[0],
  "image/png"
);
const sceneExportUrl = window.URL.createObjectURL(image);
console.log("The imported PDF file looks like:", sceneExportUrl);
// You can now, for example, export the scene as an archive with engine.scene.saveToArchive().

Saving Scenes with Stable URLs

By default, the PDF importer creates internal buffer:// URLs for embedded images. These are transient resources that work well when saving to an archive (engine.scene.saveToArchive()), which bundles all assets together.

However, if you want to save scenes as JSON strings (engine.scene.saveToString()) with stable, permanent URLs (e.g., for storing in a database or referencing CDN-hosted assets), you need to relocate the transient resources first.

Why Relocate?

  • Scene Archives (saveToArchive): Include all assets in a single ZIP file. Transient buffer:// URLs work fine.
  • Scene Strings (saveToString): Only contain references to assets. Transient URLs won't work when reloading the scene later. You need permanent URLs (e.g., https://).

How to Relocate Transient Resources

After parsing the PDF file, use CE.SDK's native APIs to find and relocate all transient resources:

// 1. Parse the PDF file
const parser = await PDFParser.fromFile(engine, blob);
await parser.parse();

// 2. Find all transient resources (embedded images from the PDF)
const transientResources = engine.editor.findAllTransientResources();

// 3. Upload each resource and relocate to permanent URL
for (const resource of transientResources) {
  const { URL: bufferUri, size } = resource;

  // Extract binary data from the buffer
  const data = engine.editor.getBufferData(bufferUri, 0, size);

  // Upload to your backend/CDN (implement your own upload logic)
  const permanentUrl = await uploadToBackend(data);

  // Relocate the resource to the permanent URL
  engine.editor.relocateResource(bufferUri, permanentUrl);
}

// 4. Now save to string - all URLs will be permanent
const sceneString = await engine.scene.saveToString();
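
The `uploadToBackend` call above is a placeholder for your own upload logic. One possible shape is sketched below, assuming a backend that accepts a raw binary POST and answers with JSON of the form `{ "url": "..." }`. The `createUploader` factory, the endpoint, and the response shape are all assumptions for illustration.

```javascript
// Hypothetical factory for the `uploadToBackend` placeholder.
// `endpoint` and the `{ url }` response shape are assumptions about your
// backend; `fetchImpl` can be swapped out for testing.
function createUploader(endpoint, fetchImpl = fetch) {
  return async function uploadToBackend(data) {
    const response = await fetchImpl(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/octet-stream" },
      body: data,
    });
    if (!response.ok) {
      throw new Error(`Upload failed with status ${response.status}`);
    }
    // The backend is assumed to reply with the permanent URL of the asset.
    const { url } = await response.json();
    return url;
  };
}
```

A call such as `createUploader("https://example.com/upload")` would then yield a function that can be used as `uploadToBackend` in the loop. Throwing on a failed upload matters here: relocating a resource to a URL that was never stored would leave the saved scene with a broken reference.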

Note on Font URLs

When using addGfontsAssetLibrary() (the default font resolver), the resulting scene string will contain Google CDN URLs for fonts. If you need fonts hosted on your own infrastructure, configure a custom font resolver instead of using the default Google Fonts integration.

Font Strategies

The importer ships with three font-handling presets that trade editability against visual fidelity. Pick one via PDFParser.fromFile(engine, blob, { fontStrategy }), or compose your own with createFontStrategy / createFontCascade.

| Preset | Behavior | When to use |
| --- | --- | --- |
| editableFirstStrategy (default) | perfect-match → PDF-embedded subset bytes → any-match substitution | General-purpose import. Prefers asset-library typefaces for editability, falls back to the PDF's embedded subset for fidelity, substitutes when neither is available. |
| exactFidelityStrategy | perfect-match → PDF-embedded subset bytes | Print finalization. Never substitutes; falls through to vector outline when no matching typeface or embedded font is available. |
| assetLibraryStrategy | perfect-match → any-match substitution | Brand-locked tools. Skips the embedded-subset stage so only asset-library typefaces are used; non-matching fonts go through substitution or vector outline. |

import { PDFParser, exactFidelityStrategy } from "@imgly/pdf-importer";

const parser = await PDFParser.fromFile(engine, blob, {
  fontStrategy: exactFidelityStrategy,
});
await parser.parse();
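
The fall-through order of the presets can be pictured as a cascade in which the first stage that resolves a font wins. The sketch below is a generic illustration only: `cascade` and the stage functions are hypothetical and do not reflect the signatures of createFontStrategy or createFontCascade.

```javascript
// Generic "first stage that resolves wins" cascade. This is NOT the
// signature of createFontCascade; it only illustrates the fall-through order.
function cascade(...stages) {
  return (fontName) => {
    for (const stage of stages) {
      const result = stage(fontName);
      if (result !== null) return result;
    }
    // No stage matched: the text would be rendered as a vector outline.
    return { kind: "outline", fontName };
  };
}

// Hypothetical stages backed by plain sets of font names.
const perfectMatch = (assetFonts) => (name) =>
  assetFonts.has(name) ? { kind: "asset", fontName: name } : null;
const embeddedSubset = (embeddedFonts) => (name) =>
  embeddedFonts.has(name) ? { kind: "embedded", fontName: name } : null;
const anyMatch = (fallbackName) => () => ({
  kind: "substituted",
  fontName: fallbackName,
});

const assetFonts = new Set(["Roboto"]);
const embeddedFonts = new Set(["ABCDEF+CorpSans"]);

// Mirrors the stage order of editableFirstStrategy and exactFidelityStrategy.
const editableFirstLike = cascade(
  perfectMatch(assetFonts),
  embeddedSubset(embeddedFonts),
  anyMatch("Roboto")
);
const exactFidelityLike = cascade(
  perfectMatch(assetFonts),
  embeddedSubset(embeddedFonts)
);
```

Under these assumptions, an unknown font resolves to a substitution in the first cascade and to a vector outline in the second, which is the trade-off the table describes.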

NodeJS Quick-Start Example

Prerequisite — emoji handling. Two CE.SDK settings need attention when running the importer headlessly under @cesdk/node:

  • ubq://forceSystemEmojis = false: by default the engine routes any codepoint that ICU classifies as RGI_Emoji (e.g. ♥, ★, the dingbats block) through the emoji font, even when the active typeface has a glyph for it. Customer PDFs frequently embed real text fonts (ZapfDingbats, Webdings, …) that map these codepoints to actual glyphs; forcing the substitution discards the producer's intended glyph and pulls in a generic color emoji. Setting the flag to false makes the engine respect the embedded or substituted font when it covers the codepoint.
  • ubq://defaultEmojiFontFileUri = <CDN URL>: @cesdk/node ships only assets/core/, not assets/emoji/NotoColorEmoji.ttf. Even with forceSystemEmojis=false, true color emoji that no embedded text font covers still need a working emoji font URI, or engine.block.export(page, "image/png") aborts with FILE_FETCH_FAILED for the engine's synthesised local-file URL. Point the engine at the IMG.LY-hosted preset, or self-host the file and supply your own URI / bundle:// path.

engine.editor.setSettingBool("ubq://forceSystemEmojis", false);
engine.editor.setSettingString(
  "ubq://defaultEmojiFontFileUri",
  "https://cdn.img.ly/assets/v4/emoji/NotoColorEmoji.ttf",
);

See the CE.SDK Emojis guide for the full set of options. Browser consumers initialised with the default IMG.LY-CDN baseURL already have the emoji font available. Most browser integrations should still set forceSystemEmojis=false so that embedded fonts are respected for these codepoints.

// index.mjs
import CreativeEngine from "@cesdk/node";
import { promises as fs } from "fs";
import { PDFParser, addGfontsAssetLibrary } from "@imgly/pdf-importer";

async function main() {
  const engine = await CreativeEngine.init({
    license: "YOUR_LICENSE",
  });

  // Respect embedded fonts for emoji-class codepoints (♥, ★, …) and
  // give true color emoji a working font URI — see the prerequisite
  // note above.
  engine.editor.setSettingBool("ubq://forceSystemEmojis", false);
  engine.editor.setSettingString(
    "ubq://defaultEmojiFontFileUri",
    "https://cdn.img.ly/assets/v4/emoji/NotoColorEmoji.ttf",
  );

  await addGfontsAssetLibrary(engine);

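  const pdfBuffer = await fs.readFile("./document.pdf");
  // Copy out exactly this file's bytes; a Node Buffer can be a view into a
  // larger underlying ArrayBuffer.
  const pdfArrayBuffer = pdfBuffer.buffer.slice(
    pdfBuffer.byteOffset,
    pdfBuffer.byteOffset + pdfBuffer.byteLength
  );
  const parser = await PDFParser.fromFile(engine, pdfArrayBuffer);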
  await parser.parse();

  const image = await engine.block.export(
    engine.block.findByType("//ly.img.ubq/page")[0],
    "image/png"
  );
  const imageBuffer = await image.arrayBuffer();
  await fs.writeFile("./example.png", Buffer.from(imageBuffer));

  engine.dispose();
}
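main().catch((error) => {
  // Surface failures instead of leaving an unhandled promise rejection.
  console.error(error);
  process.exitCode = 1;
});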
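
For bulk importing, the per-file steps shown in `main()` can be wrapped in a function and run over a list of files. The helper below is a hypothetical sketch: `importAll` is not part of the package, and `importOne` stands for your own function that performs the PDFParser.fromFile, parse, and export steps for one path. Files are processed one after another on the assumption that a single engine instance works on one scene at a time.

```javascript
// Hypothetical batch helper: runs `importOne` for each path in sequence and
// records failures instead of aborting the whole batch.
async function importAll(paths, importOne) {
  const results = [];
  for (const path of paths) {
    try {
      const value = await importOne(path);
      results.push({ path, ok: true, value });
    } catch (error) {
      results.push({ path, ok: false, error: String(error) });
    }
  }
  return results;
}
```

Collecting failures per file is a design choice for long-running batches: one malformed PDF is reported in the results while the remaining files are still processed.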

Issues

If you encounter any issues or have questions, please don't hesitate to contact us at support@img.ly.

Limitations and Unsupported Features

The PDF importer has the following limitations and unsupported features:

  1. Linked Images

    • Only embedded images are supported. External image references are not resolved.

  2. Font Support

    • Fonts not available as a typeface asset source fall back through the configured fontStrategy (see Font Strategies above): embedded subset bytes when present, then resolver substitution, then a vector-outline rendering. The default strategy substitutes; configure exactFidelityStrategy to disable substitution.

  3. Complex Vector Paths

    • Some complex clipping paths or compound shapes may experience minor distortion.

  4. Annotations and Forms

    • PDF annotations, form fields, and interactive elements are not imported.

  5. Transparency Groups

    • Advanced transparency group blending modes may not be fully reproduced.

  6. Image SMask Compositing

    • Per-pixel soft masks (image-modulated luminosity SMasks) are supported and composited into the image as an RGBA PNG. As a consequence, JPEG images carrying an SMask lose the JPEG pass-through optimization: they are decoded and re-encoded as PNG, which increases file size.
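
The SMask compositing step can be sketched as follows: the decoded RGB samples and the 8-bit soft mask are interleaved into RGBA pixels, which would then be encoded as PNG. `applySoftMask` is a hypothetical helper, and the sketch assumes the mask has the same pixel dimensions as the image; in real PDFs the mask resolution can differ and would need resampling first.

```javascript
// Combine tightly packed RGB samples with an 8-bit soft mask into RGBA.
// Assumes one mask byte per pixel and matching image dimensions.
function applySoftMask(rgb, mask) {
  const pixelCount = mask.length;
  if (rgb.length !== pixelCount * 3) {
    throw new RangeError("RGB data and soft mask sizes do not match");
  }
  const rgba = new Uint8ClampedArray(pixelCount * 4);
  for (let i = 0; i < pixelCount; i++) {
    rgba[i * 4] = rgb[i * 3]; // red
    rgba[i * 4 + 1] = rgb[i * 3 + 1]; // green
    rgba[i * 4 + 2] = rgb[i * 3 + 2]; // blue
    rgba[i * 4 + 3] = mask[i]; // alpha taken from the soft mask
  }
  return rgba;
}
```

Because the result carries a per-pixel alpha channel, it can no longer be stored as a JPEG, which is why such images are re-encoded as PNG.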

Changelog

See CHANGELOG.md for release notes.

License

The software is free for use under the AGPL License.
