npm.io
1.1.0 • Published 5d ago

pii-guard-node-mini

Licence: MIT
Version: 1.1.0
Deps: 0
Size: 295 kB
Vulns: 0
Weekly: 8

PII Masker (single-file TypeScript utility)

A self-contained, dependency-free (runtime) PII/secret detection + masking engine designed to be copied directly into other Node.js/TypeScript projects.

What this is for

  • Mask PII before sending text to LLMs, logs, telemetry, or analytics.
  • Detect and transform common PII/secrets using regex + lightweight validation/heuristics.
  • Provide configurable masking strategies per PII type.
  • Optionally use tokenization for same-instance reversible masking (useful for round-tripping LLM responses).

What's new in 1.1.0

All additions are backward-compatible.

  • 16 new detectors: Driver License (US), Aadhaar (Verhoeff), PAN, UK NIN, UK postcode, Canadian SIN (Luhn), VIN (ISO 3779), CVV, card expiry, NPI, DEA, and brand-specific secret detectors for Stripe, GitHub, Slack, Twilio, OpenAI, Google API keys.
  • Batch APIs: batchMask() / batchDetect().
  • Structure-aware masking: maskJSON(), maskHTML(), maskMarkdown() (skips tags / <script> / fenced code / inline code).
  • Streaming: maskStream() returns a Node stream.Transform for very large inputs.
  • Express/Connect middleware: createMiddleware({ fields, responseBody, attachReport, skip }).
  • Risk scoring: riskScore(text) → { score, level, breakdown, dominantType }.
  • Audit & explain: explain(), diff(), sanitizeForLog(), optional in-memory audit log.
  • Vault TTL: vaultTTL + pruneVault() for time-limited tokens.
  • Multi-locale: run multiple locale-aware detectors in one instance via locales: [...].
  • Pattern overrides: replace built-in regex via patternOverrides.
  • New presets: gdpr, india, developer. Expanded hipaa (adds NPI/DEA) and pci-dss (adds CVV/expiry).

Install / Add to your project

This asset is intentionally shipped as a single file.

Option A — copy the file

  1. Copy pii-masker.ts into your project (e.g. src/utils/pii-masker.ts).

  2. If your project uses TypeScript, make sure Node types are available:

npm i -D @types/node

This file uses Node APIs (crypto, Buffer). It’s meant for Node.js runtimes (or bundlers configured to polyfill Node APIs).

Option B — install from npm

npm i pii-guard-node-mini

If you copied the file instead of installing from npm, replace import paths like "pii-guard-node-mini" with your local path (e.g. "./utils/pii-masker").


Quick start

import createPIIMasker, { PIIType, MaskingStrategy } from "pii-guard-node-mini";

const masker = createPIIMasker({
  preset: "balanced",
  logLevel: "silent",
  strategies: {
    [PIIType.CREDIT_CARD]: MaskingStrategy.REDACT,
  },
});

const input = "Email me at jane.doe@company.com. Card: 4111 1111 1111 1111";
const { maskedText, entities } = masker.mask(input);

console.log(maskedText);
console.log(entities);

API overview

createPIIMasker(userConfig?) returns an object with:

Core

  • mask(text) → { maskedText, entities, maskMap } (may also include warnings / truncated)
  • maskObject(obj, fieldPaths?) → { masked, entities, maskMap } (may include warnings)
  • detect(text) → DetectedEntity[]
  • unmask(text) → string (works with tokens produced by this instance, or any vault imported via importVault)
  • addDetector(type, fn) → register custom detector
  • addStrategy(name, fn) → register custom strategy
  • updateConfig(partial) → hot-update instance config
  • getReport() → basic usage metrics
  • clearVault() → clears token vault (affects unmask)
  • exportVault() → Record<string, string> of token → original for external persistence
  • importVault(entries) → loads token → original mappings produced elsewhere
  • resetReport() → clears counters
  • getConfig() → resolved config snapshot

Batch & structure-aware (v2)

  • batchMask(texts) / batchDetect(texts) → process arrays in one call
  • maskJSON(jsonText) → parse JSON, mask string leaves, re-serialize
  • maskHTML(html) → mask text nodes only (tags / <script> / <style> preserved)
  • maskMarkdown(md) → mask prose while leaving fenced code blocks and inline `code` intact

Streaming & middleware (v2)

  • maskStream(opts?) → returns a Node stream.Transform for large files / pipes
  • createMiddleware(opts?) → Express/Connect-style (req, res, next) adapter

Risk & audit (v2)

  • riskScore(text) → { score, level, breakdown, dominantType, entityCount }
  • explain(text) → human-readable multi-line audit string
  • diff(text) → span-level [{ start, end, original, masked, type }]
  • sanitizeForLog(value) → masks string or deep-masks object (passes primitives through)
  • pruneVault() → evict expired vault entries (with vaultTTL)
  • getAuditLog() / clearAuditLog() → in-memory audit log (requires auditLog: true)

Presets

Presets pre-configure detectors + defaults.

const strict = createPIIMasker({ preset: "strict" });
const balanced = createPIIMasker({ preset: "balanced" });
const minimal = createPIIMasker({ preset: "minimal" });
const hipaa = createPIIMasker({ preset: "hipaa" }); // + NPI, DEA
const pci = createPIIMasker({ preset: "pci-dss" }); // + CVV, expiry
const gdpr = createPIIMasker({ preset: "gdpr" }); // EU-focused
const india = createPIIMasker({ preset: "india" }); // Aadhaar, PAN
const dev = createPIIMasker({ preset: "developer" }); // secrets only

Notes:

  • Preset values can still be overridden by passing your own config fields.
  • "HIPAA" / "PCI-DSS" / "GDPR" preset names are practical bundles; you should still validate your usage for your environment.

Configuration (EngineConfig)

You pass a Partial<EngineConfig> to createPIIMasker().

Common knobs:

By default, results include entities (with original values) and maskMap (original → masked). In production pipelines this can accidentally re-introduce sensitive data into logs.

Recommended configuration:

import createPIIMasker, { MaskingStrategy } from "pii-guard-node-mini";

const masker = createPIIMasker({
  preset: "balanced",

  // Output controls (recommended for production)
  includeMaskMap: false,
  includeEntities: true,
  entityValueMode: "none", // don't return raw matches
  includeMaskedValueInEntities: true, // safe to include the replacement

  // Safety limits
  maxTextLength: 200_000,
  maxEntities: 2_000,
  onLimitExceeded: "truncate", // or "throw" if you prefer hard-fail

  // Recommended for LLM flows
  defaultStrategy: MaskingStrategy.REDACT,
});

Notes:

  • If you set entityValueMode: "none", DetectedEntity.value becomes an empty string.
  • If you need original values for debugging, enable them only in dev/test.

Output controls (data minimization)

These options are designed to prevent accidental re-introduction of sensitive data via outputs.

  • includeMaskMap: if true, maskMap contains original → masked mappings.
  • includeEntities: if false, entities will be an empty list.
  • entityValueMode:
    • "original" (default): DetectedEntity.value is the raw match.
    • "masked": DetectedEntity.value becomes the replacement value.
    • "none": DetectedEntity.value becomes an empty string.
  • includeMaskedValueInEntities: includes DetectedEntity.masked (the replacement) in the entity.

Example (mask safely, keep entity locations/types only):

const masker = createPIIMasker({
  includeMaskMap: false,
  includeEntities: true,
  entityValueMode: "none",
  includeMaskedValueInEntities: false,
});
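
The three entityValueMode settings can be pictured as a projection applied to each entity before it is returned. The sketch below is an illustration only: RawEntity, OutputEntity, and projectEntity are hypothetical names chosen for this example, and the library's internal types may differ.

```typescript
// Hypothetical shapes for illustration; not the library's real types.
type EntityValueMode = "original" | "masked" | "none";

interface RawEntity {
  type: string;
  value: string; // raw match
  masked: string; // replacement text
  start: number;
  end: number;
}

interface OutputEntity {
  type: string;
  value: string;
  start: number;
  end: number;
  masked?: string;
}

// Decide what ends up in `value` and whether `masked` is attached at all.
function projectEntity(
  e: RawEntity,
  mode: EntityValueMode,
  includeMasked: boolean,
): OutputEntity {
  const value =
    mode === "original" ? e.value : mode === "masked" ? e.masked : "";
  const out: OutputEntity = { type: e.type, value, start: e.start, end: e.end };
  if (includeMasked) out.masked = e.masked;
  return out;
}

const raw: RawEntity = {
  type: "EMAIL",
  value: "jane@company.com",
  masked: "[REDACTED_EMAIL]",
  start: 6,
  end: 22,
};

// Keeps only type and location, which is the safest shape for logs.
const safe = projectEntity(raw, "none", false);
```

With mode "none" and includeMasked set to false, the returned object carries no text from the input at all, which matches the "locations/types only" configuration shown above.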

confidenceThreshold

Higher values produce fewer matches (fewer false positives); lower values mask more aggressively.

const masker = createPIIMasker({ confidenceThreshold: 0.7 });

defaultStrategy and strategies

Set the global default masking behavior, and override per type.

import { MaskingStrategy, PIIType } from "pii-guard-node-mini";

const masker = createPIIMasker({
  defaultStrategy: MaskingStrategy.PARTIAL_MASK,
  strategies: {
    [PIIType.SSN]: MaskingStrategy.REDACT,
    [PIIType.CREDIT_CARD]: MaskingStrategy.REDACT,
    [PIIType.API_KEY]: MaskingStrategy.REDACT,
  },
});

allowList

Values that should not be masked even if they look like PII.

const masker = createPIIMasker({
  allowList: ["example.com", "localhost"],
});

denyList

Terms that must always be masked.

const masker = createPIIMasker({
  denyList: [
    { term: "ProjectX", type: "CUSTOM" },
    { term: "InternalCodeWord", type: "CUSTOM" },
  ],
});

detectorsEnabled

Run only a subset of detectors.

import { PIIType } from "pii-guard-node-mini";

const masker = createPIIMasker({
  detectorsEnabled: new Set([PIIType.EMAIL, PIIType.PHONE]),
});

locale

Affects some patterns/heuristics.

const maskerUS = createPIIMasker({ locale: "US" });
const maskerUK = createPIIMasker({ locale: "UK" });
const maskerIN = createPIIMasker({ locale: "IN" });

hashSalt

Only used by HASH strategy.

const masker = createPIIMasker({
  defaultStrategy: "HASH",
  hashSalt: "your-app-specific-salt",
});

Limits: maxTextLength, maxEntities, onLimitExceeded

These controls are designed for untrusted inputs (logs, user text, LLM output) to prevent worst-case performance.

const masker = createPIIMasker({
  maxTextLength: 100_000,
  maxEntities: 1_000,
  onLimitExceeded: "truncate", // or "throw"
});

const res = masker.mask(veryLargeText);
if (res.warnings?.length) {
  // handle warnings in your telemetry
}

Output controls: includeMaskMap, includeEntities, entityValueMode

const masker = createPIIMasker({
  includeMaskMap: false,
  includeEntities: true,
  entityValueMode: "masked", // or "none" in production
  includeMaskedValueInEntities: true,
});

logLevel

const masker = createPIIMasker({ logLevel: "warn" });

Masking strategies

Available MaskingStrategy values:

  • REDACT → [REDACTED_<TYPE>]
  • PARTIAL_MASK → keep some structure (e.g., last 4 digits)
  • HASH → stable salted hash token like [HASH:abcd1234...]
  • TOKENIZE → <<PII_deadbeef>> + store mapping in memory vault
  • REPLACE_FAKE → replace with realistic fake values (best for demos)
  • CUSTOM_FN → call a registered custom function

Custom detectors

Add your own detector for domain-specific identifiers.

import { createPIIMasker, type DetectedEntity } from "pii-guard-node-mini";

const masker = createPIIMasker({});

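masker.addDetector("EMPLOYEE_ID", (text) => {
  const re = /E-\d{4,}/g;
  const entities: DetectedEntity[] = [];
  let m: RegExpExecArray | null;

  // Loop over exec() so every occurrence is reported, not only the first.
  while ((m = re.exec(text)) !== null) {
    entities.push({
      type: "EMPLOYEE_ID",
      value: m[0],
      start: m.index,
      end: m.index + m[0].length,
      confidence: 1,
    });
  }

  return entities;
});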

masker.updateConfig({
  strategies: { EMPLOYEE_ID: "REDACT" },
});

Custom strategies

Register a named strategy and assign it per type.

import { createPIIMasker } from "pii-guard-node-mini";

const masker = createPIIMasker({});

masker.addStrategy("KEEP_LAST_2", (entity) =>
  entity.value.replace(/.(?=.{2})/g, "*"),
);

masker.updateConfig({
  strategies: {
    API_KEY: "KEEP_LAST_2",
  },
});

Masking objects (maskObject) and fieldPaths

By default, maskObject() deep-traverses and masks all string values.

If you pass fieldPaths, only matching paths are masked.

  • Exact: user.email
  • Wildcard segment: users.*.email

const input = {
  users: [
    { email: "a@b.com", note: "keep this" },
    { email: "c@d.com", note: "keep this" },
  ],
};

const out = masker.maskObject(input, ["users.*.email"]);
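
To make the wildcard semantics concrete, here is a minimal sketch of how a pattern such as users.*.email could be matched against the paths of string leaves. It assumes dot-separated paths in which array indices count as ordinary segments; that convention, and the helper names pathMatches and collectStringPaths, are assumptions for illustration rather than the library's documented behavior.

```typescript
// Assumption: paths are dot-separated and array indices are ordinary segments.
function pathMatches(pattern: string, path: string): boolean {
  const patternSegs = pattern.split(".");
  const pathSegs = path.split(".");
  if (patternSegs.length !== pathSegs.length) return false;
  // "*" matches exactly one segment; anything else must match literally.
  return patternSegs.every((seg, i) => seg === "*" || seg === pathSegs[i]);
}

// Collect the dotted path of every string leaf in a nested value.
function collectStringPaths(value: unknown, prefix: string = ""): string[] {
  if (typeof value === "string") return [prefix];
  if (value === null || typeof value !== "object") return [];
  const out: string[] = [];
  const record = value as Record<string, unknown>;
  for (const key of Object.keys(record)) {
    const childPath = prefix === "" ? key : prefix + "." + key;
    out.push(...collectStringPaths(record[key], childPath));
  }
  return out;
}

const data = {
  users: [
    { email: "a@b.com", note: "keep this" },
    { email: "c@d.com", note: "keep this" },
  ],
};

// Only the email leaves are selected; the note fields are left alone.
const selected = collectStringPaths(data).filter((p) =>
  pathMatches("users.*.email", p),
);
// selected → ["users.0.email", "users.1.email"]
```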

Reversible masking (tokenization + unmask)

If you want to restore original values (e.g., when an LLM responds with tokens), use TOKENIZE and set enableReversibility: true.

import { createPIIMasker, MaskingStrategy } from "pii-guard-node-mini";

const masker = createPIIMasker({
  enableReversibility: true,
  defaultStrategy: MaskingStrategy.TOKENIZE,
});

const masked = masker.mask("Call me at +1 555 123 4567").maskedText;
const restored = masker.unmask(masked);

Important:

  • Reversal only works for values tokenized by the same masker instance.
  • If you call clearVault(), those mappings are lost.

Tokenization behavior note:

  • If enableReversibility: false, the engine will still produce token-looking placeholders, but it will not store mappings (so unmask() cannot restore, and tokens may not be deterministic).
  • If you need deterministic tokens, enable reversibility and keep the instance alive for the conversation/session.

Determinism: tokenizationDeterministic

tokenizationDeterministic: true means the same original value becomes the same token within the same instance.

Notes:

  • Determinism requires storing mappings, so it only applies when enableReversibility: true.
  • If you disable reversibility, tokens are generated but not stored.
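
The link between determinism and stored mappings can be shown with a small two-way vault. This is a sketch under stated assumptions, not the library's implementation: TokenVault is a hypothetical class, and it uses a counter to build the token suffix purely to keep the example self-contained.

```typescript
// Hypothetical two-way vault; a counter stands in for the real token suffix.
class TokenVault {
  private byToken = new Map<string, string>();
  private byOriginal = new Map<string, string>();
  private counter = 0;

  tokenize(original: string): string {
    const existing = this.byOriginal.get(original);
    if (existing !== undefined) return existing; // deterministic: reuse the token
    this.counter += 1;
    const hex = ("0000000" + this.counter.toString(16)).slice(-8);
    const token = "<<PII_" + hex + ">>";
    this.byToken.set(token, original);
    this.byOriginal.set(original, token);
    return token;
  }

  unmask(text: string): string {
    return text.replace(/<<PII_[0-9a-f]{8}>>/g, (token) => {
      const original = this.byToken.get(token);
      return original !== undefined ? original : token; // unknown tokens pass through
    });
  }
}

const vault = new TokenVault();
const first = vault.tokenize("+1 555 123 4567");
const second = vault.tokenize("+1 555 123 4567");
const masked = "Call me at " + first;
const restored = vault.unmask(masked);
```

Without the byOriginal map there would be nothing to look up on the second call, which is why deterministic tokens depend on the mappings being stored.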

Cross-instance / external vault persistence

The token vault is in-memory by default and lost when the process exits. Use exportVault() and importVault() to persist the token → original mappings yourself (file, Redis, DB, KV, etc.) and reverse tokens later from any other instance or process.

import createPIIMasker, { MaskingStrategy } from "pii-guard-node-mini";

// Producer: tokenize with reversibility ON so the vault stores mappings.
const producer = createPIIMasker({
  enableReversibility: true,
  defaultStrategy: MaskingStrategy.TOKENIZE,
});

const { maskedText } = producer.mask(
  "Call me at +1 555 123 4567 or email jane.doe@example.com",
);

// Persist the vault (here: just JSON-serialize it).
const vaultJson = JSON.stringify(producer.exportVault());
// store `vaultJson` in your DB / KV / file, keyed by request id

// ---- later, in a different instance/process ----

const consumer = createPIIMasker({
  // `enableReversibility` does NOT need to be true on the consumer —
  // importing a vault is enough to enable unmask() for those tokens.
  enableReversibility: false,
});

consumer.importVault(JSON.parse(vaultJson));
const restored = consumer.unmask(maskedText);
// restored === "Call me at +1 555 123 4567 or email jane.doe@example.com"

Notes:

  • exportVault() returns {} when the vault is empty.
  • importVault() merges entries into the current vault; call clearVault() first if you want a clean slate.
  • unmask() is a no-op when reversibility is off and no vault has been imported (it logs a warning and returns the input unchanged).
  • The exported object contains raw PII values — encrypt it at rest, scope access, and delete it as soon as it is no longer needed.

Built-in detectors (high level)

This library uses pattern-based detectors. Built-in coverage includes:

Core

  • Email, phone, SSN, credit card (Luhn), IP, DOB, US address
  • Person names (heuristic), passport (US/UK), IBAN
  • AWS keys/secrets (heuristic), generic API keys (entropy), JWT, URLs with auth, MAC
  • Bank account / routing (conservative: requires context and routing checksum where possible)

Jurisdictional / domain (v2)

  • US driver license (state-pattern + context)
  • Aadhaar (Verhoeff checksum) and PAN (India)
  • UK National Insurance Number + UK postcode
  • Canadian SIN (Luhn-checked)
  • VIN (ISO 3779 checksum)
  • CVV / card expiry (context-gated)
  • NPI (NPPES Luhn) and DEA number (DEA checksum) — for healthcare

Brand-specific secret detectors (v2)

  • Stripe (sk_live_… / sk_test_…)
  • GitHub PATs (ghp_…, gho_…, ghu_…, ghs_…, ghr_…, and fine-grained github_pat_…)
  • Slack (xoxb-…, xoxa-…, etc.)
  • Twilio Account SID (AC…)
  • OpenAI keys (sk-…, sk-proj-…)
  • Google API keys (AIza…)
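
Several detectors listed above validate candidates with a Luhn checksum before reporting them (credit card, Canadian SIN). The following is a minimal sketch of the standard Luhn check; luhnValid is a hypothetical helper name, and the library's own validation code may be structured differently. NPI applies Luhn with an additional prefix, which this sketch does not cover.

```typescript
// Standard Luhn check: double every second digit from the right,
// subtract 9 from any result above 9, and require the sum to end in 0.
function luhnValid(input: string): boolean {
  const digits = input.replace(/[\s-]/g, "");
  if (!/^\d+$/.test(digits)) return false;

  let sum = 0;
  let doubleIt = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

const cardOk = luhnValid("4111 1111 1111 1111"); // true
const cardBad = luhnValid("4111 1111 1111 1112"); // false
const sinOk = luhnValid("046 454 286"); // true (widely used test SIN)
```

A checksum only rules out random digit runs; it says nothing about whether the number was ever issued, so context checks remain useful alongside it.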

Reporting

const masker = createPIIMasker({ preset: "balanced" });
masker.mask("Email: a@b.com");
masker.mask("Card: 4111 1111 1111 1111");

console.log(masker.getReport());

Batch operations

Process arrays of strings in a single call.

const m = createPIIMasker({ preset: "balanced" });

const results = m.batchMask([
  "Email me at jane@company.com",
  "Card: 4111 1111 1111 1111",
  "no pii here",
]);
// results: MaskingResult[] - same order as input

const detections = m.batchDetect([
  "Email me at jane@company.com",
  "no pii here",
]);
// detections: DetectedEntity[][] - same order as input

Structure-aware masking

Three helpers that understand the syntax of common inputs.

maskJSON(jsonText)

Parses a JSON string, masks every string leaf, and re-serializes. Falls back to plain mask() if the input isn't valid JSON.

const out = m.maskJSON(
  JSON.stringify({ user: { email: "jane@company.com" }, count: 5 }),
);
// out.maskedText === '{"user":{"email":"[REDACTED_EMAIL]"},"count":5}'

maskHTML(html)

Tokenizes HTML-like input with a lightweight regex pass and masks only text nodes. Tags, attributes, and the contents of <script> / <style> are preserved verbatim. No DOM parser is used, so the file stays dependency-free.

m.maskHTML(`<p>Email: <b>jane@company.com</b></p>`);
// → "<p>Email: <b>[REDACTED_EMAIL]</b></p>"

maskMarkdown(md)

Protects fenced code blocks (```) and inline `code` spans, then masks the surrounding prose.

const md =
  "Reach me: jane@company.com\n\n```\nconst e = 'jane@company.com';\n```";
m.maskMarkdown(md).maskedText;
// prose email is masked; the code block keeps the literal email
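
One way to picture the "protect code, mask prose" behavior is to split the document on code spans and mask only the pieces in between. The sketch below assumes that approach; maskMarkdownProse and the toy maskEmails function are hypothetical stand-ins, not the library's internals.

```typescript
// Toy stand-in for a real masker: redacts anything shaped like an email.
function maskEmails(text: string): string {
  return text.replace(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, "[REDACTED_EMAIL]");
}

// Split on fenced blocks and inline code. Because the pattern is wrapped in a
// capturing group, split() keeps the code segments at the odd indices.
function maskMarkdownProse(
  md: string,
  maskFn: (s: string) => string,
): string {
  const parts = md.split(/(`{3}[\s\S]*?`{3}|`[^`\n]+`)/);
  return parts
    .map((part, i) => (i % 2 === 1 ? part : maskFn(part)))
    .join("");
}

const fence = "`".repeat(3);
const sample =
  "Reach me: jane@company.com\n\n" +
  fence + "\nconst e = 'jane@company.com';\n" + fence +
  "\nInline `a@b.com` stays.";

const maskedMd = maskMarkdownProse(sample, maskEmails);
```

In maskedMd the prose address becomes [REDACTED_EMAIL], while the fenced block and the inline span keep their literal text.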

Streaming large inputs

maskStream() returns a Node.js stream.Transform so you can mask multi-GB logs / files without loading them into memory.

import { createReadStream, createWriteStream } from "fs";
import createPIIMasker from "pii-guard-node-mini";

const m = createPIIMasker({ preset: "balanced" });

createReadStream("input.log")
  .pipe(m.maskStream({ emitEntities: true }))
  .on("entity", (entities) => {
    // optional per-chunk metrics
  })
  .pipe(createWriteStream("masked.log"));

Options:

  • bufferBoundary (default: /[\n\r.!?]\s/) — regex used to find a safe split point so detections don't get cut across chunks.
  • maxBufferSize (default: 64 KiB) — hard cap before the stream forces a flush.
  • emitEntities — emit 'entity' events for observability.

Throws on non-Node environments where the stream module is unavailable.
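
The role of bufferBoundary and maxBufferSize can be sketched as a pure function that decides how much of the accumulated buffer is safe to mask now and how much must wait for the next chunk. splitAtBoundary is a hypothetical helper written for this illustration, and it measures the cap in characters rather than bytes for simplicity.

```typescript
// Split the buffer at the last boundary match. Text after that point is held
// back, since a detection could still be completed by the next chunk.
function splitAtBoundary(
  buffer: string,
  boundary: RegExp,
  maxBufferSize: number,
): { ready: string; rest: string } {
  const flags =
    boundary.flags.indexOf("g") === -1 ? boundary.flags + "g" : boundary.flags;
  const re = new RegExp(boundary.source, flags);

  let cut = -1;
  let m: RegExpExecArray | null;
  while ((m = re.exec(buffer)) !== null) {
    cut = m.index + m[0].length;
    if (m[0].length === 0) re.lastIndex++; // guard against zero-width matches
  }

  if (cut === -1) {
    // No boundary yet: keep buffering unless the cap forces a flush.
    return buffer.length >= maxBufferSize
      ? { ready: buffer, rest: "" }
      : { ready: "", rest: buffer };
  }
  return { ready: buffer.slice(0, cut), rest: buffer.slice(cut) };
}

const boundaryRe = /[\n\r.!?]\s/;
const step = splitAtBoundary(
  "Mail a@b.com. Next line jane@comp",
  boundaryRe,
  64 * 1024,
);
// step.ready → "Mail a@b.com. "
// step.rest  → "Next line jane@comp"
```

The partial address at the end stays in rest, so it is only masked once the remainder of the line has arrived.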


Express / Connect middleware

createMiddleware() returns a standard (req, res, next) handler.

import express from "express";
import createPIIMasker from "pii-guard-node-mini";

const m = createPIIMasker({ preset: "balanced" });

const app = express();
app.use(express.json());

app.use(
  m.createMiddleware({
    fields: ["body", "query", "params"], // default
    responseBody: true, // mask outgoing res.json bodies
    attachReport: true, // res.locals.piiReport
    skip: (req) => req.path === "/healthz",
  }),
);

app.post("/log", (req, res) => {
  // req.body is already masked here
  // res.locals.piiReport contains { entityCount, byType }
  res.json({ ok: true });
});

Risk scoring

riskScore(text) returns a 0–100 score and a qualitative bucket. Useful as a pre-LLM gate, a router signal, or for compliance dashboards.

const r = m.riskScore("SSN: 123-45-6789 and card 4111 1111 1111 1111");
// r.score      → e.g. 35
// r.level      → "high"      ("low" | "medium" | "high" | "critical")
// r.breakdown  → { SSN: 22.5, CREDIT_CARD: 25.0 }
// r.dominantType → "CREDIT_CARD"
// r.entityCount  → 2

Tune weights with riskWeights:

const m = createPIIMasker({
  riskWeights: { [PIIType.EMAIL]: 12 },
});

Explain & diff

console.log(m.explain("Email jane@company.com or call +1 555 123 4567"));
// Detected 2 entities:
//   - [EMAIL] at pos 6-22 (confidence: 1.00) → j**e@c*****y.com
//   - [PHONE] at pos 31-46 (confidence: 0.95) → ************4567

m.diff("Email jane@company.com");
// [{ start: 6, end: 22, original: "jane@company.com", masked: "...", type: "EMAIL" }]
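
Span-level output of this shape is enough to rebuild the masked text yourself. The sketch below assumes that end is exclusive (consistent with the 6 to 22 range for a 16-character address above) and that spans do not overlap; DiffSpan and applySpans are hypothetical names for this example.

```typescript
interface DiffSpan {
  start: number;
  end: number; // assumed exclusive
  original: string;
  masked: string;
  type: string;
}

// Apply replacements from the end of the text backwards so that earlier
// offsets stay valid even when a replacement changes the length.
function applySpans(text: string, spans: DiffSpan[]): string {
  const ordered = spans.slice().sort((a, b) => b.start - a.start);
  let out = text;
  for (const s of ordered) {
    out = out.slice(0, s.start) + s.masked + out.slice(s.end);
  }
  return out;
}

const rebuilt = applySpans("Email jane@company.com", [
  {
    start: 6,
    end: 22,
    original: "jane@company.com",
    masked: "[REDACTED_EMAIL]",
    type: "EMAIL",
  },
]);
// rebuilt → "Email [REDACTED_EMAIL]"
```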

sanitizeForLog(value)

Convenience wrapper — masks a string, deep-masks an object, passes primitives through.

logger.info({ payload: m.sanitizeForLog(req.body) });

Vault TTL

When using TOKENIZE with enableReversibility: true, you can auto-expire entries via vaultTTL (ms). Expired tokens are lazily evicted on the next unmask(), tokenize(), or explicit pruneVault() call.

const m = createPIIMasker({
  enableReversibility: true,
  defaultStrategy: MaskingStrategy.TOKENIZE,
  vaultTTL: 5 * 60 * 1000, // 5 minutes
});

const token = m.mask("Call me at +1 555 123 4567").maskedText;
// ... 6 minutes later ...
m.pruneVault(); // → number evicted
m.unmask(token); // returns token unchanged because it expired
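
Lazy eviction can be sketched as a map whose entries carry an expiry timestamp that is checked on access. ExpiringVault is a hypothetical class for illustration, and the injectable clock exists only so the example can simulate time passing; neither is part of the library's API.

```typescript
// Hypothetical TTL vault with lazy eviction and an injectable clock.
class ExpiringVault {
  private entries = new Map<string, { original: string; expiresAt: number }>();

  constructor(
    private ttlMs: number,
    private now: () => number = Date.now,
  ) {}

  set(token: string, original: string): void {
    this.entries.set(token, { original, expiresAt: this.now() + this.ttlMs });
  }

  // Remove every expired entry and report how many were evicted.
  prune(): number {
    const t = this.now();
    let evicted = 0;
    this.entries.forEach((entry, token) => {
      if (entry.expiresAt <= t) {
        this.entries.delete(token);
        evicted++;
      }
    });
    return evicted;
  }

  get(token: string): string | undefined {
    this.prune(); // lazy eviction: expired entries disappear on access
    const entry = this.entries.get(token);
    return entry ? entry.original : undefined;
  }
}

let clock = 0;
const ttlVault = new ExpiringVault(5 * 60 * 1000, () => clock);
ttlVault.set("<<PII_deadbeef>>", "+1 555 123 4567");

const beforeExpiry = ttlVault.get("<<PII_deadbeef>>"); // original value
clock = 6 * 60 * 1000; // simulate six minutes passing
const evictedCount = ttlVault.prune(); // 1
const afterExpiry = ttlVault.get("<<PII_deadbeef>>"); // undefined
```

Because nothing runs in the background, memory is reclaimed only when the vault is touched again, so an explicit prune call on a timer may be worth adding in long-lived processes.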

Audit log

Set auditLog: true to keep an in-memory record of every mask() / maskObject() / detect() call. Disabled by default to avoid memory growth on long-running processes.

const m = createPIIMasker({ auditLog: true });
m.mask("Email: jane@company.com");

m.getAuditLog();
// [
//   { timestamp, operation: 'mask', inputLength, entityCount, byType, warnings? },
//   ...
// ]

m.clearAuditLog();

Multi-locale

Run locale-aware detectors for several jurisdictions in the same instance:

const m = createPIIMasker({
  locales: ["US", "IN", "UK"],
  detectorsEnabled: new Set([
    PIIType.SSN,
    PIIType.AADHAAR,
    PIIType.UK_NIN,
    PIIType.UK_POSTCODE,
  ]),
});

The legacy single-locale locale: 'US' field still works (and is used when locales is unset).


Pattern overrides

Replace a built-in regex without writing a full detector:

const m = createPIIMasker({
  patternOverrides: {
    EMAIL: /[\w.+-]+@(?:secret|internal)\.co\b/g,
  },
  detectorsEnabled: new Set([PIIType.EMAIL]),
});

Note: overrides mutate the shared PATTERNS table at engine construction — last writer wins across instances. For per-instance isolation, register a custom detector instead.


License

MIT License — see LICENSE.
