1.1.0 • Published 5d ago

pii-guard-node-mini

Licence

MIT

Version

1.1.0

Size

295 kB


PII Masker (single-file TypeScript utility)

A self-contained PII/secret detection and masking engine with no runtime dependencies, designed to be copied directly into other Node.js/TypeScript projects.

What this is for

Mask PII before sending text to LLMs, logs, telemetry, or analytics.
Detect and transform common PII/secrets using regex + lightweight validation/heuristics.
Provide configurable masking strategies per PII type.
Optionally use tokenization for same-instance reversible masking (useful for round-tripping LLM responses).

What's new in 1.1.0

All additions are backward-compatible.

16 new detectors — Driver License (US), Aadhaar (Verhoeff), PAN, UK NIN, UK postcode, Canadian SIN (Luhn), VIN (ISO 3779), CVV, card expiry, NPI, DEA, and brand-specific secret detectors for Stripe, GitHub, Slack, Twilio, OpenAI, Google API keys.
Batch APIs — batchMask() / batchDetect().
Structure-aware masking — maskJSON(), maskHTML(), maskMarkdown() (skips tags / <script> / fenced code / inline code).
Streaming — maskStream() returns a Node stream.Transform for very large inputs.
Express/Connect middleware — createMiddleware({ fields, responseBody, attachReport, skip }).
Risk scoring — riskScore(text) → { score, level, breakdown, dominantType }.
Audit & explain — explain(), diff(), sanitizeForLog(), optional in-memory audit log.
Vault TTL — vaultTTL + pruneVault() for time-limited tokens.
Multi-locale — run multiple locale-aware detectors in one instance via locales: [...].
Pattern overrides — replace built-in regex via patternOverrides.
New presets — gdpr, india, developer. Expanded hipaa (adds NPI/DEA) and pci-dss (adds CVV/expiry).

Install / Add to your project

This asset is intentionally shipped as a single file.

Option A — copy the file

Copy pii-masker.ts into your project (e.g. src/utils/pii-masker.ts).
If your project uses TypeScript, make sure Node types are available:

npm i -D @types/node

This file uses Node APIs (crypto, Buffer). It’s meant for Node.js runtimes (or bundlers configured to polyfill Node APIs).

Option B — install from npm

npm i pii-guard-node-mini

If you copied the file instead of installing from npm, replace import paths like "pii-guard-node-mini" with your local path (e.g. "./utils/pii-masker").

Quick start

import createPIIMasker, { PIIType, MaskingStrategy } from "pii-guard-node-mini";

const masker = createPIIMasker({
  preset: "balanced",
  logLevel: "silent",
  strategies: {
    [PIIType.CREDIT_CARD]: MaskingStrategy.REDACT,
  },
});

const input = "Email me at jane.doe@company.com. Card: 4111 1111 1111 1111";
const { maskedText, entities } = masker.mask(input);

console.log(maskedText);
console.log(entities);

API overview

createPIIMasker(userConfig?) returns an object with:

Core

mask(text) → { maskedText, entities, maskMap } (may also include warnings / truncated)
maskObject(obj, fieldPaths?) → { masked, entities, maskMap } (may include warnings)
detect(text) → DetectedEntity[]
unmask(text) → string (works with tokens produced by this instance, or any vault imported via importVault)
addDetector(type, fn) → register custom detector
addStrategy(name, fn) → register custom strategy
updateConfig(partial) → hot-update instance config
getReport() → basic usage metrics
clearVault() → clears token vault (affects unmask)
exportVault() → Record<string, string> of token → original for external persistence
importVault(entries) → loads token → original mappings produced elsewhere
resetReport() → clears counters
getConfig() → resolved config snapshot

Batch & structure-aware (added in 1.1.0)

batchMask(texts) / batchDetect(texts) → process arrays in one call
maskJSON(jsonText) → parse JSON, mask string leaves, re-serialize
maskHTML(html) → mask text nodes only (tags / <script> / <style> preserved)
maskMarkdown(md) → mask prose while leaving fenced code blocks and inline `code` intact

Streaming & middleware (added in 1.1.0)

maskStream(opts?) → returns a Node stream.Transform for large files / pipes
createMiddleware(opts?) → Express/Connect-style (req, res, next) adapter

Risk & audit (added in 1.1.0)

riskScore(text) → { score, level, breakdown, dominantType, entityCount }
explain(text) → human-readable multi-line audit string
diff(text) → span-level [{ start, end, original, masked, type }]
sanitizeForLog(value) → masks string or deep-masks object (passes primitives through)
pruneVault() → evict expired vault entries (with vaultTTL)
getAuditLog() / clearAuditLog() → in-memory audit log (requires auditLog: true)

Presets

Presets pre-configure detectors + defaults.

const strict = createPIIMasker({ preset: "strict" });
const balanced = createPIIMasker({ preset: "balanced" });
const minimal = createPIIMasker({ preset: "minimal" });
const hipaa = createPIIMasker({ preset: "hipaa" }); // + NPI, DEA
const pci = createPIIMasker({ preset: "pci-dss" }); // + CVV, expiry
const gdpr = createPIIMasker({ preset: "gdpr" }); // EU-focused
const india = createPIIMasker({ preset: "india" }); // Aadhaar, PAN
const dev = createPIIMasker({ preset: "developer" }); // secrets only

Notes:

Preset values can still be overridden by passing your own config fields.
"HIPAA" / "PCI-DSS" / "GDPR" preset names are practical bundles; you should still validate your usage for your environment.

Configuration (EngineConfig)

You pass a Partial<EngineConfig> to createPIIMasker().

Common knobs:

Production/enterprise recommended defaults

By default, results include entities (with original values) and maskMap (original → masked). In production pipelines this can accidentally re-introduce sensitive data into logs.

Recommended configuration:

import createPIIMasker, { MaskingStrategy } from "pii-guard-node-mini";

const masker = createPIIMasker({
  preset: "balanced",

  // Output controls (recommended for production)
  includeMaskMap: false,
  includeEntities: true,
  entityValueMode: "none", // don't return raw matches
  includeMaskedValueInEntities: true, // safe to include the replacement

  // Safety limits
  maxTextLength: 200_000,
  maxEntities: 2_000,
  onLimitExceeded: "truncate", // or "throw" if you prefer hard-fail

  // Recommended for LLM flows
  defaultStrategy: MaskingStrategy.REDACT,
});

Notes:

If you set entityValueMode: "none", DetectedEntity.value becomes an empty string.
If you need original values for debugging, enable them only in dev/test.

Output controls (data minimization)

These options are designed to prevent accidental re-introduction of sensitive data via outputs.

includeMaskMap: if true, maskMap contains original → masked mappings.
includeEntities: if false, entities will be an empty list.
entityValueMode:
- "original" (default): DetectedEntity.value is the raw match.
- "masked": DetectedEntity.value becomes the replacement value.
- "none": DetectedEntity.value becomes an empty string.
includeMaskedValueInEntities: includes DetectedEntity.masked (the replacement) in the entity.

Example (mask safely, keep entity locations/types only):

const masker = createPIIMasker({
  includeMaskMap: false,
  includeEntities: true,
  entityValueMode: "none",
  includeMaskedValueInEntities: false,
});

`confidenceThreshold`

A higher threshold yields fewer matches (and fewer false positives); a lower threshold masks more aggressively.

const masker = createPIIMasker({ confidenceThreshold: 0.7 });

`defaultStrategy` and `strategies`

Set the global default masking behavior, and override per type.

import createPIIMasker, { MaskingStrategy, PIIType } from "pii-guard-node-mini";

const masker = createPIIMasker({
  defaultStrategy: MaskingStrategy.PARTIAL_MASK,
  strategies: {
    [PIIType.SSN]: MaskingStrategy.REDACT,
    [PIIType.CREDIT_CARD]: MaskingStrategy.REDACT,
    [PIIType.API_KEY]: MaskingStrategy.REDACT,
  },
});

`allowList`

Values that should not be masked even if they look like PII.

const masker = createPIIMasker({
  allowList: ["example.com", "localhost"],
});

`denyList`

Terms that must always be masked.

const masker = createPIIMasker({
  denyList: [
    { term: "ProjectX", type: "CUSTOM" },
    { term: "InternalCodeWord", type: "CUSTOM" },
  ],
});

`detectorsEnabled`

Run only a subset of detectors.

import createPIIMasker, { PIIType } from "pii-guard-node-mini";

const masker = createPIIMasker({
  detectorsEnabled: new Set([PIIType.EMAIL, PIIType.PHONE]),
});

`locale`

Affects some patterns/heuristics.

const maskerUS = createPIIMasker({ locale: "US" });
const maskerUK = createPIIMasker({ locale: "UK" });
const maskerIN = createPIIMasker({ locale: "IN" });

`hashSalt`

Only used by HASH strategy.

const masker = createPIIMasker({
  defaultStrategy: "HASH",
  hashSalt: "your-app-specific-salt",
});
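
To illustrate why the salt matters, here is a minimal sketch of a salted hash placeholder. It is not the library's implementation: the HMAC-SHA-256 construction, the 8-character truncation, and the helper name hashToken are assumptions made for illustration. Only the [HASH:...] token shape is taken from this README.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: derive a stable, salted placeholder for a value.
function hashToken(value: string, salt: string): string {
  // The same (value, salt) pair always yields the same digest.
  const digest = createHmac("sha256", salt).update(value, "utf8").digest("hex");
  return `[HASH:${digest.slice(0, 8)}]`;
}

const first = hashToken("jane.doe@company.com", "your-app-specific-salt");
const second = hashToken("jane.doe@company.com", "your-app-specific-salt");

console.log(first === second); // true: same value and same salt
```

Changing the salt changes every token, so values hashed by one application cannot be correlated with values hashed by another that uses a different salt.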

Limits: `maxTextLength`, `maxEntities`, `onLimitExceeded`

These controls are designed for untrusted inputs (logs, user text, LLM output) and bound worst-case processing time and memory.

const masker = createPIIMasker({
  maxTextLength: 100_000,
  maxEntities: 1_000,
  onLimitExceeded: "truncate", // or "throw"
});

const res = masker.mask(veryLargeText);
if (res.warnings?.length) {
  // handle warnings in your telemetry
}

Output controls: `includeMaskMap`, `includeEntities`, `entityValueMode`

const masker = createPIIMasker({
  includeMaskMap: false,
  includeEntities: true,
  entityValueMode: "masked", // or "none" in production
  includeMaskedValueInEntities: true,
});

`logLevel`

const masker = createPIIMasker({ logLevel: "warn" });

Masking strategies

Available MaskingStrategy values:

REDACT → [REDACTED_<TYPE>]
PARTIAL_MASK → keep some structure (e.g., last 4 digits)
HASH → stable salted hash token like [HASH:abcd1234...]
TOKENIZE → <<PII_deadbeef>> + store mapping in memory vault
REPLACE_FAKE → replace with realistic fake values (best for demos)
CUSTOM_FN → call a registered custom function
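
As a rough illustration of the first two strategies, the sketch below shows one way such replacements could be produced. The function names redact and partialMask are hypothetical, and the built-in PARTIAL_MASK appears to be type-aware (emails keep part of their structure, for example), so treat this as a simplification rather than the engine's behavior.

```typescript
// Hypothetical stand-in for REDACT: the value is replaced by a typed label.
function redact(type: string): string {
  return `[REDACTED_${type}]`;
}

// Hypothetical stand-in for PARTIAL_MASK: keep the last `keep` characters.
function partialMask(value: string, keep = 4): string {
  if (value.length <= keep) return "*".repeat(value.length);
  const hidden = value.length - keep;
  return "*".repeat(hidden) + value.slice(hidden);
}

console.log(redact("EMAIL")); // [REDACTED_EMAIL]
console.log(partialMask("4111111111111111")); // ************1111
```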

Custom detectors

Add your own detector for domain-specific identifiers.

import { createPIIMasker, type DetectedEntity } from "pii-guard-node-mini";

const masker = createPIIMasker({});

masker.addDetector("EMPLOYEE_ID", (text) => {
  // Collect every match; a single exec() call would only find the first one.
  const entities: DetectedEntity[] = [];
  for (const m of text.matchAll(/E-\d{4,}/g)) {
    const start = m.index ?? 0;
    entities.push({
      type: "EMPLOYEE_ID",
      value: m[0],
      start,
      end: start + m[0].length,
      confidence: 1,
    });
  }
  return entities;
});

masker.updateConfig({
  strategies: { EMPLOYEE_ID: "REDACT" },
});

Custom strategies

import { createPIIMasker } from "pii-guard-node-mini";

const masker = createPIIMasker({});

masker.addStrategy("KEEP_LAST_2", (entity) =>
  entity.value.replace(/.(?=.{2})/g, "*"),
);

masker.updateConfig({
  strategies: {
    API_KEY: "KEEP_LAST_2",
  },
});

Masking objects (`maskObject`) and `fieldPaths`

By default, maskObject() deep-traverses and masks all string values.

If you pass fieldPaths, only matching paths are masked.

Exact: user.email
Wildcard segment: users.*.email

const input = {
  users: [
    { email: "a@b.com", note: "keep this" },
    { email: "c@d.com", note: "keep this" },
  ],
};

const out = masker.maskObject(input, ["users.*.email"]);
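
The path matching described above can be sketched as follows. This is an illustrative reimplementation, not the library's code: matchesPath and maskPaths are hypothetical names, and treating array indices as ordinary path segments is an assumption suggested by the users.*.email example.

```typescript
// Hypothetical matcher: "*" matches exactly one path segment.
function matchesPath(path: string, pattern: string): boolean {
  const segments = path.split(".");
  const wanted = pattern.split(".");
  return (
    segments.length === wanted.length &&
    wanted.every((seg, i) => seg === "*" || seg === segments[i])
  );
}

// Deep-copy `value`, applying `maskFn` only to string leaves whose path matches.
function maskPaths(
  value: unknown,
  patterns: string[],
  maskFn: (s: string) => string,
  path = "",
): unknown {
  const join = (key: string | number) => (path ? `${path}.${key}` : String(key));
  if (typeof value === "string") {
    return patterns.some((p) => matchesPath(path, p)) ? maskFn(value) : value;
  }
  if (Array.isArray(value)) {
    return value.map((v, i) => maskPaths(v, patterns, maskFn, join(i)));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(value)) {
      out[key] = maskPaths((value as Record<string, unknown>)[key], patterns, maskFn, join(key));
    }
    return out;
  }
  return value; // numbers, booleans, null pass through
}

const sample = { users: [{ email: "a@b.com", note: "keep this" }] };
const maskedSample = maskPaths(sample, ["users.*.email"], () => "[REDACTED_EMAIL]");
console.log(JSON.stringify(maskedSample));
// {"users":[{"email":"[REDACTED_EMAIL]","note":"keep this"}]}
```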

Reversible masking (tokenization + unmask)

If you want to restore original values (e.g., when an LLM responds with tokens), use TOKENIZE and set enableReversibility: true.

import { createPIIMasker, MaskingStrategy } from "pii-guard-node-mini";

const masker = createPIIMasker({
  enableReversibility: true,
  defaultStrategy: MaskingStrategy.TOKENIZE,
});

const masked = masker.mask("Call me at +1 555 123 4567").maskedText;
const restored = masker.unmask(masked);

Important:

Reversal works only for tokens present in this instance's vault: values it tokenized itself, or mappings loaded via importVault().
If you call clearVault(), those mappings are lost.

Tokenization behavior note:

If enableReversibility: false, the engine will still produce token-looking placeholders, but it will not store mappings (so unmask() cannot restore, and tokens may not be deterministic).
If you need deterministic tokens, enable reversibility and keep the instance alive for the conversation/session.

Determinism: `tokenizationDeterministic`

tokenizationDeterministic: true means the same original value becomes the same token within the same instance.

Notes:

Determinism requires storing mappings, so it only applies when enableReversibility: true.
If you disable reversibility, tokens are generated but not stored.
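
The link between determinism and stored mappings can be seen in a minimal sketch of a token vault. The class name TokenVault and its internals are hypothetical and only illustrate the idea; the <<PII_...>> token shape is the one shown in the strategy list.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical in-memory vault: value -> token and token -> value.
class TokenVault {
  private byValue = new Map<string, string>();
  private byToken = new Map<string, string>();

  tokenize(value: string): string {
    // Determinism comes from looking up the stored mapping first.
    const existing = this.byValue.get(value);
    if (existing !== undefined) return existing;
    let token: string;
    do {
      token = `<<PII_${randomBytes(4).toString("hex")}>>`;
    } while (this.byToken.has(token)); // retry on the rare collision
    this.byValue.set(value, token);
    this.byToken.set(token, value);
    return token;
  }

  unmask(text: string): string {
    // Tokens that are not in the vault are left untouched.
    return text.replace(/<<PII_[0-9a-f]{8}>>/g, (t) => this.byToken.get(t) ?? t);
  }
}

const vault = new TokenVault();
const t1 = vault.tokenize("+1 555 123 4567");
const t2 = vault.tokenize("+1 555 123 4567");
console.log(t1 === t2); // true
console.log(vault.unmask(`Call me at ${t1}`)); // Call me at +1 555 123 4567
```

Without the stored mapping there is nothing to look up, which is why random tokens cannot be both unstored and deterministic.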

Cross-instance / external vault persistence

The token vault is in-memory by default and lost when the process exits. Use exportVault() and importVault() to persist the token → original mappings yourself (file, Redis, DB, KV, etc.) and reverse tokens later from any other instance or process.

import createPIIMasker, { MaskingStrategy } from "pii-guard-node-mini";

// Producer: tokenize with reversibility ON so the vault stores mappings.
const producer = createPIIMasker({
  enableReversibility: true,
  defaultStrategy: MaskingStrategy.TOKENIZE,
});

const { maskedText } = producer.mask(
  "Call me at +1 555 123 4567 or email jane.doe@example.com",
);

// Persist the vault (here: just JSON-serialize it).
const vaultJson = JSON.stringify(producer.exportVault());
// store `vaultJson` in your DB / KV / file, keyed by request id

// ---- later, in a different instance/process ----

const consumer = createPIIMasker({
  // `enableReversibility` does NOT need to be true on the consumer —
  // importing a vault is enough to enable unmask() for those tokens.
  enableReversibility: false,
});

consumer.importVault(JSON.parse(vaultJson));
const restored = consumer.unmask(maskedText);
// restored === "Call me at +1 555 123 4567 or email jane.doe@example.com"

Notes:

exportVault() returns {} when the vault is empty.
importVault() merges entries into the current vault; call clearVault() first if you want a clean slate.
unmask() is a no-op when reversibility is off and no vault has been imported (it logs a warning and returns the input unchanged).
The exported object contains raw PII values — encrypt it at rest, scope access, and delete it as soon as it is no longer needed.
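
One way to follow the encrypt-at-rest advice is to wrap the exported object with AES-256-GCM from Node's crypto module before storing it. The helper names encryptVault and decryptVault and the nonce/tag/ciphertext layout are assumptions for this sketch; in practice the input would be the result of exportVault() and the key would come from your secret manager.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypt a vault object; `key` must be exactly 32 bytes.
function encryptVault(vault: Record<string, string>, key: Buffer): string {
  const iv = randomBytes(12); // fresh nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(JSON.stringify(vault), "utf8"), cipher.final()]);
  // Layout (assumed): 12-byte nonce | 16-byte auth tag | ciphertext.
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("base64");
}

// Decrypt and authenticate; throws if the blob or key is wrong.
function decryptVault(blob: string, key: Buffer): Record<string, string> {
  const raw = Buffer.from(blob, "base64");
  const decipher = createDecipheriv("aes-256-gcm", key, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  const body = Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]);
  return JSON.parse(body.toString("utf8")) as Record<string, string>;
}

const demoKey = randomBytes(32); // demo only; load a persistent key in real use
const blob = encryptVault({ "<<PII_deadbeef>>": "jane.doe@example.com" }, demoKey);
console.log(decryptVault(blob, demoKey)["<<PII_deadbeef>>"]); // jane.doe@example.com
```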

Built-in detectors (high level)

This library uses pattern-based detectors. Built-in coverage includes:

Core

Email, phone, SSN, credit card (Luhn), IP, DOB, US address
Person names (heuristic), passport (US/UK), IBAN
AWS keys/secrets (heuristic), generic API keys (entropy), JWT, URLs with auth, MAC
Bank account / routing (conservative: requires context and routing checksum where possible)
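
For readers unfamiliar with the Luhn check mentioned for credit cards, here is a small standalone version. It shows the general algorithm only; how the engine combines it with regex matching and confidence scoring is not shown, and the function name luhnValid is hypothetical.

```typescript
// Luhn checksum: double every second digit from the right, subtract 9 from
// any result above 9, and require the total to be divisible by 10.
function luhnValid(input: string): boolean {
  const digits = input.replace(/[\s-]/g, "");
  if (!/^\d{2,}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = digits.charCodeAt(digits.length - 1 - i) - 48; // "0" is char code 48
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

console.log(luhnValid("4111 1111 1111 1111")); // true
console.log(luhnValid("4111 1111 1111 1112")); // false
```

A checksum like this lets a detector discard digit runs that merely look like card numbers, which reduces false positives.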

Jurisdictional / domain (added in 1.1.0)

US driver license (state-pattern + context)
Aadhaar (Verhoeff checksum) and PAN (India)
UK National Insurance Number + UK postcode
Canadian SIN (Luhn-checked)
VIN (ISO 3779 checksum)
CVV / card expiry (context-gated)
NPI (NPPES Luhn) and DEA number (DEA checksum) — for healthcare
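
The Verhoeff checksum named for Aadhaar is less widely known than Luhn, so a compact standalone validator is sketched below. It uses the standard published Verhoeff tables and is independent of the library; the function name verhoeffValid is hypothetical, and the sample input is the textbook value 2363 rather than a real identifier.

```typescript
// D: multiplication table of the dihedral group D5.
const D: number[][] = [
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
  [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
  [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
  [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
  [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
  [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
  [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
  [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
  [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
  [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
];

// P: position-dependent permutation, cycling every 8 positions.
const P: number[][] = [
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
  [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
  [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
  [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
  [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
  [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
  [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
  [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
];

// A number (including its trailing check digit) is valid when the running
// checksum over its digits, read right to left, ends at 0.
function verhoeffValid(input: string): boolean {
  const digits = input.replace(/\s/g, "");
  if (!/^\d+$/.test(digits)) return false;
  let c = 0;
  for (let i = 0; i < digits.length; i++) {
    const digit = digits.charCodeAt(digits.length - 1 - i) - 48;
    c = D[c][P[i % 8][digit]];
  }
  return c === 0;
}

console.log(verhoeffValid("2363")); // true
console.log(verhoeffValid("2364")); // false
```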

Brand-specific secret detectors (added in 1.1.0)

Stripe (sk_live_… / sk_test_…)
GitHub PATs (ghp_…, gho_…, ghu_…, ghs_…, ghr_…, and fine-grained github_pat_…)
Slack (xoxb-…, xoxa-…, etc.)
Twilio Account SID (AC…)
OpenAI keys (sk-…, sk-proj-…)
Google API keys (AIza…)

Reporting

const masker = createPIIMasker({ preset: "balanced" });
masker.mask("Email: a@b.com");
masker.mask("Card: 4111 1111 1111 1111");

console.log(masker.getReport());

Batch operations

Process arrays of strings in a single call.

const m = createPIIMasker({ preset: "balanced" });

const results = m.batchMask([
  "Email me at jane@company.com",
  "Card: 4111 1111 1111 1111",
  "no pii here",
]);
// results: MaskingResult[] - same order as input

const detections = m.batchDetect([
  "Email me at jane@company.com",
  "no pii here",
]);
// detections: DetectedEntity[][] - same order as input

Structure-aware masking

Three helpers that understand the syntax of common inputs.

`maskJSON(jsonText)`

Parses a JSON string, masks every string leaf, and re-serializes. Falls back to plain mask() if the input isn't valid JSON.

const out = m.maskJSON(
  JSON.stringify({ user: { email: "jane@company.com" }, count: 5 }),
);
// out.maskedText === '{"user":{"email":"[REDACTED_EMAIL]"},"count":5}'
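
The parse, mask, re-serialize flow with a plain-text fallback can be sketched without the library as follows. The names maskJsonText and maskEmails are hypothetical, and the simple email regex stands in for the engine's detectors.

```typescript
// Hypothetical sketch: `maskFn` stands in for the engine's text masker.
function maskJsonText(jsonText: string, maskFn: (s: string) => string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(jsonText);
  } catch {
    return maskFn(jsonText); // not valid JSON: treat the input as plain text
  }
  const walk = (v: unknown): unknown => {
    if (typeof v === "string") return maskFn(v);
    if (Array.isArray(v)) return v.map(walk);
    if (v !== null && typeof v === "object") {
      const out: Record<string, unknown> = {};
      for (const k of Object.keys(v)) out[k] = walk((v as Record<string, unknown>)[k]);
      return out;
    }
    return v; // numbers, booleans, null pass through
  };
  return JSON.stringify(walk(parsed));
}

// Deliberately simple email pattern, for demonstration only.
const maskEmails = (s: string) =>
  s.replace(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, "[REDACTED_EMAIL]");

console.log(maskJsonText('{"user":{"email":"jane@company.com"},"count":5}', maskEmails));
// {"user":{"email":"[REDACTED_EMAIL]"},"count":5}
console.log(maskJsonText("not json: jane@company.com", maskEmails));
// not json: [REDACTED_EMAIL]
```

Masking only string leaves keeps keys, numbers, and structure intact, so the output remains valid JSON.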

`maskHTML(html)`

Tokenizes HTML-like input with a small regex-based scanner and masks only text nodes. Tags, attributes, and the contents of <script> / <style> are preserved verbatim. No DOM parser is used, so the file stays dependency-free.

m.maskHTML(`<p>Email: <b>jane@company.com</b></p>`);
// → "<p>Email: <b>[REDACTED_EMAIL]</b></p>"
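
A regex-based approach of this kind can be approximated in a few lines. The sketch below is an assumption about the general technique, not the library's tokenizer: MARKUP, maskHtmlText, and maskEmails are hypothetical names, and malformed HTML (for example an unescaped > inside an attribute) is not handled.

```typescript
// Split on whole <script>/<style> elements and on individual tags. The capture
// group keeps the separators, so odd indices are markup and even indices are text.
const MARKUP = /(<script\b[\s\S]*?<\/script>|<style\b[\s\S]*?<\/style>|<[^>]*>)/gi;

function maskHtmlText(html: string, maskFn: (s: string) => string): string {
  return html
    .split(MARKUP)
    .map((part, i) => (i % 2 === 1 ? part : maskFn(part)))
    .join("");
}

// Deliberately simple email pattern, for demonstration only.
const maskEmails = (s: string) =>
  s.replace(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, "[REDACTED_EMAIL]");

console.log(maskHtmlText("<p>Email: <b>jane@company.com</b></p>", maskEmails));
// <p>Email: <b>[REDACTED_EMAIL]</b></p>
console.log(maskHtmlText('<a href="mailto:jane@company.com">mail</a>', maskEmails));
// <a href="mailto:jane@company.com">mail</a>
```

The second call shows the consequence of preserving attributes verbatim: PII inside an href is not masked by this technique.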

`maskMarkdown(md)`

Protects fenced code blocks (```) and inline `code` spans, then masks the surrounding prose.

const md =
  "Reach me: jane@company.com\n\n```\nconst e = 'jane@company.com';\n```";
m.maskMarkdown(md).maskedText;
// prose email is masked; the code block keeps the literal email
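
The protect-then-mask idea can be sketched by splitting the document on code spans and masking only what lies between them. This is an illustrative approximation under simplifying assumptions (triple-backtick fences only, single-backtick inline spans only); the names FENCE, CODE_SPANS, and maskMarkdownText are hypothetical.

```typescript
// Build the fence marker programmatically so this snippet can sit inside a fenced block.
const FENCE = "`".repeat(3);

// Fenced blocks are tried first, then single-backtick inline spans.
const CODE_SPANS = new RegExp("(" + FENCE + "[\\s\\S]*?" + FENCE + "|`[^`\\n]+`)", "g");

function maskMarkdownText(md: string, maskFn: (s: string) => string): string {
  // The capture group keeps code spans in the split output at odd indices.
  return md
    .split(CODE_SPANS)
    .map((part, i) => (i % 2 === 1 ? part : maskFn(part)))
    .join("");
}

// Deliberately simple email pattern, for demonstration only.
const maskEmails = (s: string) =>
  s.replace(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, "[REDACTED_EMAIL]");

const codeBlock = FENCE + "\nconst e = 'jane@company.com';\n" + FENCE;
const mdSample = "Reach me: jane@company.com\n\n" + codeBlock;

console.log(maskMarkdownText(mdSample, maskEmails) === "Reach me: [REDACTED_EMAIL]\n\n" + codeBlock);
// true: the prose email is masked, the code block is unchanged
```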

Streaming large inputs

maskStream() returns a Node.js stream.Transform so you can mask multi-GB logs / files without loading them into memory.

import { createReadStream, createWriteStream } from "fs";
import createPIIMasker from "pii-guard-node-mini";

const m = createPIIMasker({ preset: "balanced" });

createReadStream("input.log")
  .pipe(m.maskStream({ emitEntities: true }))
  .on("entity", (entities) => {
    // optional per-chunk metrics
  })
  .pipe(createWriteStream("masked.log"));

Options:

bufferBoundary (default: /[\n\r.!?]\s/) — regex used to find a safe split point so detections don't get cut across chunks.
maxBufferSize (default: 64 KiB) — hard cap before the stream forces a flush.
emitEntities — emit 'entity' events for observability.

maskStream() throws in non-Node environments where the stream module is unavailable.
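
To show how bufferBoundary and maxBufferSize could interact, the sketch below decides how much buffered text is safe to mask at a given moment. It is a guess at the general technique, not the library's stream code: splitAtBoundary is a hypothetical helper that a Transform implementation might call on each incoming chunk.

```typescript
// Returns [ready, rest]: `ready` can be masked and emitted now, `rest` stays
// buffered until more data arrives.
function splitAtBoundary(
  buffer: string,
  boundary: RegExp,
  maxBufferSize: number,
): [string, string] {
  // Scan with a global copy of the regex to find the LAST boundary match.
  const flags = boundary.flags.indexOf("g") >= 0 ? boundary.flags : boundary.flags + "g";
  const re = new RegExp(boundary.source, flags);
  let cut = -1;
  let m: RegExpExecArray | null;
  while ((m = re.exec(buffer)) !== null) {
    cut = m.index + m[0].length;
    if (m[0].length === 0) re.lastIndex++; // guard against zero-width matches
  }
  if (cut > 0) return [buffer.slice(0, cut), buffer.slice(cut)];
  // No safe split point: hold the text unless the size cap forces a flush.
  return buffer.length >= maxBufferSize ? [buffer, ""] : ["", buffer];
}

const boundaryRe = /[\n\r.!?]\s/; // the documented default
console.log(splitAtBoundary("First sentence. Second sen", boundaryRe, 65536));
// [ 'First sentence. ', 'Second sen' ]
console.log(splitAtBoundary("contact jane.doe@comp", boundaryRe, 65536));
// [ '', 'contact jane.doe@comp' ]
```

In the second call the partial email is held back rather than masked, because the dot inside it is not followed by whitespace and therefore is not a boundary.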

Express / Connect middleware

createMiddleware() returns a standard (req, res, next) handler.

import express from "express";
import createPIIMasker from "pii-guard-node-mini";

const m = createPIIMasker({ preset: "balanced" });

const app = express();
app.use(express.json());

app.use(
  m.createMiddleware({
    fields: ["body", "query", "params"], // default
    responseBody: true, // mask outgoing res.json bodies
    attachReport: true, // res.locals.piiReport
    skip: (req) => req.path === "/healthz",
  }),
);

app.post("/log", (req, res) => {
  // req.body is already masked here
  // res.locals.piiReport contains { entityCount, byType }
  res.json({ ok: true });
});

Risk scoring

riskScore(text) returns a 0–100 score and a qualitative bucket. Useful as a pre-LLM gate, a router signal, or for compliance dashboards.

const r = m.riskScore("SSN: 123-45-6789 and card 4111 1111 1111 1111");
// r.score      → e.g. 35
// r.level      → "high"      ("low" | "medium" | "high" | "critical")
// r.breakdown  → { SSN: 22.5, CREDIT_CARD: 25.0 }
// r.dominantType → "CREDIT_CARD"
// r.entityCount  → 2

Tune weights with riskWeights:

const m = createPIIMasker({
  riskWeights: { [PIIType.EMAIL]: 12 },
});

Explain & diff

console.log(m.explain("Email jane@company.com or call +1 555 123 4567"));
// Detected 2 entities:
//   - [EMAIL] at pos 6-22 (confidence: 1.00) → j**e@c*****y.com
//   - [PHONE] at pos 31-46 (confidence: 0.95) → ************4567

m.diff("Email jane@company.com");
// [{ start: 6, end: 22, original: "jane@company.com", masked: "...", type: "EMAIL" }]

`sanitizeForLog(value)`

Convenience wrapper — masks a string, deep-masks an object, passes primitives through.

logger.info({ payload: m.sanitizeForLog(req.body) });

Vault TTL

When using TOKENIZE with enableReversibility: true, you can auto-expire entries via vaultTTL (in milliseconds). Expired tokens are evicted lazily on the next unmask() call, the next tokenization, or an explicit pruneVault() call.

const m = createPIIMasker({
  enableReversibility: true,
  defaultStrategy: MaskingStrategy.TOKENIZE,
  vaultTTL: 5 * 60 * 1000, // 5 minutes
});

const { maskedText } = m.mask("Call me at +1 555 123 4567");
// ... 6 minutes later ...
m.pruneVault(); // → number evicted
m.unmask(maskedText); // tokens stay as-is because their vault entries expired
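
Lazy expiry of this kind can be illustrated with a small TTL map. The class TtlVault is hypothetical and uses an injectable clock so the expiry can be demonstrated without waiting; it does not reflect the library's internal data structures.

```typescript
// Hypothetical TTL vault: entries carry an absolute expiry timestamp.
class TtlVault {
  private entries = new Map<string, { value: string; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(token: string, value: string): void {
    this.entries.set(token, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Remove expired entries and report how many were evicted.
  prune(): number {
    const t = this.now();
    let evicted = 0;
    this.entries.forEach((entry, token) => {
      if (entry.expiresAt <= t) {
        this.entries.delete(token);
        evicted++;
      }
    });
    return evicted;
  }

  get(token: string): string | undefined {
    this.prune(); // lazy eviction on read
    return this.entries.get(token)?.value;
  }
}

let clock = 0;
const ttlVault = new TtlVault(5 * 60 * 1000, () => clock);
ttlVault.set("<<PII_deadbeef>>", "+1 555 123 4567");
console.log(ttlVault.get("<<PII_deadbeef>>")); // +1 555 123 4567
clock += 6 * 60 * 1000; // six minutes later
console.log(ttlVault.get("<<PII_deadbeef>>")); // undefined
```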

Audit log

Set auditLog: true to keep an in-memory record of every mask() / maskObject() / detect() call. Disabled by default to avoid memory growth on long-running processes.

const m = createPIIMasker({ auditLog: true });
m.mask("Email: jane@company.com");

m.getAuditLog();
// [
//   { timestamp, operation: 'mask', inputLength, entityCount, byType, warnings? },
//   ...
// ]

m.clearAuditLog();

Multi-locale

Run locale-aware detectors for several jurisdictions in the same instance:

const m = createPIIMasker({
  locales: ["US", "IN", "UK"],
  detectorsEnabled: new Set([
    PIIType.SSN,
    PIIType.AADHAAR,
    PIIType.UK_NIN,
    PIIType.UK_POSTCODE,
  ]),
});

The legacy single-locale locale: 'US' field still works (and is used when locales is unset).

Pattern overrides

Replace a built-in regex without writing a full detector:

const m = createPIIMasker({
  patternOverrides: {
    EMAIL: /[\w.+-]+@(?:secret|internal)\.co\b/g,
  },
  detectorsEnabled: new Set([PIIType.EMAIL]),
});

Note: overrides mutate the shared PATTERNS table at engine construction — last writer wins across instances. For per-instance isolation, register a custom detector instead.

License

MIT License — see LICENSE.