scrubtext

Licence: MIT
Version: 0.1.1
Dependencies: 0
Size: 88 kB
Vulnerabilities: 0
Weekly downloads: 131

Redact secrets and PII from text — emails, credit cards, API keys, JWTs, private keys, and more — with zero dependencies.

Sensitive data leaks through the cracks — into log files, error trackers, analytics events, and increasingly into LLM prompts. scrubtext is a tiny, dependency-free library that finds and removes secrets and PII before that happens.

import { redact } from "scrubtext";

redact("Charge card 4242 4242 4242 4242 for jane@acme.com");
// → "Charge card [CREDIT_CARD] for [EMAIL]"

redact("AWS key AKIAIOSFODNN7EXAMPLE leaked", { strategy: "mask" });
// → "AWS key ******************** leaked"

Why scrubtext?

  • Zero dependencies. Pure regex + validators. Runs anywhere — Node, edge, browser, Workers — with no native bindings and no cold-start penalty.
  • Low false positives. Credit cards are checked with the Luhn algorithm, IPv4 octets are range-validated, SSN area numbers are sanity-checked.
  • Built for the LLM era. Scrub user input and tool output before it reaches a model, or before model output reaches your logs.
  • Extensible. Add your own detectors (employee IDs, internal URLs, anything regexable) and choose how matches are replaced.
  • ESM + CJS + types, plus a CLI for shell pipelines.
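
To make the "low false positives" point concrete, here is a minimal sketch of a Luhn check of the kind a credit-card detector could run on each regex match. The function name `passesLuhn` is hypothetical and this is not taken from scrubtext's source; the 13 to 19 digit bound mirrors the detection table.

```typescript
// Hypothetical helper for illustration; not scrubtext's actual implementation.
function passesLuhn(candidate: string): boolean {
  // Drop separators such as spaces and dashes; keep digits only.
  const digits = candidate.replace(/\D/g, "");
  if (digits.length < 13 || digits.length > 19) return false;

  let sum = 0;
  let doubleNext = false; // the rightmost digit is never doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9; // same as adding the two digits of the product
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

console.log(passesLuhn("4242 4242 4242 4242")); // true
console.log(passesLuhn("1234 5678 9012 3456")); // false
```

A random 16-digit number passes this check only about one time in ten, which is why pairing it with the regex cuts down on order numbers and other digit runs being flagged as cards.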

What it detects

Type            Notes
email
credit_card     13–19 digits, Luhn-validated
ssn             US format, invalid area numbers rejected
phone           International and US formats
ipv4 / ipv6     IPv4 octets range-checked
mac_address
jwt             header.payload.signature
aws_access_key  AKIA…, ASIA…, etc.
github_token    ghp_, gho_, ghu_, ghs_, ghr_
openai_key      sk-…, sk-proj-…
slack_token     xoxb-, xoxp-, …
private_key     PEM blocks (RSA / EC / OpenSSH / PGP / DSA)
url_credentials scheme://user:pass@host
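
The range and area checks mentioned in the table can be sketched as small validators that run after the regex has matched. Both function names below are hypothetical, and the exact rules scrubtext applies may differ. The SSN rule shown (area 000, 666, and 900 to 999 are never issued) is the commonly documented one.

```typescript
// Hypothetical validators for illustration; not scrubtext's actual source.

// An IPv4 candidate is plausible only if it has four parts, each 0 to 255.
function isValidIpv4(candidate: string): boolean {
  const parts = candidate.split(".");
  if (parts.length !== 4) return false;
  return parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
}

// A US SSN candidate (AAA-GG-SSSS) is rejected when its area number
// is one that is never assigned: 000, 666, or anything from 900 up.
function hasPlausibleSsnArea(candidate: string): boolean {
  const m = /^(\d{3})-\d{2}-\d{4}$/.exec(candidate);
  if (!m) return false;
  const area = Number(m[1]);
  return area !== 0 && area !== 666 && area < 900;
}

console.log(isValidIpv4("10.0.0.1"));            // true
console.log(isValidIpv4("999.1.1.1"));           // false
console.log(hasPlausibleSsnArea("123-45-6789")); // true
console.log(hasPlausibleSsnArea("666-45-6789")); // false
```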

Install

npm install scrubtext
# or: pnpm add scrubtext  /  yarn add scrubtext  /  bun add scrubtext

API

redact(text, options?): string

Returns the text with every finding replaced. By default, each finding is replaced with a [LABEL] tag.

redact("ssh in as ssh://root:hunter2@10.0.0.1");
// → "ssh in as [URL_CREDENTIALS]"   (note: the scheme is required for URL credentials to match)

redactWithReport(text, options?): { text, findings }

Same as redact, but also returns what was removed — handy for audit logs and metrics.

const { text, findings } = redactWithReport(input);
metrics.increment("pii.redacted", findings.length);
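
For audit logs, one thing to watch is that a finding appears to carry the matched text itself (the findSecrets example shows a `value` field), so storing findings verbatim would write the secret back out. The sketch below assumes that `Finding` shape and shows one way to summarize findings without the raw value. `countByType` and `toAuditRecord` are hypothetical helpers, not part of scrubtext.

```typescript
// Shape assumed from the findSecrets example output; the real type may have more fields.
interface Finding {
  type: string;
  label: string;
  value: string;
  start: number;
  end: number;
}

// Tally findings per detector type, e.g. for a metrics counter per type.
function countByType(findings: Finding[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const f of findings) {
    counts[f.type] = (counts[f.type] ?? 0) + 1;
  }
  return counts;
}

// Keep position and type for the audit trail, but drop the matched text.
function toAuditRecord(f: Finding): Omit<Finding, "value"> {
  return { type: f.type, label: f.label, start: f.start, end: f.end };
}

const sample: Finding[] = [
  { type: "email", label: "EMAIL", value: "jane@acme.com", start: 0, end: 13 },
  { type: "email", label: "EMAIL", value: "bob@acme.com", start: 20, end: 32 },
  { type: "credit_card", label: "CREDIT_CARD", value: "4242 4242 4242 4242", start: 40, end: 59 },
];

console.log(countByType(sample));       // { email: 2, credit_card: 1 }
console.log(sample.map(toAuditRecord)); // same findings, without `value`
```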

findSecrets(text, options?): Finding[]

Scan without modifying. Returns findings sorted by position, with overlaps resolved (a JWT is never also reported as a generic token).

findSecrets("card 4242 4242 4242 4242");
// → [{ type: "credit_card", label: "CREDIT_CARD", value: "...", start: 5, end: 24 }]
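
Overlap resolution can be sketched as a sort followed by a single sweep. The policy below (earliest start wins, ties go to the longest match) is an assumption chosen for illustration; scrubtext may rank detectors differently. `Span` and `resolveOverlaps` are hypothetical names, and `end` is treated as exclusive, matching the start: 5, end: 24 example above.

```typescript
// Minimal span shape; `end` is exclusive.
interface Span {
  type: string;
  start: number;
  end: number;
}

// Assumed policy: sort by start, prefer the longer span on ties,
// then keep a span only if it begins at or after the end of the last kept span.
function resolveOverlaps<T extends Span>(spans: T[]): T[] {
  const sorted = [...spans].sort(
    (a, b) => a.start - b.start || (b.end - b.start) - (a.end - a.start),
  );
  const kept: T[] = [];
  let lastEnd = 0;
  for (const s of sorted) {
    if (s.start >= lastEnd) {
      kept.push(s);
      lastEnd = s.end;
    }
  }
  return kept;
}

// An IPv4 match nested inside a URL-credentials match is dropped.
const resolved = resolveOverlaps([
  { type: "ipv4", start: 21, end: 29 },
  { type: "url_credentials", start: 0, end: 29 },
  { type: "email", start: 40, end: 52 },
]);
console.log(resolved.map((s) => s.type)); // ["url_credentials", "email"]
```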

Replacement strategies

Strategy           Result for 4242 4242 4242 4242
"label" (default)  [CREDIT_CARD]
"mask"             *******************
"partial"          ***************4242 (keeps last 4)
(finding) => …     anything you return

redact(input, { strategy: "partial", keepLast: 4, maskChar: "*" });
redact(input, { strategy: (f) => `[redacted:${f.type}]` });
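
The "partial" row of the table can be reproduced with a few lines. This is a sketch of the masking arithmetic only, with a hypothetical function name. The defaults (`keepLast` of 4, `*` as the mask character) are assumptions read off the table, and separators count as masked characters, as they do in the table's output.

```typescript
// Hypothetical helper; mirrors the "partial" row of the table, not scrubtext's source.
function maskPartial(value: string, keepLast = 4, maskChar = "*"): string {
  // Clamp so that negative or oversized keepLast values cannot misbehave.
  const keep = Math.max(0, Math.min(keepLast, value.length));
  const hidden = value.length - keep;
  // slice(value.length - keep) is used instead of slice(-keep),
  // because slice(-0) would return the whole string.
  return maskChar.repeat(hidden) + value.slice(value.length - keep);
}

console.log(maskPartial("4242 4242 4242 4242")); // "***************4242"
console.log(maskPartial("hunter2", 0));          // "*******"
```

Note that with this sketch a value no longer than `keepLast` is returned unmasked; a stricter variant might mask such short values completely.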

Custom detectors & allowlists

import { redact, defaultDetectors } from "scrubtext";

redact(text, {
  // Add to the built-ins:
  extraDetectors: [
    { type: "employee_id", label: "EMPLOYEE_ID", pattern: /\bEMP-\d{5}\b/g },
  ],
  // Never touch known-safe values:
  allowlist: ["support@yourcompany.com"],
});

// …or replace the built-in set entirely:
redact(text, { detectors: defaultDetectors.filter((d) => d.type !== "phone") });

A detector is just:

interface Detector {
  type: string;
  label: string;
  pattern: RegExp;            // must be global (g flag)
  validate?: (match: string) => boolean; // optional false-positive filter
}
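
To show how `pattern` and `validate` work together, here is a small self-contained runner. `runDetector` is hypothetical and is not how scrubtext is known to apply detectors; it only illustrates one plausible reason for the g-flag requirement (a non-global regex would match the same position forever in an `exec` loop) and how `validate` filters matches. The placeholder ID `EMP-00000` is an invented example.

```typescript
// Same shape as the Detector interface above.
interface Detector {
  type: string;
  label: string;
  pattern: RegExp;
  validate?: (match: string) => boolean;
}

// Hypothetical runner: collect every match that also passes validate().
function runDetector(text: string, detector: Detector): string[] {
  if (!detector.pattern.global) {
    // Without the g flag, exec() never advances and the loop would not end.
    throw new TypeError(`detector "${detector.type}" needs the g flag`);
  }
  const hits: string[] = [];
  detector.pattern.lastIndex = 0; // global regexes are stateful; start from the beginning
  let m: RegExpExecArray | null;
  while ((m = detector.pattern.exec(text)) !== null) {
    if (m[0] === "") {
      detector.pattern.lastIndex++; // step past empty matches so the loop cannot stall
      continue;
    }
    if (!detector.validate || detector.validate(m[0])) hits.push(m[0]);
  }
  return hits;
}

const employeeId: Detector = {
  type: "employee_id",
  label: "EMPLOYEE_ID",
  pattern: /\bEMP-\d{5}\b/g,
  // Invented rule for the example: ignore an all-zero placeholder ID.
  validate: (match) => match !== "EMP-00000",
};

console.log(runDetector("EMP-12345, EMP-00000 and EMP-123", employeeId));
// ["EMP-12345"]
```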

CLI

# Redact (reads stdin or a file)
cat app.log | scrubtext
scrubtext --strategy partial secrets.env

# Scan only — list findings, don't modify
cat app.log | scrubtext scan
cat app.log | scrubtext scan --json

scrubtext [file]            Redact text (stdin if no file)
scrubtext scan [file]       List findings without modifying
  --strategy, -s <s>   label | mask | partial   (default: label)
  --keep-last <n>      Trailing chars kept by "partial"
  --mask-char <c>      Character used by mask/partial
  --json               (scan only) emit findings as JSON

A note on guarantees

Regex-based redaction is a strong, fast first line of defence, not a formal guarantee. Free-form names, postal addresses, and novel secret formats can slip through. For regulated workloads, combine scrubtext with human review, and use the extraDetectors hook to cover your domain-specific identifiers.

Contributors

This project follows the all-contributors specification. Contributions of any kind are welcome — code, docs, bug reports, ideas, reviews! See the emoji key for how each contribution is recognized, and open a PR or issue to get involved.

Thanks go to these wonderful people:

Tung Tran

License

MIT © Tung Tran