1.3.9 • Published 5d ago

@redsocs/spam-warden

Licence: MIT
Version: 1.3.9
Deps: 0
Size: 4.1 MB
Vulns: 0
Weekly: 3.2K

SpamWarden.js

Lightweight, universal JavaScript library for real-time spam detection and automated form protection. Optimized for Thai text and high-performance cross-platform environments.



What is this?

SpamWarden.js is a zero-dependency, universal engine that detects spam directly at the source. It is engineered specifically to combat the regional surge in gambling, loan, and "fast money" spam campaigns targeting public sector and enterprise platforms.

By running natively in the browser, it intercepts malicious payloads before they ever reach your database, saving server resources, maintaining data integrity, and feeding sanitized threat intelligence directly into your SIEM.

Live Demo, Scanner & Script Generator


A Note on Honesty: Our War on Spammers (Old vs. New)

Let’s be completely transparent about how this library has evolved.

In our earlier versions, our absolute obsession with punishing casino and loan botnets led us to build something genuinely brutal. We deployed aggressive DOM scraping, intentional memory leaks, infinite history.replaceState loops, and hidden debugger traps. The goal was simple: crash their headless browsers and completely destroy their automated assets.

The problem? If you build a script that behaves exactly like hostile malware, enterprise Static Application Security Testing (SAST) tools will flag it as hostile malware. Government and corporate scanners took one look at our aggressive heuristic scraping and threw critical alerts for "stealth loader behavior" and "anti-analysis malware."

The Solution: We had to grow up and build an enterprise-compliant architecture. We stripped out the browser-crashing loops, the wild DOM scraping, and the sketchy atob obfuscation.

But we did not surrender. Instead of crashing their machines, we now target their operational costs. The new architecture replaces the brutal malware traps with a Cryptographic Proof of Work (PoW) Tarpit. It sails right past SAST compliance audits, but if a headless bot tries to bypass the UI and execute our decoy endpoints, it forces their CPU into a mathematical chokehold, burning their compute credits for every attack they attempt.


What's Inside: The Hybrid Detection Engine

SpamWarden doesn't just rely on one method. It processes input through a strict three-phase pipeline designed to be lightning-fast and mathematically invisible to security scanners (no eval(), no DOM injection).

Phase 1: The "Lightcheck" (Zero-Math Blocking)

Before waking up the heavy machine learning model, the engine does a microsecond sweep for hardcoded malicious intent.

  • Instantly blocks isolated currency symbols (such as $, £, and ฿) often used in fast-money scams.
  • Instantly blocks known spam link shorteners and redirectors (line[dot]me, bit[dot]ly).
  • Result: Zero CPU wasted on obvious bot blasts.
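
The Phase 1 idea can be sketched in a few lines of plain JavaScript. This is an illustration only, not the library's actual rule set: the function name lightcheck, the "blocked_link" reason, the regular expressions, and the reading of "isolated" as "a symbol standing alone between whitespace" are all assumptions made for this example.

```javascript
// Hypothetical pre-filter sketch. Rule names and patterns are assumptions,
// not SpamWarden's real rule set.

// A currency symbol with whitespace (or a string boundary) on both sides.
const ISOLATED_CURRENCY = /(?:^|\s)[$£฿](?:\s|$)/u;

// Known shortener hosts, matched only at a host boundary so that
// "online.me" does not trigger the "line.me" rule.
const BLOCKED_HOSTS = /(?:^|[^a-z0-9-])(?:line\.me|bit\.ly)(?![a-z0-9-])/i;

function lightcheck(text) {
  if (ISOLATED_CURRENCY.test(text)) {
    return { blocked: true, reason: "currency_symbol" };
  }
  if (BLOCKED_HOSTS.test(text)) {
    return { blocked: true, reason: "blocked_link" };
  }
  // Nothing obvious found: the caller would continue to the heavier phases.
  return { blocked: false, reason: null };
}

console.log(lightcheck("Win ฿ now"));
console.log(lightcheck("see https://bit.ly/abc"));
console.log(lightcheck("The fee is $5 per month"));
```

Both checks are single regular-expression tests, so the cost stays negligible compared with running the classifier.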

Phase 2: The Thai-Optimized Tokenizer

Standard Western spam filters break text by spaces, which completely fails for the Thai language.

  • The engine sweeps through the input, stripping whitespace and generating trigrams (3-letter groups) and quadgrams (4-letter groups) for the entire string.
  • Result: Mathematically forces space-less Thai words to reveal their hidden spam clusters.
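
A minimal sketch of the n-gram step described above, assuming the simplest possible reading: remove all whitespace, then slide windows of 3 and 4 characters over the result. The function name charNgrams and its sizes parameter are hypothetical, and the real tokenizer may normalize the text differently.

```javascript
// Hypothetical sketch of whitespace-free character n-gram generation.
function charNgrams(text, sizes = [3, 4]) {
  // Strip every whitespace character, then split by code point so that
  // characters outside the Basic Multilingual Plane are not cut in half.
  const chars = Array.from(text.replace(/\s+/g, ""));
  const grams = [];
  for (const n of sizes) {
    for (let i = 0; i + n <= chars.length; i++) {
      grams.push(chars.slice(i, i + n).join(""));
    }
  }
  return grams;
}

console.log(charNgrams("ab cde"));
// → [ 'abc', 'bcd', 'cde', 'abcd', 'bcde' ]
```

Because the windows ignore word boundaries, the same code works for Thai input, where spaces do not separate words.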

Phase 3: Present-Only Naive Bayes (The Core)

A modified Naive Bayes classifier trained exclusively on real-world spam samples.

  • CPU Friendly: It utilizes a Set to track only the vocabulary features actually present in the user's text, calculating logarithmic probability exclusively for those matches rather than iterating over the entire 28,000+ word dictionary.
  • Dynamic Thresholding: To prevent false positives on genuinely long, detailed user comments, it applies a length-dependent threshold: $5.5 + 0.49 \times N$, where $N$ is the number of matched features. The more features a text matches, the higher the score it must reach before it is flagged.
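
The two points above can be sketched as follows. This is a simplified illustration under stated assumptions: the per-feature weights are imagined as precomputed log-likelihood values stored in a Map, the score is their plain sum, and a text is flagged when the score exceeds the threshold. The function name scorePresentOnly and the toy weights are hypothetical; only the threshold formula comes from the description above.

```javascript
// Hypothetical present-only scoring sketch. The weight values are toy data.
function scorePresentOnly(features, weights) {
  // Collect only the features that exist in the vocabulary, each counted once.
  const present = new Set();
  for (const f of features) {
    if (weights.has(f)) present.add(f);
  }

  // Sum the log weights of the matched features only. The loop never
  // touches the rest of the vocabulary.
  let score = 0;
  for (const f of present) {
    score += weights.get(f);
  }

  // Length-dependent threshold: 5.5 + 0.49 * N, N = matched features.
  const matched = present.size;
  const threshold = 5.5 + 0.49 * matched;

  // Assumption: the text is flagged when the score exceeds the threshold.
  return { score, matched, threshold, isSpam: score > threshold };
}

const toyWeights = new Map([["abc", 4], ["bcd", 3], ["xyz", -1]]);
console.log(scorePresentOnly(["abc", "abc", "bcd", "zzz"], toyWeights));
console.log(scorePresentOnly(["abc"], toyWeights));
```

The work done is proportional to the number of n-grams in the input, not to the size of the dictionary, which is the point of the present-only design.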

Security & Active Defense (SAST Compliant)

SpamWarden utilizes a Hostile Active Defense architecture. It is built to pass enterprise audits while remaining an absolute nightmare for automated botnets.

  1. The Phantom Core (Closure Isolation): The real detection engine does not exist on the global window object. It is sealed entirely inside an anonymous execution closure. It is technically impossible for an attacker's script to query, overwrite, or disable the core function via the browser console.
  2. Polymorphic Ghost Tarpits (The Honeypot): At execution, the script dynamically generates polymorphic decoy engines hidden behind believable frontend variable names (e.g., window.aBcDeCache). If a bot bypasses the UI and blindly executes these fakes, it is trapped in the bounded PoW djb2 hash loop.
  3. Brutal DOM Protection: By utilizing Document-Level Capturing Phase listeners and Prototype Monkey-Patching, SpamWarden intercepts malicious submissions before they reach the form element, defeating direct document.forms[0].submit() bypasses.
  4. Anti-Tamper Lockout: If a script attempts to strip the data-sw-protect targeting attributes off your HTML, a hidden internal MutationObserver instantly detects the tampering and permanently disables the form.
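
To make the "bounded PoW djb2 hash loop" from item 2 more concrete, here is a minimal sketch of what such a loop could look like. The djb2 function is the well-known string hash; everything else (the name solveBounded, the challenge format, the difficulty expressed as a number of zero low bits, and the iteration cap) is an assumption for illustration and not the library's actual tarpit.

```javascript
// djb2 string hash, kept within unsigned 32-bit range.
function djb2(str) {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    hash = (Math.imul(hash, 33) + str.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Hypothetical bounded proof-of-work: search for a nonce whose hash has
// `difficultyBits` zero low bits, but never run more than `maxIterations`
// rounds, so the loop always terminates.
function solveBounded(challenge, difficultyBits, maxIterations) {
  const mask = 2 ** difficultyBits - 1; // assumes 0 <= difficultyBits <= 30
  for (let nonce = 0; nonce < maxIterations; nonce++) {
    const hash = djb2(challenge + ":" + nonce);
    if ((hash & mask) === 0) {
      return { nonce, hash, iterations: nonce + 1 };
    }
  }
  return null; // budget exhausted without a solution
}

console.log(djb2("a")); // → 177670
console.log(solveBounded("demo-challenge", 8, 5000));
```

The iteration cap is what keeps the loop "bounded": a caller burns CPU time in proportion to the difficulty, but the page can never hang indefinitely.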

Quickstart

For Public Sector & Government Admins: Get your free token configuration and drop-in script to protect your online portals at redsocs.com/spam-warden.


1. Zero-Config Local Protection (No Telemetry)

Explicit opt-in protection. No data leaves the browser. Simply include the script and tag your inputs.

Which file should you choose? (Pick ONLY ONE):

  • spamwarden.min.js: The standard minified version. Best for general performance and faster browser parse times.
  • spamwarden.min.ob.js: The obfuscated version. It applies control-flow flattening and string encoding to aggressively penalize reverse-engineering attempts by malicious actors, at the cost of a slightly larger file size.

<!-- ⚠️ IMPORTANT: Choose ONLY ONE of the scripts below. Do not include both! -->

<!-- Option A: Standard Minified (Best Performance) -->
<script src="https://cdn.redsocs.com/js/spamwarden.min.js"></script>

<!-- Option B: Obfuscated (Maximum Security) -->
<!-- <script src="https://cdn.redsocs.com/js/spamwarden.min.ob.js"></script> -->

<form>
  <!-- Just add data-sw-protect="true" to any field -->
  <textarea name="comment" data-sw-protect="true"></textarea>
  <button type="submit">Submit</button>
</form>

2. Enterprise Telemetry & DLP (SIEM Integration)

Report blocked payloads to a central SOC, SIEM, or custom logging server. Use the siems attribute to define your receiving endpoint(s). You can provide a single URL or a comma-separated list of multiple URLs to broadcast the telemetry to several destinations simultaneously.
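
For illustration, a comma-separated siems value could be split into individual destinations roughly like this. The function name parseSiemsAttribute is hypothetical, and prepending https:// to entries without a scheme is an assumption made for this sketch; the library's own parsing and scheme handling may differ.

```javascript
// Hypothetical sketch: turn a siems attribute value into a list of URLs.
function parseSiemsAttribute(value) {
  if (typeof value !== "string") return [];
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i;
  return value
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    // Assumption: scheme-less entries are treated as HTTPS endpoints.
    .map((entry) => (hasScheme.test(entry) ? entry : "https://" + entry));
}

console.log(parseSiemsAttribute("siem.redsocs.com/v1"));
console.log(parseSiemsAttribute("siem-a.example.com/v1, https://siem-b.example.com/logs"));
console.log(parseSiemsAttribute(""));
```

An empty or missing attribute yields an empty list, which corresponds to the "no telemetry" case. Note that a plain split on commas cannot handle URLs that themselves contain commas.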

Data Protection & Privacy (DLP / SD Flag)

When telemetry is sent to a central SIEM, you might inadvertently transmit Personally Identifiable Information (PII) if the user typed it into the field.

To ensure compliance with PDPA/GDPR, enable the Sanitize Data (SD) flag by adding data-sd="1" to the script tag, or setting reportSD: true in the programmatic config.

When activated, SpamWarden's built-in DLP engine intercepts the payload before it leaves the browser and aggressively masks:

  • Credit Cards: Replaces 16-digit patterns with [CARD_MASKED]
  • Emails: Replaces standard email formats with [EMAIL_MASKED]
  • Phone Numbers: Replaces standard Thai/International formats with [PHONE_MASKED]

This guarantees that PII is scrubbed from the threat intelligence telemetry without requiring any backend processing.
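
The masking step can be approximated with three regular expressions. The sketch below is not the built-in DLP engine: the patterns, the function name maskPII, and the choice of which Thai and international phone formats to cover are assumptions. Only the three replacement tokens come from the list above.

```javascript
// Hypothetical masking sketch. Patterns are deliberately simple.

// 16 digits, optionally grouped in fours by spaces or hyphens.
const CARD = /\b(?:\d{4}[ -]?){3}\d{4}\b/g;

// A common, permissive email shape.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Thai-style numbers starting with 0 or +66, with optional separators.
const PHONE = /(?<!\d)(?:\+66|0)[ -]?\d{1,2}[ -]?\d{3}[ -]?\d{4}\b/g;

function maskPII(text) {
  // Cards are masked first so the phone pattern cannot match inside them.
  return text
    .replace(CARD, "[CARD_MASKED]")
    .replace(EMAIL, "[EMAIL_MASKED]")
    .replace(PHONE, "[PHONE_MASKED]");
}

console.log(
  maskPII("Card 4111 1111 1111 1111, mail a.b@example.com, call 081-234-5678")
);
// → "Card [CARD_MASKED], mail [EMAIL_MASKED], call [PHONE_MASKED]"
```

Regex-based masking of this kind is best-effort: unusual number formats or obfuscated addresses can slip through, so it should not be the only privacy control.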

Single Endpoint:

<script
  src="https://cdn.redsocs.com/js/spamwarden.min.ob.js"
  siems="siem.redsocs.com/v1"
  data-sd="1"
></script>

Multiple Endpoints (Comma-separated):

<script
  src="https://cdn.redsocs.com/js/spamwarden.min.ob.js"
  siems="api-spam.siem.go.th/v1?token=[token],siem-logger.yourdomain.com/logs"
  data-sd="1"
></script>

3. API Usage (Node Only)

const sw = require("@redsocs/spam-warden");

const result = sw.spamcheck(
  "[Hello, this is a Thai casino & scam ads — guess what? Your tax pays for my traffic.]",
);

if (result.isSpam) {
  console.log("Blocked:", result.reason || "AI match");
  console.log("Confidence:", result.prob);
}

4. Programmatic Configuration (Advanced)

If you are using a modern framework (React, Vue) or Node.js, you can configure SpamWarden programmatically instead of relying on HTML script attributes.

// Example: Customizing behavior
window.spamwarden.configure({
  // Telemetry Destinations
  endpoint: "https://siem.yourdomain.com/logs",
  siemEndpoint: "https://backup-siem.yourdomain.com/logs",
  autoReport: true,
  isTrusted: true,

  // Data Protection (DLP)
  reportSD: true, // Same as data-sd="1"
  payloadLimit: 250, // Max length of the reported payload text

  // Custom Intercepts
  onSpam: function (result) {
    // Override the default alert() trap with your own UI behavior
    console.warn("Spam detected with confidence: " + result.prob);
    // showCustomModal("Blocked due to policy violation.");
  },

  customReporter: function (payload) {
    // Override the default HTTP POST and handle the SIEM payload manually
    // myCustomLogger.send(payload);
  },
});

5. Developer Mode & Debugging

Because SpamWarden utilizes Hostile Active Defense (Phantom Cores, Traps, etc.), debugging it in the console can be difficult by design.

If you are actively developing your UI and need to bypass the security traps or inspect the engine natively, append data-sw-dev="true" to your script tag:

<script src="..." data-sw-dev="true"></script>

Never deploy to production with data-sw-dev="true". This completely disables the decoy traps and exposes the global window.spamwarden object, making it easier for automated botnets to bypass the system.


Scope & Independent Integrity Auditing

SpamWarden.js is built exclusively to evaluate the live, fully rendered Document Object Model (DOM) right inside the browser.

Client-Side Compliance & Integrity Testing: While standard backend firewalls check incoming traffic patterns, they are completely blind to data injected directly into compromised template columns or static database rows. If your server has already been breached, backend validation will fail to detect the hidden output being served to search engine crawlers.

To audit existing compromise footprints, we use our private badlinks engine running within the internal RedSocs Inspector tools for our EASM platform. This specialized configuration allows auditors and security teams to:

  • Expose Stealth SEO Hijacking: Automatically unmask hidden tags, hidden layout nodes (display: none, opacity: 0), and malicious cross-domain tracking assets designed to cheat search engine indices.
  • Run Local Compliance Sandboxing: Evaluate target pages on the fly exactly as an NCSA integrity inspector or external search crawler experiences them, without altering a single line of production code on the target server.
  • Generate Deterministic Audit Telemetry: Stream immediate, non-disruptive compliance indicators back to your secure C2 infrastructure or central SOC to document legal alignment with the NCSA Web Standard 1.0 framework - Thailand.

Local Simulation & Testing

Spin up a local simulation server to test the DOM auto-blocking behavior and inspect SIEM telemetry payloads in real time:

  1. Start the server: npm run test-server
  2. Open the test page in your browser: http://localhost:3000/
  3. Submit a spam message (e.g., including currency signs like ฿ or links like line[dot]me).
  4. Observe the result:
  • The form submission will be blocked on the page.
  • The terminal will display the defanged and sanitized telemetry payload sent to the SIEM receiver:
🚨 [SIEM RECEIVER] Blocked Payload Received!
================================================
Endpoint: siem.gov-sec.go.th/v1?token=eGuec...
URL:          h_tt_p://victim.go.th:3000/
Rule Matched: currency_symbol
Confidence:   100%
PII Masked?   true
Pasted?       false
Actors:       [[at]TUNA_FISH]
Sanitized:    "Win [CARD_MASKED] now! [at]TUNA_FISH"
================================================

If no siems attribute (for example, siems="https://siem-log.youdomain.co.th/spam") or programmatic endpoint is configured when the page loads, the tool sends nothing outside the browser.


About

Technical Specs
| Property | Value |
| --- | --- |
| Minified Size | ~2.0 MB (including model weights) |
| Gzipped Size | ~341 KB |
| Dependencies | 0 (Vanilla JS) |
| Vocabulary | 34,177 features |

Disclaimer: This tool is not endorsed by the National Cyber Security Agency (NCSA) or any government security sector unit. You can validate your judgment at the Standard & Action Security Policy Audit at https://www.ncsa.or.th/standards.
