npm.io
1.0.3 • Published 3d ago

litesearch-engine

Licence
MIT
Version
1.0.3
Deps
0
Size
199 kB
Vulns
0
Weekly
0

litesearch-engine

Zero-dependency, blazing-fast, in-memory full-text search engine for Node.js and TypeScript — 100% dynamic, domain-agnostic.

Built to replace Elasticsearch for datasets of up to ~50,000 documents where you need speed, simplicity, and full control — no Docker, no JVM, no DevOps. Search completes in < 15ms for 10,000 documents.

Any data shape, any use case. Products, blog posts, user profiles, support tickets, log entries, code snippets, recipes, messages — if it's a JSON object with string fields, litesearch indexes and searches it. No schema, no setup, no domain lock-in.

npm install litesearch-engine

Features

Feature Details
100% dynamic schema Works with any document shape — products, posts, users, tickets, logs, anything
Full-text search BM25+ scoring (the same algorithm powering Elasticsearch/Lucene)
Fuzzy / typo tolerance Levenshtein distance with adaptive thresholds and early-exit optimisation
Partial matching Any prefix of any word matches instantly
Autocomplete suggestions Trie prefix tree, < 1ms per query
Nested filters AND / OR / NOT with 10 operators
Highlighted snippets <mark> tags with match context window
Live index updates add / update / remove in real time, no re-index needed
Domain-agnostic No schemas, no models, no setup — index anything
TypeScript-first Full generics, every input/output typed
Zero dependencies Pure TypeScript, 0 npm dependencies
Browse & list browse(), getById(), has() — no query string needed
Sort Sort by any field (string, number, date), in search or browse
Facets / aggregations terms, range, date_histogram — computed over filtered sets
Multi-index manager LiteSearchManager with cross-index searchAll, weighted merging
Export / Import serialize/deserialize + optional file persistence

Quick Start

import { LiteSearch } from "litesearch-engine";

// 1. Define your document type — any shape works
interface BlogPost {
  id: string;
  title: string;
  body: string;
  tags: string[];
  author: string;
}

// 2. Create the engine — no schema, just point at fields
const engine = new LiteSearch<BlogPost>({
  idField: "id",
  fields: {
    title:  { weight: 3, suggest: true },
    body:   { weight: 1 },
    tags:   { weight: 1.5 },
    author: { weight: 2, suggest: true },
  },
});

// 3. Index your data
engine.addMany(posts);

// 4. Search — BM25 scoring, fuzzy matching, all automatic
const result = engine.search("typescript performance");
console.log(result.hits[0].document.title); // best match
console.log(result.took);                    //2 (ms)

Table of Contents

  1. Installation
  2. Universal Usage
  3. Configuration
  4. Indexing Documents
  5. Searching
  6. Filters
  7. Autocomplete / Suggestions
  8. Highlights
  9. Live Index Updates
  10. Pagination
  11. Stats
  12. Advanced: Custom Tokenizer
  13. Advanced: Custom Scoring
  14. Output Format Reference
  15. Performance Guide
  16. Architecture Deep Dive

Installation

npm install litesearch-engine
# or
yarn add litesearch-engine
# or
pnpm add litesearch-engine

Requirements: Node.js 16+, TypeScript 4.7+ (if using TypeScript).


Universal Usage

litesearch-engine ingests any JSON document shape — no schema, no setup, no config beyond pointing at which fields to index. Every feature below works with any domain: products, users, blog posts, legal cases, recipes, support tickets, logs, you name it.

1. User Directory
import { LiteSearch } from "litesearch-engine";

interface User {
  id: string;
  name: string;
  email: string;
  department: string;
  bio: string;
}

const users = new LiteSearch<User>({
  idField: "id",
  fields: {
    name:       { weight: 3, suggest: true },
    email:      { weight: 2 },
    department: { weight: 1, suggest: true },
    bio:        { weight: 1 },
  },
});

users.addMany([
  { id: "1", name: "Chiamaka Obi",   email: "chiamaka@example.com", department: "Engineering", bio: "Full-stack developer" },
  { id: "2", name: "Kofi Mensah",    email: "kofi@example.com",     department: "Design",      bio: "UX designer" },
  { id: "3", name: "Aisha Bello",    email: "aisha@example.com",    department: "Marketing",   bio: "Content strategist" },
]);

// Full-text search — BM25 scoring, fuzzy, prefix, all automatic
const r = users.search("chiamaka dev");
console.log(r.hits[0].document.name); // "Chiamaka Obi"

// Autocomplete — trie prefix lookup < 1ms
const s = users.suggest("chi");
console.log(s.suggestions[0].text); // "chiamaka"

// Browse all — with filter + sort
const engineering = users.browse({
  filter: { field: "department", operator: "eq", value: "Engineering" },
  sort:   { field: "name", direction: "asc" },
});

// Faceted navigation — department counts
const faceted = users.search("", {
  facets: { department: { type: "terms", size: 5 } },
});
console.log(faceted.facets!.department.buckets);
// → [{ key: "Engineering", count: 1 }, { key: "Design", count: 1 }, ...]
interface LegalCase {
  caseNumber: string;
  title: string;
  summary: string;
  jurisdiction: string;
  year: number;
}

const cases = new LiteSearch<LegalCase>({
  idField: "caseNumber",         // custom idField
  fields: {
    title:        { weight: 3, suggest: true },
    summary:      { weight: 2 },
    jurisdiction: { weight: 1 },
  },
});

cases.addMany([
  { caseNumber: "SC/1/2024", title: "Maga v. INEC",      summary: "Electoral dispute",        jurisdiction: "Supreme Court", year: 2024 },
  { caseNumber: "CA/45/2023", title: "Bello v. State",    summary: "Criminal appeal",           jurisdiction: "Court of Appeal", year: 2023 },
  { caseNumber: "HC/12/2022", title: "Okafor v. UBA Plc", summary: "Banking and contract law",  jurisdiction: "High Court", year: 2022 },
]);

// Fuzzy finds typos — "electral" matches "Electoral"
const result = cases.search("electral dispuite", { fuzzy: { enabled: true } });

// Filter by jurisdiction + year range
result = cases.search("appeal", {
  filter: {
    AND: [
      { field: "jurisdiction", operator: "eq", value: "Court of Appeal" },
      { field: "year", operator: "gte", value: 2020 },
    ],
  },
});

// Exact ID lookup
const c = cases.getById("SC/1/2024");

// Existence check
if (cases.has("HC/12/2022")) { /* ... */ }
3. Recipe Collection
interface Recipe {
  id: string;
  name: string;
  ingredients: string[];
  cuisine: string;
  prepTime: number; // minutes
  instructions: string;
}

// Array fields are auto-joined: ingredients: ["rice", "beans"] → "rice beans"
const recipes = new LiteSearch<Recipe>({
  fields: {
    name:         { weight: 3, suggest: true },
    ingredients:  { weight: 2 },
    cuisine:      { weight: 1, suggest: true },
    instructions: { weight: 1 },
  },
});

recipes.addMany([
  { id: "1", name: "Jollof Rice",      ingredients: ["rice", "tomatoes", "pepper", "onions"], cuisine: "West African", prepTime: 60, instructions: "..." },
  { id: "2", name: "Egusi Soup",       ingredients: ["egusi", "pumpkin leaves", "palm oil"],   cuisine: "Nigerian",    prepTime: 90, instructions: "..." },
  { id: "3", name: "Yam Porridge",     ingredients: ["yam", "palm oil", "fish"],               cuisine: "Nigerian",    prepTime: 45, instructions: "..." },
  { id: "4", name: "Pad Thai",         ingredients: ["rice noodles", "shrimp", "peanuts"],     cuisine: "Thai",        prepTime: 30, instructions: "..." },
]);

// Search by ingredient — "rice" finds Jollof Rice, Yam Porridge, Pad Thai
const riceDishes = recipes.search("rice");

// Sort by prep time (ascending)
const quickMeals = recipes.search("", { sort: { field: "prepTime", direction: "asc", type: "number" } });

// Facet by cuisine
const byCuisine = recipes.search("", {
  facets: { cuisine: { type: "terms", size: 10 } },
});

// Browse with pagination (20 per page)
const page2 = recipes.browse({ limit: 20, offset: 20 });
4. Job Board
interface Job {
  _id: string;
  title: string;
  skills: string[];
  location: string;
  salaryMin: number;
  salaryMax: number;
}

// Use idResolver for non-standard IDs — here _id is already a string
const jobs = new LiteSearch<Job>({
  idField: "_id",     // direct mapping
  fields: {
    title:    { weight: 3, suggest: true },
    skills:   { weight: 2 },
    location: { weight: 1, suggest: true },
  },
  tokenizer: {
    language: "none",  // keep all tokens — "React" and "react" both exist
    normalizer: (t) => t.toLowerCase(),  // case-insensitive searching
  },
});

jobs.addMany([
  { _id: "1", title: "Senior React Engineer", skills: ["React", "TypeScript", "Node.js"], location: "Lagos", salaryMin: 8000000, salaryMax: 15000000 },
  { _id: "2", title: "UX Designer",           skills: ["Figma", "User Research"],          location: "Remote", salaryMin: 5000000, salaryMax: 10000000 },
  { _id: "3", title: "DevOps Lead",           skills: ["AWS", "Kubernetes", "Terraform"],  location: "Nairobi", salaryMin: 12000000, salaryMax: 20000000 },
]);

// Range filter on salary
const seniorRoles = jobs.search("senior", {
  filter: {
    AND: [
      { field: "salaryMin", operator: "gte", value: 5000000 },
      { field: "salaryMax", operator: "lte", value: 15000000 },
    ],
  },
  sort: { field: "salaryMin", direction: "desc", type: "number" },
});
5. Multi-Index Manager — Cross-Search Users, Articles & Products
import { LiteSearch, LiteSearchManager } from "litesearch-engine";

const manager = new LiteSearchManager();

manager.createIndex("users", {
  fields: { name: { weight: 3, suggest: true }, bio: { weight: 1 } },
});
manager.createIndex("articles", {
  fields: { title: { weight: 3, suggest: true }, body: { weight: 1 } },
});
manager.createIndex("products", {
  fields: { name: { weight: 3, suggest: true }, description: { weight: 1 } },
});

manager.add("users",    { id: "1", name: "Kofi Mensah", bio: "UX designer" });
manager.add("articles", { id: "1", title: "Design Systems", body: "How to build scalable design systems" });
manager.add("products", { id: "1", name: "Wireframe Kit", description: "UI wireframe components for Figma" });

// Single-index search
const userResult = manager.search("users", "kofi");

// Cross-index search — merged, ranked, tagged
const all = manager.searchAll("design", {
  indexes: { users: 1.0, articles: 1.5, products: 1.0 }, // per-index weight
  limit: 20,
});

console.log(all.hits[0].document._index); // "articles" (highest weight × matched)
console.log(all.perIndex);
// → { users: { total: 1, took: 2 }, articles: { total: 1, took: 3 }, products: { total: 1, took: 2 } }
Nested Documents & Custom ID Resolver

Documents with nested objects work via dot-path fields. MongoDB-style _id ObjectIds work via idResolver:

// A document with nested address and an ObjectId-style _id
interface Customer {
  _id: { toString: () => string };
  name: string;
  address: { city: string; state: string };
  tags: Array<{ name: string }>;
}

const customers = new LiteSearch<Customer>({
  idField: "_id",                                       // dot-path NOT needed for top-level
  idResolver: (doc) => (doc as any)._id.toString(),      // extract string from ObjectId
  fields: {
    name:          { weight: 3, suggest: true },
    "address.city":  { weight: 2, path: "address.city" },  // dot-path to nested value
    "address.state": { weight: 1, path: "address.state" },
    tags:          { weight: 2 },                           // arrays of objects are auto-flattened
  },
});

customers.add({
  _id: { toString: () => "cust_001" },
  name: "Amara Okafor",
  address: { city: "Lagos", state: "Lagos" },
  tags: [{ name: "vip" }, { name: "wholesale" }],
});

// All of these find the document:
customers.search("amara");
customers.search("lagos");     // matches address.city + address.state
customers.search("vip");       // matches flattened tags array
customers.getById("cust_001"); // ✓ works because idResolver maps _id → string
Custom Field Extraction

Use FieldConfig.extract to index computed values that don't exist on the raw document:

const engine = new LiteSearch({
  fields: {
    "fullName": { weight: 3, extract: (doc) => `${doc.firstName} ${doc.lastName}` },
  },
});

engine.add({ id: "1", firstName: "Chiamaka", lastName: "Obi" });
engine.search("chiamaka obi"); // ✓ matches from computed "fullName"
Persistence: Save & Restore
import { serialize, deserialize, saveToFile, loadFromFile } from "litesearch-engine";

// Serialize to JSON string
const json = serialize(engine);

// Restore from JSON
const restored = deserialize(json, { fields: { name: { weight: 3 } } });

// Node.js file persistence (browser-safe — throws if fs not available)
await saveToFile(engine, "./search-index.json");
const fromDisk = await loadFromFile("./search-index.json", { fields: { name: { weight: 3 } } });

Configuration

const engine = new LiteSearch<YourDoc>({
  // ── Required ──────────────────────────────────────────────────────────────

  /**
   * The field on your document that uniquely identifies it.
   * Default: "id"
   */
  idField: "id",

  /**
   * Fields to index. Pass as an array (all default config) or an object
   * (per-field control).
   *
   * Short form:
   */
  fields: ["name", "description", "category"],

  // Or long form with per-field config:
  fields: {
    name:        { weight: 3,   suggest: true  },
    description: { weight: 1,   suggest: false },
    category:    { weight: 2,   suggest: true  },
    brand:       { weight: 2.5, suggest: true  },
    tags:        { weight: 1.5, suggest: false },
  },

  // ── Optional ──────────────────────────────────────────────────────────────

  fuzzy: {
    enabled:     true,  // Toggle fuzzy globally
    maxDistance: 2,     // Max Levenshtein edit distance (1 or 2 recommended)
    minLength:   4,     // Minimum query word length before fuzzy activates
  },

  scoring: {
    k1: 1.2,  // BM25 term frequency saturation (1.2 = standard)
    b:  0.75, // BM25 field length normalisation (0.75 = standard)
  },

  suggest: {
    maxResults:    10,   // Max autocomplete suggestions returned
    caseSensitive: false,
  },

  tokenizer: {
    language: "en",  // "en" strips English stopwords, "none" keeps all tokens
  },
});
Field Config Options
Option Type Default Description
weight number 1.0 Score multiplier. Name matches should outweigh description matches.
suggest boolean true Whether this field feeds the autocomplete trie.
fuzzy boolean true Whether fuzzy matching applies to this field.
path string field name Dot-path for nested objects: "meta.brand" reads doc.meta.brand.

Indexing Documents

Single document
engine.add({
  id: "prod_001",
  name: "Nike Air Max 270",
  description: "Lightweight running shoe for men",
  category: "Footwear",
  brand: "Nike",
  price: 45000,
});
// Internally still calls add() per document, but in a tight loop.
// For 10,000 docs this typically takes 100–300ms.
engine.addMany(products);
Nested document fields

Works automatically via the path field config:

const engine = new LiteSearch({
  fields: {
    title:       { weight: 3 },
    "meta.brand": { weight: 2, path: "meta.brand" }, // reads doc.meta.brand
  },
});

engine.add({
  id: "1",
  title: "Ankara Dress",
  meta: { brand: "Adire Collective", tags: ["fashion"] },
});
Array fields

Arrays are automatically joined with spaces before tokenizing:

engine.add({
  id: "1",
  tags: ["running", "outdoor", "men"], // indexed as "running outdoor men"
});

Searching

const result = engine.search("running shoes nike", {
  limit:      10,       // Results per page. Default: 10
  offset:     0,        // Pagination offset. Default: 0
  highlight:  true,     // Return <mark> snippets. Default: true
  minScore:   0.1,      // Drop results below this normalised score (0–1)
  boostExact: true,     // Boost exact phrase matches to top. Default: true
  fields:     ["name", "brand"], // Search only these fields (subset)
  filter: {             // Optional filter (see Filters section)
    AND: [
      { field: "category", operator: "eq", value: "Footwear" },
      { field: "price", operator: "lte", value: 60000 },
    ]
  },
});
Result shape
{
  hits: [
    {
      document:  { id: "1", name: "Nike Air Max 270", ... }, // original doc
      score:     0.97,       // normalised relevance (0–1)
      rawScore:  4.82,       // raw BM25 score
      matchType: "exact",    // "exact" | "prefix" | "fuzzy"
      highlights: [
        {
          field:         "name",
          snippet:       "…<mark>Nike</mark> <mark>Air</mark> Max 270…",
          matchedTokens: ["nike", "air"],
        },
        {
          field:         "description",
          snippet:       "Lightweight <mark>running</mark> <mark>shoes</mark> for men",
          matchedTokens: ["running", "shoes"],
        }
      ]
    },
    // ...more hits
  ],
  total:      47,       // total matching docs (before pagination)
  took:       3,        // milliseconds
  query:      "running shoes nike",
  pagination: {
    limit:   10,
    offset:  0,
    hasMore: true,
  }
}

Filters

Filters can be simple clauses or deeply nested AND/OR/NOT groups.

Simple clause
engine.search("phone", {
  filter: { field: "category", operator: "eq", value: "Electronics" }
});
Available operators
Operator Description Example value
eq Equals "Electronics"
neq Not equals "Draft"
gt Greater than 50000
gte Greater than or equal 50000
lt Less than 100000
lte Less than or equal 100000
range Between (inclusive) [10000, 50000]
in Value is in list ["Nike", "Adidas"]
nin Value is NOT in list ["Draft", "Archived"]
contains String contains (case-insensitive) "max"
startsWith String starts with (case-insensitive) "Nike"
exists Field is not null/undefined (no value needed, pass true)
Compound filters
// Products in Electronics, priced ₦100k–₦500k, not out-of-stock
engine.search("laptop", {
  filter: {
    AND: [
      { field: "category",  operator: "eq",    value: "Electronics" },
      { field: "price",     operator: "range", value: [100000, 500000] },
      { field: "inStock",   operator: "eq",    value: true },
    ]
  }
});
// Either Nike or Adidas, under ₦50k
engine.search("shoes", {
  filter: {
    AND: [
      {
        OR: [
          { field: "brand", operator: "eq", value: "Nike" },
          { field: "brand", operator: "eq", value: "Adidas" },
        ]
      },
      { field: "price", operator: "lt", value: 50000 },
    ]
  }
});
// Anything BUT the "Food" category
engine.search("noodles", {
  filter: {
    NOT: { field: "category", operator: "eq", value: "Food" }
  }
});
Nested field filters

Use dot-path notation — works the same as field indexing:

{ field: "meta.brand", operator: "eq", value: "Nike" }

Autocomplete / Suggestions

const result = engine.suggest("nikee"); // typo
// {
//   suggestions: [
//     { text: "nike",    documentIds: ["1","9"], frequency: 2, matchType: "fuzzy",  distance: 1 },
//     { text: "nikelab", documentIds: ["3"],     frequency: 1, matchType: "fuzzy",  distance: 2 },
//   ],
//   took: 1,
//   query: "nikee"
// }

// Perfect prefix match
engine.suggest("run");
// → ["running", "runway", ...] matched by trie prefix in < 1ms
How it works
  1. Trie prefix lookup — O(prefix length), always checked first.
  2. Fuzzy fallback — Only if prefix returns < 3 results. Scans all trie words with Levenshtein distance ≤ adaptive threshold.
  3. Rankingexact > prefix > fuzzy, then by frequency (how many docs contain the term).
Suggestion result shape
{
  text:        "running",
  documentIds: ["1", "2", "15"], // which docs contain this word
  frequency:   3,                // how many times indexed
  matchType:   "prefix",         // "exact" | "prefix" | "fuzzy"
  distance:    0,                // Levenshtein distance from query
}

Highlights

Highlights are returned by default on every search. Disable them for performance-critical paths where you don't need them:

engine.search("nike shoes", { highlight: false });

The snippet:

  • Finds the position of the first match in the field value
  • Returns a ±30 character context window (max 160 chars)
  • Wraps matched tokens in <mark>…</mark>
  • Adds ellipsis when the value is truncated
// Input:  "Lightweight running shoes designed for men with narrow feet"
// Query:  "running shoes"
// Output: "Lightweight <mark>running</mark> <mark>shoes</mark> designed for men…"

You can render highlights directly in HTML, or strip the <mark> tags for plain text:

const plain = hit.highlights[0].snippet.replace(/<\/?mark>/g, "");
// → "Lightweight running shoes designed for men…"

Live Index Updates

The index updates instantly — no rebuild required.

// Add a new document → immediately searchable
engine.add({ id: "99", name: "New Arrival", ... });

// Update an existing document (same ID = upsert)
engine.update({ id: "99", name: "New Arrival - Updated", ... });

// Remove a document
engine.remove("99");

// Wipe the entire index
engine.clear();
Integrating with your database
// With Mongoose / MongoDB
Product.watch().on("change", (change) => {
  if (change.operationType === "insert")  engine.add(change.fullDocument);
  if (change.operationType === "update")  engine.update(change.fullDocument);
  if (change.operationType === "delete")  engine.remove(change.documentKey._id.toString());
});

// With Prisma / PostgreSQL
// After any product save:
await prisma.product.update({ ... });
engine.update(updatedProduct);

// After delete:
await prisma.product.delete({ where: { id } });
engine.remove(id);
Seeding on startup
// server.ts
const products = await Product.find({}).lean(); // or prisma.product.findMany()
engine.addMany(products);
console.log(`Search index ready: ${engine.stats().documentCount} documents`);

Pagination

// Page 1
const page1 = engine.search("phone", { limit: 10, offset: 0 });

// Page 2
const page2 = engine.search("phone", { limit: 10, offset: 10 });

// Check if more pages exist
if (page1.pagination.hasMore) {
  // fetch next page
}

// Total results (for "Showing X of Y results")
console.log(`Showing ${page1.hits.length} of ${page1.total} results`);

Stats

const stats = engine.stats();
// {
//   documentCount:       10000,
//   termCount:           84320,  // unique indexed terms
//   trieNodeCount:       62100,  // autocomplete trie size
//   fields:              ["name", "description", "category", "brand"],
//   memoryEstimateBytes: 15728640, // ~15MB for 10k products
//   lastUpdated:         2024-01-15T10:30:00.000Z,
// }

Advanced: Custom Tokenizer

The default tokenizer: lowercases, splits on non-alphanumeric characters, strips English stopwords (a, the, and, etc.), and drops tokens < 2 characters.

Override it completely:

const engine = new LiteSearch({
  tokenizer: {
    tokenize: (text: string): string[] => {
      // Your own logic — split on hyphens too, for example
      return text
        .toLowerCase()
        .split(/[\s\-_,\.]+/)
        .filter(t => t.length >= 2);
    }
  }
});

Or just change the language setting to keep all tokens (no stopword removal):

tokenizer: { language: "none" }

Advanced: Custom Scoring

Tune BM25 parameters:

Parameter Effect When to change
k1 = 1.2 (default) Controls term-frequency saturation. Higher = longer documents score higher. Increase for long descriptions, decrease for short product names.
b = 0.75 (default) Field-length normalisation. b=1 fully normalises, b=0 ignores length. Decrease if your products have wildly different description lengths.
// Tuned for short product names (less length normalisation)
scoring: { k1: 1.5, b: 0.3 }

// Tuned for long blog posts
scoring: { k1: 1.2, b: 0.9 }

Output Format Reference

SearchResult<T>
interface SearchResult<T> {
  hits:       SearchHit<T>[];
  total:      number;          // total matches (pre-pagination)
  took:       number;          // ms
  query:      string;
  pagination: {
    limit:   number;
    offset:  number;
    hasMore: boolean;
  };
}
SearchHit<T>
interface SearchHit<T> {
  document:   T;               // your original document, untouched
  score:      number;          // 01 normalised relevance
  rawScore:   number;          // raw BM25 score (for debugging)
  matchType:  "exact" | "prefix" | "fuzzy";
  highlights?: HighlightResult[];
}
HighlightResult
interface HighlightResult {
  field:         string;     // which field matched
  snippet:       string;     // context window with <mark> tags
  matchedTokens: string[];   // which tokens matched
}
SuggestResult
interface SuggestResult {
  suggestions: SuggestionHit[];
  took:        number;
  query:       string;
}

interface SuggestionHit {
  text:        string;     // the suggested word
  documentIds: string[];   // which docs contain it
  frequency:   number;     // how many docs (for ranking)
  matchType:   "exact" | "prefix" | "fuzzy";
  distance:    number;     // Levenshtein distance from query
}

Performance Guide

Expected benchmarks
Documents Index time Search (no filter) Search (with filter) Suggest
1,000 ~10ms < 2ms < 3ms < 1ms
10,000 ~80ms < 10ms < 15ms < 2ms
50,000 ~400ms < 50ms < 80ms < 10ms
Tips

1. Only index what you search. Avoid indexing fields you never query. Every extra field increases index time and memory.

// ❌ Don't do this
fields: ["id", "createdAt", "updatedAt", "internalNotes", "name"]

// ✅ Only searchable fields
fields: { name: { weight: 3 }, description: { weight: 1 } }

2. Disable suggest on heavy fields. The trie only needs to index fields your autocomplete uses.

fields: {
  name:        { suggest: true  }, // ✅ autocomplete from names
  description: { suggest: false }, // ❌ skip — too many tokens
}

3. Disable highlighting on search-as-you-type routes. You don't need highlights on every keystroke.

// On keypress
engine.suggest(query); // use suggest(), not search()

// On submit / full search
engine.search(query, { highlight: true });

4. Use minScore to cut noise.

engine.search("laptop bag", { minScore: 0.2 }); // drop weak matches

5. Pre-filter with filters, not post-filter. Filters in litesearch are applied after BM25 scoring but before result assembly. For range/category filters on large datasets, consider structuring your index to use pre-filtered collections if you're approaching 50k+ documents.

6. Seed index at server startup. Don't re-create the engine per request. Create it once and keep it in module scope.

// search.service.ts — module-level singleton
import { LiteSearch } from "litesearch-engine";
export const searchEngine = new LiteSearch({ ... });

Architecture Deep Dive

For contributors and advanced users.

Inverted Index

The core of the engine. Structure:

field → term → Map<docId, positions[]>

Example:
"name" → {
  "nike":    { "doc1": [0], "doc9": [0] },
  "air":     { "doc1": [1] },
  "max":     { "doc1": [2] },
  "running": { "doc1": [0], "doc2": [0], "doc15": [3] },
}

Each token stores its positions (not just presence). Positions enable future phrase-proximity scoring and exact phrase detection.

BM25+ Scoring

For each query term across each field:

score(term, doc, field) =
  IDF(term) × TF_norm(term, doc, field) × field_weight

IDF(term) = log((N - df + 0.5) / (df + 0.5) + 1)
  N  = total documents
  df = documents containing this term

TF_norm = (tf × (k1 + 1)) / (tf + k1 × (1 - b + b × (fieldLen / avgFieldLen)))
  tf       = how many times term appears in this doc's field
  fieldLen = token count of this field in this doc
  avgFieldLen = average across all docs

Final doc score = sum of all term+field scores → normalised to [0, 1].

Levenshtein (Fuzzy)

Uses a two-row dynamic programming table (O(m×n) time, O(n) space) with:

  • Early exit when the minimum possible distance in a row exceeds maxDistance
  • Length pre-filter — skips terms where |len_a - len_b| > maxDistance without running DP
  • Adaptive threshold — terms < 4 chars require exact match; 4–6 chars allow distance 1; 7+ allow distance 2
Trie (Autocomplete)

A character-level prefix tree where each node stores:

  • children: Map<char, TrieNode>
  • docIds: Set<string> — all docs reachable from this prefix
  • frequency: number — for ranking

Prefix lookup is O(prefix length). After reaching the prefix node, a DFS collects all descendant words, sorted by frequency.

Query Pipeline
query string
    ↓ tokenize (lowercase, split, strip stopwords)
    ↓ per-token lookup: exact → prefix → fuzzy
    ↓ BM25 score accumulation per field
    ↓ exact phrase boost (if multi-token)
    ↓ normalise scores to [0, 1]
    ↓ filter (AND/OR/NOT evaluation)
    ↓ sort DESC by score
    ↓ paginate
    ↓ build highlights
    ↓ SearchResult

License

MIT

Keywords