1.0.3 • Published 3d ago

litesearch-engine

Licence: MIT • Version: 1.0.3 • Size: 199 kB

Zero-dependency, blazing-fast, in-memory full-text search engine for Node.js and TypeScript — 100% dynamic, domain-agnostic.

Built to replace Elasticsearch for datasets of up to ~50,000 documents where you need speed, simplicity, and full control — no Docker, no JVM, no DevOps. Search completes in < 15ms for 10,000 documents.

Any data shape, any use case. Products, blog posts, user profiles, support tickets, log entries, code snippets, recipes, messages — if it's a JSON object with string fields, litesearch indexes and searches it. No schema, no setup, no domain lock-in.

npm install litesearch-engine

Features

Feature	Details
100% dynamic schema	Works with any document shape — products, posts, users, tickets, logs, anything
Full-text search	BM25+ scoring (the same algorithm powering Elasticsearch/Lucene)
Fuzzy / typo tolerance	Levenshtein distance with adaptive thresholds and early-exit optimisation
Partial matching	Any prefix of any word matches instantly
Autocomplete suggestions	Trie prefix tree, < 1ms per query
Nested filters	AND / OR / NOT with 12 operators
Highlighted snippets	`<mark>` tags with match context window
Live index updates	add / update / remove in real time, no re-index needed
Domain-agnostic	No schemas, no models, no setup — index anything
TypeScript-first	Full generics, every input/output typed
Zero dependencies	Pure TypeScript, 0 npm dependencies
Browse & list	browse(), getById(), has() — no query string needed
Sort	Sort by any field (string, number, date), in search or browse
Facets / aggregations	terms, range, date_histogram — computed over filtered sets
Multi-index manager	LiteSearchManager with cross-index searchAll, weighted merging
Export / Import	serialize/deserialize + optional file persistence

Quick Start

import { LiteSearch } from "litesearch-engine";

// 1. Define your document type — any shape works
interface BlogPost {
  id: string;
  title: string;
  body: string;
  tags: string[];
  author: string;
}

// 2. Create the engine — no schema, just point at fields
const engine = new LiteSearch<BlogPost>({
  idField: "id",
  fields: {
    title:  { weight: 3, suggest: true },
    body:   { weight: 1 },
    tags:   { weight: 1.5 },
    author: { weight: 2, suggest: true },
  },
});

// 3. Index your data
engine.addMany(posts);

// 4. Search — BM25 scoring, fuzzy matching, all automatic
const result = engine.search("typescript performance");
console.log(result.hits[0].document.title); // best match
console.log(result.took);                    // → 2 (ms)

Installation
Universal Usage
Configuration
Indexing Documents
Searching
Filters
Autocomplete / Suggestions
Highlights
Live Index Updates
Pagination
Stats
Advanced: Custom Tokenizer
Advanced: Custom Scoring
Output Format Reference
Performance Guide
Architecture Deep Dive

Installation

npm install litesearch-engine
# or
yarn add litesearch-engine
# or
pnpm add litesearch-engine

Requirements: Node.js 16+, TypeScript 4.7+ (if using TypeScript).

Universal Usage

litesearch-engine ingests any JSON document shape — no schema, no setup, no config beyond pointing at which fields to index. Every feature below works with any domain: products, users, blog posts, legal cases, recipes, support tickets, logs, you name it.

1. User Directory

import { LiteSearch } from "litesearch-engine";

interface User {
  id: string;
  name: string;
  email: string;
  department: string;
  bio: string;
}

const users = new LiteSearch<User>({
  idField: "id",
  fields: {
    name:       { weight: 3, suggest: true },
    email:      { weight: 2 },
    department: { weight: 1, suggest: true },
    bio:        { weight: 1 },
  },
});

users.addMany([
  { id: "1", name: "Chiamaka Obi",   email: "chiamaka@example.com", department: "Engineering", bio: "Full-stack developer" },
  { id: "2", name: "Kofi Mensah",    email: "kofi@example.com",     department: "Design",      bio: "UX designer" },
  { id: "3", name: "Aisha Bello",    email: "aisha@example.com",    department: "Marketing",   bio: "Content strategist" },
]);

// Full-text search — BM25 scoring, fuzzy, prefix, all automatic
const r = users.search("chiamaka dev");
console.log(r.hits[0].document.name); // "Chiamaka Obi"

// Autocomplete — trie prefix lookup < 1ms
const s = users.suggest("chi");
console.log(s.suggestions[0].text); // "chiamaka"

// Browse all — with filter + sort
const engineering = users.browse({
  filter: { field: "department", operator: "eq", value: "Engineering" },
  sort:   { field: "name", direction: "asc" },
});

// Faceted navigation — department counts
const faceted = users.search("", {
  facets: { department: { type: "terms", size: 5 } },
});
console.log(faceted.facets!.department.buckets);
// → [{ key: "Engineering", count: 1 }, { key: "Design", count: 1 }, ...]

2. Legal Case Database

interface LegalCase {
  caseNumber: string;
  title: string;
  summary: string;
  jurisdiction: string;
  year: number;
}

const cases = new LiteSearch<LegalCase>({
  idField: "caseNumber",         // custom idField
  fields: {
    title:        { weight: 3, suggest: true },
    summary:      { weight: 2 },
    jurisdiction: { weight: 1 },
  },
});

cases.addMany([
  { caseNumber: "SC/1/2024", title: "Maga v. INEC",      summary: "Electoral dispute",        jurisdiction: "Supreme Court", year: 2024 },
  { caseNumber: "CA/45/2023", title: "Bello v. State",    summary: "Criminal appeal",           jurisdiction: "Court of Appeal", year: 2023 },
  { caseNumber: "HC/12/2022", title: "Okafor v. UBA Plc", summary: "Banking and contract law",  jurisdiction: "High Court", year: 2022 },
]);

// Fuzzy finds typos — "electral" matches "Electoral"
let result = cases.search("electral dispuite", { fuzzy: { enabled: true } });

// Filter by jurisdiction + year range
result = cases.search("appeal", {
  filter: {
    AND: [
      { field: "jurisdiction", operator: "eq", value: "Court of Appeal" },
      { field: "year", operator: "gte", value: 2020 },
    ],
  },
});

// Exact ID lookup
const c = cases.getById("SC/1/2024");

// Existence check
if (cases.has("HC/12/2022")) { /* ... */ }

3. Recipe Collection

interface Recipe {
  id: string;
  name: string;
  ingredients: string[];
  cuisine: string;
  prepTime: number; // minutes
  instructions: string;
}

// Array fields are auto-joined: ingredients: ["rice", "beans"] → "rice beans"
const recipes = new LiteSearch<Recipe>({
  fields: {
    name:         { weight: 3, suggest: true },
    ingredients:  { weight: 2 },
    cuisine:      { weight: 1, suggest: true },
    instructions: { weight: 1 },
  },
});

recipes.addMany([
  { id: "1", name: "Jollof Rice",      ingredients: ["rice", "tomatoes", "pepper", "onions"], cuisine: "West African", prepTime: 60, instructions: "..." },
  { id: "2", name: "Egusi Soup",       ingredients: ["egusi", "pumpkin leaves", "palm oil"],   cuisine: "Nigerian",    prepTime: 90, instructions: "..." },
  { id: "3", name: "Yam Porridge",     ingredients: ["yam", "palm oil", "fish"],               cuisine: "Nigerian",    prepTime: 45, instructions: "..." },
  { id: "4", name: "Pad Thai",         ingredients: ["rice noodles", "shrimp", "peanuts"],     cuisine: "Thai",        prepTime: 30, instructions: "..." },
]);

// Search by ingredient: "rice" finds Jollof Rice and Pad Thai (via "rice noodles")
const riceDishes = recipes.search("rice");

// Sort by prep time (ascending)
const quickMeals = recipes.search("", { sort: { field: "prepTime", direction: "asc", type: "number" } });

// Facet by cuisine
const byCuisine = recipes.search("", {
  facets: { cuisine: { type: "terms", size: 10 } },
});

// Browse with pagination (20 per page)
const page2 = recipes.browse({ limit: 20, offset: 20 });

4. Job Board

interface Job {
  _id: string;
  title: string;
  skills: string[];
  location: string;
  salaryMin: number;
  salaryMax: number;
}

// _id is already a string here, so idField maps it directly; idResolver is only needed for non-string IDs
const jobs = new LiteSearch<Job>({
  idField: "_id",     // direct mapping
  fields: {
    title:    { weight: 3, suggest: true },
    skills:   { weight: 2 },
    location: { weight: 1, suggest: true },
  },
  tokenizer: {
    language: "none",  // no stopword removal, every token is kept
    normalizer: (t) => t.toLowerCase(),  // lowercase tokens so "React" and "react" match each other
  },
});

jobs.addMany([
  { _id: "1", title: "Senior React Engineer", skills: ["React", "TypeScript", "Node.js"], location: "Lagos", salaryMin: 8000000, salaryMax: 15000000 },
  { _id: "2", title: "UX Designer",           skills: ["Figma", "User Research"],          location: "Remote", salaryMin: 5000000, salaryMax: 10000000 },
  { _id: "3", title: "DevOps Lead",           skills: ["AWS", "Kubernetes", "Terraform"],  location: "Nairobi", salaryMin: 12000000, salaryMax: 20000000 },
]);

// Range filter on salary
const seniorRoles = jobs.search("senior", {
  filter: {
    AND: [
      { field: "salaryMin", operator: "gte", value: 5000000 },
      { field: "salaryMax", operator: "lte", value: 15000000 },
    ],
  },
  sort: { field: "salaryMin", direction: "desc", type: "number" },
});

5. Multi-Index Manager — Cross-Search Users, Articles & Products

import { LiteSearch, LiteSearchManager } from "litesearch-engine";

const manager = new LiteSearchManager();

manager.createIndex("users", {
  fields: { name: { weight: 3, suggest: true }, bio: { weight: 1 } },
});
manager.createIndex("articles", {
  fields: { title: { weight: 3, suggest: true }, body: { weight: 1 } },
});
manager.createIndex("products", {
  fields: { name: { weight: 3, suggest: true }, description: { weight: 1 } },
});

manager.add("users",    { id: "1", name: "Kofi Mensah", bio: "UX designer" });
manager.add("articles", { id: "1", title: "Design Systems", body: "How to build scalable design systems" });
manager.add("products", { id: "1", name: "Wireframe Kit", description: "UI wireframe components for Figma" });

// Single-index search
const userResult = manager.search("users", "kofi");

// Cross-index search — merged, ranked, tagged
const all = manager.searchAll("design", {
  indexes: { users: 1.0, articles: 1.5, products: 1.0 }, // per-index weight
  limit: 20,
});

console.log(all.hits[0].document._index); // "articles" (highest weight × matched)
console.log(all.perIndex);
// → { users: { total: 1, took: 2 }, articles: { total: 1, took: 3 }, products: { total: 1, took: 2 } }

Nested Documents & Custom ID Resolver

Documents with nested objects work via dot-path fields. MongoDB-style _id ObjectIds work via idResolver:

// A document with nested address and an ObjectId-style _id
interface Customer {
  _id: { toString: () => string };
  name: string;
  address: { city: string; state: string };
  tags: Array<{ name: string }>;
}

const customers = new LiteSearch<Customer>({
  idField: "_id",                                       // dot-path NOT needed for top-level
  idResolver: (doc) => (doc as any)._id.toString(),      // extract string from ObjectId
  fields: {
    name:          { weight: 3, suggest: true },
    "address.city":  { weight: 2, path: "address.city" },  // dot-path to nested value
    "address.state": { weight: 1, path: "address.state" },
    tags:          { weight: 2 },                           // arrays of objects are auto-flattened
  },
});

customers.add({
  _id: { toString: () => "cust_001" },
  name: "Amara Okafor",
  address: { city: "Lagos", state: "Lagos" },
  tags: [{ name: "vip" }, { name: "wholesale" }],
});

// All of these find the document:
customers.search("amara");
customers.search("lagos");     // matches address.city + address.state
customers.search("vip");       // matches flattened tags array
customers.getById("cust_001"); // ✓ works because idResolver maps _id → string

Custom Field Extraction

Use FieldConfig.extract to index computed values that don't exist on the raw document:

const engine = new LiteSearch({
  fields: {
    "fullName": { weight: 3, extract: (doc) => `${doc.firstName} ${doc.lastName}` },
  },
});

engine.add({ id: "1", firstName: "Chiamaka", lastName: "Obi" });
engine.search("chiamaka obi"); // ✓ matches from computed "fullName"

Persistence: Save & Restore

import { serialize, deserialize, saveToFile, loadFromFile } from "litesearch-engine";

// Serialize to JSON string
const json = serialize(engine);

// Restore from JSON
const restored = deserialize(json, { fields: { name: { weight: 3 } } });

// Node.js file persistence (browser-safe — throws if fs not available)
await saveToFile(engine, "./search-index.json");
const fromDisk = await loadFromFile("./search-index.json", { fields: { name: { weight: 3 } } });

Configuration

const engine = new LiteSearch<YourDoc>({
  // ── Required ──────────────────────────────────────────────────────────────

  /**
   * The field on your document that uniquely identifies it.
   * Default: "id"
   */
  idField: "id",

  /**
   * Fields to index. Pass as an array (all default config) or an object
   * (per-field control). Use one form or the other, not both.
   *
   * Short form:
   *   fields: ["name", "description", "category"],
   */

  // Long form with per-field config:
  fields: {
    name:        { weight: 3,   suggest: true  },
    description: { weight: 1,   suggest: false },
    category:    { weight: 2,   suggest: true  },
    brand:       { weight: 2.5, suggest: true  },
    tags:        { weight: 1.5, suggest: false },
  },

  // ── Optional ──────────────────────────────────────────────────────────────

  fuzzy: {
    enabled:     true,  // Toggle fuzzy globally
    maxDistance: 2,     // Max Levenshtein edit distance (1 or 2 recommended)
    minLength:   4,     // Minimum query word length before fuzzy activates
  },

  scoring: {
    k1: 1.2,  // BM25 term frequency saturation (1.2 = standard)
    b:  0.75, // BM25 field length normalisation (0.75 = standard)
  },

  suggest: {
    maxResults:    10,   // Max autocomplete suggestions returned
    caseSensitive: false,
  },

  tokenizer: {
    language: "en",  // "en" strips English stopwords, "none" keeps all tokens
  },
});

Field Config Options

Option	Type	Default	Description
`weight`	`number`	`1.0`	Score multiplier. Name matches should outweigh description matches.
`suggest`	`boolean`	`true`	Whether this field feeds the autocomplete trie.
`fuzzy`	`boolean`	`true`	Whether fuzzy matching applies to this field.
`path`	`string`	field name	Dot-path for nested objects: `"meta.brand"` reads `doc.meta.brand`.

Indexing Documents

Single document

engine.add({
  id: "prod_001",
  name: "Nike Air Max 270",
  description: "Lightweight running shoe for men",
  category: "Footwear",
  brand: "Nike",
  price: 45000,
});

Batch (recommended for large datasets)

// Internally still calls add() per document, but in a tight loop.
// For 10,000 docs this typically takes 100–300ms.
engine.addMany(products);

Nested document fields

Works automatically via the path field config:

const engine = new LiteSearch({
  fields: {
    title:       { weight: 3 },
    "meta.brand": { weight: 2, path: "meta.brand" }, // reads doc.meta.brand
  },
});

engine.add({
  id: "1",
  title: "Ankara Dress",
  meta: { brand: "Adire Collective", tags: ["fashion"] },
});

Array fields

Arrays are automatically joined with spaces before tokenizing:

engine.add({
  id: "1",
  tags: ["running", "outdoor", "men"], // indexed as "running outdoor men"
});
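
The extraction rules described above (dot-paths into nested objects, arrays joined with spaces, arrays of objects flattened) can be pictured with a small standalone helper. This is only a sketch under assumptions, not the library's internal code: `resolvePath` and `toIndexableText` are hypothetical names, and the real engine may treat edge cases differently.

```typescript
// Hypothetical helpers for illustration; not part of the litesearch-engine API.

// Follow a dot-path such as "meta.brand" through nested objects.
function resolvePath(doc: unknown, path: string): unknown {
  let current: unknown = doc;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Turn a field value into the single string that would then be tokenized.
function toIndexableText(value: unknown): string {
  if (value === null || value === undefined) return "";
  if (Array.isArray(value)) {
    // ["running", "outdoor"] becomes "running outdoor"
    return value.map((v) => toIndexableText(v)).filter((s) => s.length > 0).join(" ");
  }
  if (typeof value === "object") {
    // { name: "vip" } contributes its values: "vip"
    return Object.keys(value as Record<string, unknown>)
      .map((k) => toIndexableText((value as Record<string, unknown>)[k]))
      .filter((s) => s.length > 0)
      .join(" ");
  }
  return String(value);
}

const sampleDoc = {
  id: "1",
  title: "Ankara Dress",
  meta: { brand: "Adire Collective", tags: ["fashion"] },
  labels: [{ name: "vip" }, { name: "wholesale" }],
};

console.log(toIndexableText(resolvePath(sampleDoc, "meta.brand")));   // "Adire Collective"
console.log(toIndexableText(resolvePath(sampleDoc, "labels")));       // "vip wholesale"
console.log(toIndexableText(resolvePath(sampleDoc, "meta.missing"))); // ""
```

A missing path resolves to an empty string here, so a document without the field simply contributes no tokens for it.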

Searching

const result = engine.search("running shoes nike", {
  limit:      10,       // Results per page. Default: 10
  offset:     0,        // Pagination offset. Default: 0
  highlight:  true,     // Return <mark> snippets. Default: true
  minScore:   0.1,      // Drop results below this normalised score (0–1)
  boostExact: true,     // Boost exact phrase matches to top. Default: true
  fields:     ["name", "brand"], // Search only these fields (subset)
  filter: {             // Optional filter (see Filters section)
    AND: [
      { field: "category", operator: "eq", value: "Footwear" },
      { field: "price", operator: "lte", value: 60000 },
    ]
  },
});

Result shape

{
  hits: [
    {
      document:  { id: "1", name: "Nike Air Max 270", ... }, // original doc
      score:     0.97,       // normalised relevance (0–1)
      rawScore:  4.82,       // raw BM25 score
      matchType: "exact",    // "exact" | "prefix" | "fuzzy"
      highlights: [
        {
          field:         "name",
          snippet:       "<mark>Nike</mark> Air Max 270",
          matchedTokens: ["nike"],
        },
        {
          field:         "description",
          snippet:       "Lightweight <mark>running</mark> <mark>shoes</mark> for men",
          matchedTokens: ["running", "shoes"],
        }
      ]
    },
    // ...more hits
  ],
  total:      47,       // total matching docs (before pagination)
  took:       3,        // milliseconds
  query:      "running shoes nike",
  pagination: {
    limit:   10,
    offset:  0,
    hasMore: true,
  }
}

Filters

Filters can be simple clauses or deeply nested AND/OR/NOT groups.

Simple clause

engine.search("phone", {
  filter: { field: "category", operator: "eq", value: "Electronics" }
});

Available operators

Operator	Description	Example value
`eq`	Equals	`"Electronics"`
`neq`	Not equals	`"Draft"`
`gt`	Greater than	`50000`
`gte`	Greater than or equal	`50000`
`lt`	Less than	`100000`
`lte`	Less than or equal	`100000`
`range`	Between (inclusive)	`[10000, 50000]`
`in`	Value is in list	`["Nike", "Adidas"]`
`nin`	Value is NOT in list	`["Draft", "Archived"]`
`contains`	String contains (case-insensitive)	`"max"`
`startsWith`	String starts with (case-insensitive)	`"Nike"`
`exists`	Field is not null/undefined	(no value needed, pass `true`)

Compound filters

// Products in Electronics, priced ₦100k–₦500k, not out-of-stock
engine.search("laptop", {
  filter: {
    AND: [
      { field: "category",  operator: "eq",    value: "Electronics" },
      { field: "price",     operator: "range", value: [100000, 500000] },
      { field: "inStock",   operator: "eq",    value: true },
    ]
  }
});

// Either Nike or Adidas, under ₦50k
engine.search("shoes", {
  filter: {
    AND: [
      {
        OR: [
          { field: "brand", operator: "eq", value: "Nike" },
          { field: "brand", operator: "eq", value: "Adidas" },
        ]
      },
      { field: "price", operator: "lt", value: 50000 },
    ]
  }
});

// Anything BUT the "Food" category
engine.search("noodles", {
  filter: {
    NOT: { field: "category", operator: "eq", value: "Food" }
  }
});

Nested field filters

Use dot-path notation — works the same as field indexing:

{ field: "meta.brand", operator: "eq", value: "Nike" }
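
To make the AND / OR / NOT semantics concrete, here is a minimal recursive evaluator for the filter shapes shown above. It is a sketch under assumptions: the type names (`Clause`, `FilterNode`) and functions are hypothetical, only a subset of the operators is covered, and the library's own evaluator may behave differently on edge cases such as type mismatches.

```typescript
// Hypothetical types mirroring the filter shapes shown above; not the library's own.
type Clause = { field: string; operator: string; value?: unknown };
type FilterNode = Clause | { AND: FilterNode[] } | { OR: FilterNode[] } | { NOT: FilterNode };
type AnyDoc = Record<string, unknown>;

// Read a possibly nested value such as "meta.brand".
function readPath(doc: AnyDoc, path: string): unknown {
  let current: unknown = doc;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as AnyDoc)[key];
  }
  return current;
}

// Evaluate one leaf clause. Only some of the documented operators are sketched.
function matchClause(doc: AnyDoc, clause: Clause): boolean {
  const actual = readPath(doc, clause.field);
  const expected = clause.value;
  switch (clause.operator) {
    case "eq":
      return actual === expected;
    case "neq":
      return actual !== expected;
    case "lt":
      return typeof actual === "number" && actual < (expected as number);
    case "gte":
      return typeof actual === "number" && actual >= (expected as number);
    case "range": {
      const [min, max] = expected as [number, number];
      return typeof actual === "number" && actual >= min && actual <= max;
    }
    case "in":
      return (expected as unknown[]).indexOf(actual) !== -1;
    case "contains":
      return typeof actual === "string" &&
        actual.toLowerCase().indexOf(String(expected).toLowerCase()) !== -1;
    case "exists":
      return actual !== null && actual !== undefined;
    default:
      throw new Error(`Operator not covered by this sketch: ${clause.operator}`);
  }
}

// Recursively evaluate AND / OR / NOT groups down to leaf clauses.
function matchesFilter(doc: AnyDoc, filter: FilterNode): boolean {
  if ("AND" in filter) return filter.AND.every((f) => matchesFilter(doc, f));
  if ("OR" in filter) return filter.OR.some((f) => matchesFilter(doc, f));
  if ("NOT" in filter) return !matchesFilter(doc, filter.NOT);
  return matchClause(doc, filter);
}

const shoe: AnyDoc = { brand: "Nike", price: 45000, category: "Footwear", meta: { brand: "Nike" } };
const nikeOrAdidasUnder50k: FilterNode = {
  AND: [
    { OR: [
      { field: "brand", operator: "eq", value: "Nike" },
      { field: "brand", operator: "eq", value: "Adidas" },
    ] },
    { field: "price", operator: "lt", value: 50000 },
  ],
};

console.log(matchesFilter(shoe, nikeOrAdidasUnder50k));                                          // true
console.log(matchesFilter(shoe, { NOT: { field: "category", operator: "eq", value: "Food" } })); // true
console.log(matchesFilter(shoe, { field: "meta.brand", operator: "eq", value: "Adidas" }));      // false
```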

Autocomplete / Suggestions

const result = engine.suggest("nikee"); // typo
// {
//   suggestions: [
//     { text: "nike",    documentIds: ["1","9"], frequency: 2, matchType: "fuzzy",  distance: 1 },
//     { text: "nikelab", documentIds: ["3"],     frequency: 1, matchType: "fuzzy",  distance: 2 },
//   ],
//   took: 1,
//   query: "nikee"
// }

// Perfect prefix match
engine.suggest("run");
// → suggestions: "running", "runway", ... (trie prefix match, < 1ms)

How it works

Trie prefix lookup — O(prefix length), always checked first.
Fuzzy fallback — Only if prefix returns < 3 results. Scans all trie words with Levenshtein distance ≤ adaptive threshold.
Ranking — exact > prefix > fuzzy, then by frequency (how many docs contain the term).
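
The ranking rule in the last step can be written as a two-key comparator. The snippet below is a sketch, not the library's code: `Suggestion`, `MATCH_RANK` and `rankSuggestions` are hypothetical names, and the sample entries are made-up data for a query such as "runn".

```typescript
type MatchType = "exact" | "prefix" | "fuzzy";

interface Suggestion {
  text: string;
  frequency: number; // how many docs contain the term
  matchType: MatchType;
}

// Lower rank sorts first: exact > prefix > fuzzy.
const MATCH_RANK: Record<MatchType, number> = { exact: 0, prefix: 1, fuzzy: 2 };

function rankSuggestions(items: Suggestion[]): Suggestion[] {
  // Copy first so the caller's array is not reordered.
  return items.slice().sort(
    (a, b) => MATCH_RANK[a.matchType] - MATCH_RANK[b.matchType] || b.frequency - a.frequency,
  );
}

const ranked = rankSuggestions([
  { text: "rune",    frequency: 8, matchType: "fuzzy"  },
  { text: "running", frequency: 3, matchType: "prefix" },
  { text: "runn",    frequency: 1, matchType: "exact"  },
  { text: "runner",  frequency: 5, matchType: "prefix" },
]);

console.log(ranked.map((s) => s.text)); // ["runn", "runner", "running", "rune"]
```

Match type always wins over frequency, so a frequent fuzzy match ("rune") still sorts below a rare exact one ("runn").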

Suggestion result shape

{
  text:        "running",
  documentIds: ["1", "2", "15"], // which docs contain this word
  frequency:   3,                // how many docs contain this word
  matchType:   "prefix",         // "exact" | "prefix" | "fuzzy"
  distance:    0,                // Levenshtein distance from query
}

Highlights

Highlights are returned by default on every search. Disable them for performance-critical paths where you don't need them:

engine.search("nike shoes", { highlight: false });

The snippet:

Finds the position of the first match in the field value
Returns a ±30 character context window (max 160 chars)
Wraps matched tokens in <mark>…</mark>
Adds … ellipsis when the value is truncated

// Input:  "Lightweight running shoes designed for men with narrow feet"
// Query:  "running shoes"
// Output: "Lightweight <mark>running</mark> <mark>shoes</mark> designed for men…"

You can render highlights directly in HTML, or strip the <mark> tags for plain text:

const plain = hit.highlights[0].snippet.replace(/<\/?mark>/g, "");
// → "Lightweight running shoes designed for men…"
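
The snippet steps listed above (find the first match, cut a context window, wrap matches, add an ellipsis) can be sketched in a few lines. This is an illustration under assumptions, not the library's highlighter: `buildSnippet` is a hypothetical name, the window edges are snapped to spaces so words are not cut in half, and the 160-character cap is left out for brevity.

```typescript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper; the library's real highlighter may differ.
function buildSnippet(value: string, tokens: string[], context = 30): string {
  if (tokens.length === 0) return value;
  const source = `\\b(?:${tokens.map(escapeRegExp).join("|")})\\b`;
  const first = new RegExp(source, "i").exec(value);
  if (first === null) return value; // nothing to highlight

  const matchEnd = first.index + first[0].length;
  let start = Math.max(0, first.index - context);
  let end = Math.min(value.length, matchEnd + context);

  // Snap both edges to the nearest space inside the window.
  if (start > 0 && value[start - 1] !== " ") {
    const sp = value.indexOf(" ", start);
    if (sp !== -1 && sp < first.index) start = sp + 1;
  }
  if (end < value.length && value[end] !== " ") {
    const sp = value.lastIndexOf(" ", end);
    if (sp >= matchEnd) end = sp;
  }

  // Wrap every matched token inside the window, case-insensitively.
  const marked = value
    .slice(start, end)
    .replace(new RegExp(source, "gi"), (m) => `<mark>${m}</mark>`);

  return (start > 0 ? "…" : "") + marked + (end < value.length ? "…" : "");
}

console.log(
  buildSnippet("Lightweight running shoes designed for men with narrow feet", ["running", "shoes"]),
);
// → "Lightweight <mark>running</mark> <mark>shoes</mark> designed for men with…"
```

Because this sketch uses a fixed ±30 window around the first match only, its cut-off point differs slightly from the sample output shown earlier in this section.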

Live Index Updates

The index updates instantly — no rebuild required.

// Add a new document → immediately searchable
engine.add({ id: "99", name: "New Arrival", ... });

// Update an existing document (same ID = upsert)
engine.update({ id: "99", name: "New Arrival - Updated", ... });

// Remove a document
engine.remove("99");

// Wipe the entire index
engine.clear();

Integrating with your database

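// With Mongoose / MongoDB
// fullDocument: "updateLookup" makes update events carry the full document;
// without it, change.fullDocument is undefined on updates.
Product.watch([], { fullDocument: "updateLookup" }).on("change", (change) => {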
  if (change.operationType === "insert")  engine.add(change.fullDocument);
  if (change.operationType === "update")  engine.update(change.fullDocument);
  if (change.operationType === "delete")  engine.remove(change.documentKey._id.toString());
});

// With Prisma / PostgreSQL
// After any product save:
const updatedProduct = await prisma.product.update({ ... });
engine.update(updatedProduct);

// After delete:
await prisma.product.delete({ where: { id } });
engine.remove(id);

Seeding on startup

// server.ts
const products = await Product.find({}).lean(); // or prisma.product.findMany()
engine.addMany(products);
console.log(`Search index ready: ${engine.stats().documentCount} documents`);

Pagination

// Page 1
const page1 = engine.search("phone", { limit: 10, offset: 0 });

// Page 2
const page2 = engine.search("phone", { limit: 10, offset: 10 });

// Check if more pages exist
if (page1.pagination.hasMore) {
  // fetch next page
}

// Total results (for "Showing X of Y results")
console.log(`Showing ${page1.hits.length} of ${page1.total} results`);
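
For page-number style UIs, the values in `pagination` and `total` are enough to derive everything else. The helper below is a small sketch; `pageInfo` is a hypothetical name, and it assumes `hasMore` is equivalent to `offset + limit < total`, which matches the result shape shown earlier but is not confirmed against the library's source.

```typescript
interface PageInfo {
  currentPage: number; // 1-based
  totalPages: number;
  hasMore: boolean;
  label: string;
}

// Derive page numbers and a display label from total / limit / offset.
function pageInfo(total: number, limit: number, offset: number): PageInfo {
  const currentPage = Math.floor(offset / limit) + 1;
  const totalPages = Math.max(1, Math.ceil(total / limit));
  const first = total === 0 ? 0 : offset + 1;
  const last = Math.min(offset + limit, total);
  return {
    currentPage,
    totalPages,
    hasMore: offset + limit < total, // assumption: same rule the engine uses
    label: `Showing ${first} to ${last} of ${total} results`,
  };
}

console.log(pageInfo(47, 10, 0));
// → { currentPage: 1, totalPages: 5, hasMore: true, label: "Showing 1 to 10 of 47 results" }
console.log(pageInfo(47, 10, 40).label);
// → "Showing 41 to 47 of 47 results"
```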

Stats

const stats = engine.stats();
// {
//   documentCount:       10000,
//   termCount:           84320,  // unique indexed terms
//   trieNodeCount:       62100,  // autocomplete trie size
//   fields:              ["name", "description", "category", "brand"],
//   memoryEstimateBytes: 15728640, // ~15MB for 10k products
//   lastUpdated:         2024-01-15T10:30:00.000Z,
// }

Advanced: Custom Tokenizer

The default tokenizer: lowercases, splits on non-alphanumeric characters, strips English stopwords (a, the, and, etc.), and drops tokens < 2 characters.
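
As a rough picture of those four steps, here is a standalone sketch. It is not the library's tokenizer: `tokenizeSketch` is a hypothetical name, the stopword list is a short illustrative sample (the real list is not reproduced here), and the split pattern only treats ASCII letters and digits as alphanumeric.

```typescript
// Illustrative sample only; the library's actual stopword list may be longer.
const STOPWORDS = new Set(["a", "an", "the", "and", "or", "of", "for", "to", "in", "with"]);

function tokenizeSketch(text: string, language: "en" | "none" = "en"): string[] {
  return text
    .toLowerCase()                 // 1. lowercase
    .split(/[^a-z0-9]+/)           // 2. split on non-alphanumeric characters (ASCII only)
    .filter((t) => t.length >= 2)  // 3. drop tokens shorter than 2 characters
    .filter((t) => language === "none" || !STOPWORDS.has(t)); // 4. strip stopwords
}

console.log(tokenizeSketch("The Nike Air-Max 270, a shoe for running!"));
// → ["nike", "air", "max", "270", "shoe", "running"]

console.log(tokenizeSketch("The Nike Air-Max 270, a shoe for running!", "none"));
// → ["the", "nike", "air", "max", "270", "shoe", "for", "running"]
```

Note that "a" is removed in both modes: with `"none"` it is no longer a stopword, but it is still shorter than 2 characters.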

Override it completely:

const engine = new LiteSearch({
  tokenizer: {
    tokenize: (text: string): string[] => {
      // Your own logic — split on hyphens too, for example
      return text
        .toLowerCase()
        .split(/[\s\-_,\.]+/)
        .filter(t => t.length >= 2);
    }
  }
});

Or just change the language setting to keep all tokens (no stopword removal):

tokenizer: { language: "none" }

Advanced: Custom Scoring

Tune BM25 parameters:

Parameter	Effect	When to change
`k1 = 1.2` (default)	Controls term-frequency saturation. Higher = repeated occurrences of a term keep raising the score for longer before it levels off.	Increase for long descriptions, decrease for short product names.
`b = 0.75` (default)	Field-length normalisation. `b=1` fully normalises, `b=0` ignores length.	Decrease if your products have wildly different description lengths.

// Tuned for short product names (less length normalisation)
scoring: { k1: 1.5, b: 0.3 }

// Tuned for long blog posts
scoring: { k1: 1.2, b: 0.9 }
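
To see what `k1` does in numbers, the sketch below evaluates the TF normalisation term from the BM25 formula in the Architecture section with length normalisation switched off (`b = 0`), so the term reduces to `tf × (k1 + 1) / (tf + k1)`. The function name `tfSaturation` is hypothetical and the snippet is independent of the library.

```typescript
// TF normalisation with b = 0: tf * (k1 + 1) / (tf + k1).
// As tf grows, the value approaches the ceiling k1 + 1.
function tfSaturation(tf: number, k1: number): number {
  return (tf * (k1 + 1)) / (tf + k1);
}

for (const k1 of [1.2, 2.0]) {
  const row = [1, 2, 5].map((tf) => tfSaturation(tf, k1).toFixed(3));
  console.log(`k1=${k1}: tf=1 → ${row[0]}, tf=2 → ${row[1]}, tf=5 → ${row[2]}`);
}
// k1=1.2: tf=1 → 1.000, tf=2 → 1.375, tf=5 → 1.774
// k1=2:   tf=1 → 1.000, tf=2 → 1.500, tf=5 → 2.143
```

A single occurrence scores the same in both cases. With the higher `k1`, the second and fifth occurrences add more, which is why a higher value tends to suit long text where repetition is meaningful.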

Output Format Reference

`SearchResult<T>`

interface SearchResult<T> {
  hits:       SearchHit<T>[];
  total:      number;          // total matches (pre-pagination)
  took:       number;          // ms
  query:      string;
  pagination: {
    limit:   number;
    offset:  number;
    hasMore: boolean;
  };
}

`SearchHit<T>`

interface SearchHit<T> {
  document:   T;               // your original document, untouched
  score:      number;          // 0–1 normalised relevance
  rawScore:   number;          // raw BM25 score (for debugging)
  matchType:  "exact" | "prefix" | "fuzzy";
  highlights?: HighlightResult[];
}

`HighlightResult`

interface HighlightResult {
  field:         string;     // which field matched
  snippet:       string;     // context window with <mark> tags
  matchedTokens: string[];   // which tokens matched
}

`SuggestResult`

interface SuggestResult {
  suggestions: SuggestionHit[];
  took:        number;
  query:       string;
}

interface SuggestionHit {
  text:        string;     // the suggested word
  documentIds: string[];   // which docs contain it
  frequency:   number;     // how many docs (for ranking)
  matchType:   "exact" | "prefix" | "fuzzy";
  distance:    number;     // Levenshtein distance from query
}

Performance Guide

Expected benchmarks

Documents	Index time	Search (no filter)	Search (with filter)	Suggest
1,000	~10ms	< 2ms	< 3ms	< 1ms
10,000	~80ms	< 10ms	< 15ms	< 2ms
50,000	~400ms	< 50ms	< 80ms	< 10ms

Tips

1. Only index what you search. Avoid indexing fields you never query. Every extra field increases index time and memory.

// ❌ Don't do this
fields: ["id", "createdAt", "updatedAt", "internalNotes", "name"]

// ✅ Only searchable fields
fields: { name: { weight: 3 }, description: { weight: 1 } }

2. Disable suggest on heavy fields. The trie only needs to index fields your autocomplete uses.

fields: {
  name:        { suggest: true  }, // ✅ autocomplete from names
  description: { suggest: false }, // ❌ skip — too many tokens
}

3. Disable highlighting on search-as-you-type routes. You don't need highlights on every keystroke.

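// On keypress: use suggest() for autocomplete...
engine.suggest(query);
// ...or, if you need full results per keystroke, skip highlights
engine.search(query, { highlight: false });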

// On submit / full search
engine.search(query, { highlight: true });

4. Use minScore to cut noise.

engine.search("laptop bag", { minScore: 0.2 }); // drop weak matches

5. Know where filters run. Filters in litesearch are applied after BM25 scoring but before result assembly, so every matching document is scored even if the filter then discards it. For range/category filters on large datasets (approaching 50k+ documents), consider splitting your data into separate pre-filtered indexes.

6. Seed index at server startup. Don't re-create the engine per request. Create it once and keep it in module scope.

// search.service.ts — module-level singleton
import { LiteSearch } from "litesearch-engine";
export const searchEngine = new LiteSearch({ ... });

Architecture Deep Dive

For contributors and advanced users.

Inverted Index

The core of the engine. Structure:

field → term → Map<docId, positions[]>

Example:
"name" → {
  "nike":    { "doc1": [0], "doc9": [0] },
  "air":     { "doc1": [1] },
  "max":     { "doc1": [2] },
  "running": { "doc1": [0], "doc2": [0], "doc15": [3] },
}

Each token stores its positions (not just presence). Positions enable future phrase-proximity scoring and exact phrase detection.
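
The structure above can be built with nested `Map`s. The following is a minimal sketch, not the engine's source: `indexTokens` and `removeDoc` are hypothetical names, and tokens are assumed to have already passed through the tokenizer.

```typescript
type Postings = Map<string, number[]>;        // docId -> token positions
type TermIndex = Map<string, Postings>;       // term  -> postings
type InvertedIndex = Map<string, TermIndex>;  // field -> terms

// Record every token of one field of one document, keeping its position.
function indexTokens(index: InvertedIndex, field: string, docId: string, tokens: string[]): void {
  let terms = index.get(field);
  if (terms === undefined) {
    terms = new Map<string, Postings>();
    index.set(field, terms);
  }
  for (let position = 0; position < tokens.length; position++) {
    const token = tokens[position];
    let postings = terms.get(token);
    if (postings === undefined) {
      postings = new Map<string, number[]>();
      terms.set(token, postings);
    }
    const positions = postings.get(docId);
    if (positions === undefined) postings.set(docId, [position]);
    else positions.push(position);
  }
}

// Remove a document everywhere; drop terms that no longer point at any doc.
function removeDoc(index: InvertedIndex, docId: string): void {
  index.forEach((terms) => {
    terms.forEach((postings, term) => {
      postings.delete(docId);
      if (postings.size === 0) terms.delete(term);
    });
  });
}

const demoIndex: InvertedIndex = new Map();
indexTokens(demoIndex, "name", "doc1", ["nike", "air", "max"]);
indexTokens(demoIndex, "name", "doc9", ["nike", "zoom"]);
indexTokens(demoIndex, "name", "doc2", ["run", "club", "run"]);

console.log(demoIndex.get("name")?.get("nike")); // Map { "doc1" → [0], "doc9" → [0] }
console.log(demoIndex.get("name")?.get("run"));  // Map { "doc2" → [0, 2] }
```

Because removal only touches postings, a live `remove` or `update` does not need to rebuild anything else in this layout.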

BM25+ Scoring

For each query term across each field:

score(term, doc, field) =
  IDF(term) × TF_norm(term, doc, field) × field_weight

IDF(term) = log((N - df + 0.5) / (df + 0.5) + 1)
  N  = total documents
  df = documents containing this term

TF_norm = (tf × (k1 + 1)) / (tf + k1 × (1 - b + b × (fieldLen / avgFieldLen)))
  tf       = how many times term appears in this doc's field
  fieldLen = token count of this field in this doc
  avgFieldLen = average across all docs

Final doc score = sum of all term+field scores → normalised to [0, 1].
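
The formulas above translate directly into code. The sketch below follows them term by term as written; function names such as `idf`, `tfNormalised` and `termScore` are hypothetical, and the final normalisation to [0, 1] is not shown because the method is not specified here.

```typescript
interface Bm25Params {
  k1: number;
  b: number;
}

// IDF(term) = log((N - df + 0.5) / (df + 0.5) + 1)
function idf(totalDocs: number, docFreq: number): number {
  return Math.log((totalDocs - docFreq + 0.5) / (docFreq + 0.5) + 1);
}

// TF_norm = (tf × (k1 + 1)) / (tf + k1 × (1 - b + b × (fieldLen / avgFieldLen)))
function tfNormalised(tf: number, fieldLen: number, avgFieldLen: number, p: Bm25Params): number {
  return (tf * (p.k1 + 1)) / (tf + p.k1 * (1 - p.b + p.b * (fieldLen / avgFieldLen)));
}

// score(term, doc, field) = IDF × TF_norm × field_weight
function termScore(
  totalDocs: number, docFreq: number,
  tf: number, fieldLen: number, avgFieldLen: number,
  fieldWeight: number, p: Bm25Params,
): number {
  return idf(totalDocs, docFreq) * tfNormalised(tf, fieldLen, avgFieldLen, p) * fieldWeight;
}

const defaults: Bm25Params = { k1: 1.2, b: 0.75 };

// Worked example: 1,000 docs, term appears in 10 of them, twice in this doc's
// "name" field (weight 3), and the field has exactly average length.
console.log(idf(1000, 10).toFixed(2));                        // "4.56"
console.log(tfNormalised(2, 8, 8, defaults).toFixed(3));      // "1.375"
console.log(termScore(1000, 10, 2, 8, 8, 3, defaults).toFixed(1)); // "18.8"
```

Two properties are worth checking when tuning: a rarer term (lower `df`) raises the score, and a field longer than average lowers it whenever `b > 0`.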

Levenshtein (Fuzzy)

Uses a two-row dynamic programming table (O(m×n) time, O(n) space) with:

Early exit when the minimum possible distance in a row exceeds maxDistance
Length pre-filter — skips terms where |len_a - len_b| > maxDistance without running DP
Adaptive threshold — terms < 4 chars require exact match; 4–6 chars allow distance 1; 7+ allow distance 2
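
The three optimisations above fit into one short function. This is a minimal sketch of the technique rather than the engine's actual code: `boundedLevenshtein`, `adaptiveThreshold` and `fuzzyMatches` are hypothetical names, and applying the threshold to the query term's length is an assumption.

```typescript
// Returns the edit distance, or maxDistance + 1 as soon as it is known to exceed the limit.
function boundedLevenshtein(a: string, b: string, maxDistance: number): number {
  // Length pre-filter: the distance is at least the length difference.
  if (Math.abs(a.length - b.length) > maxDistance) return maxDistance + 1;
  if (a === b) return 0;

  // Two rows only: prev is row i-1, curr is row i.
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  let curr: number[] = prev.slice();

  for (let i = 1; i <= a.length; i++) {
    curr[0] = i;
    let rowMin = curr[0];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
      if (curr[j] < rowMin) rowMin = curr[j];
    }
    // Early exit: later rows can never go below the minimum of this row.
    if (rowMin > maxDistance) return maxDistance + 1;
    const swap = prev;
    prev = curr;
    curr = swap;
  }
  return Math.min(prev[b.length], maxDistance + 1);
}

// Adaptive threshold: < 4 chars exact only, 4 to 6 chars distance 1, 7+ distance 2.
function adaptiveThreshold(termLength: number): number {
  if (termLength < 4) return 0;
  if (termLength <= 6) return 1;
  return 2;
}

function fuzzyMatches(query: string, term: string): boolean {
  const limit = adaptiveThreshold(query.length);
  return boundedLevenshtein(query, term, limit) <= limit;
}

console.log(boundedLevenshtein("nikee", "nike", 2));     // 1
console.log(boundedLevenshtein("kitten", "sitting", 3)); // 3
console.log(fuzzyMatches("electral", "electoral"));      // true
console.log(fuzzyMatches("nik", "nike"));                // false (under 4 chars: exact only)
```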

Trie (Autocomplete)

A character-level prefix tree where each node stores:

children: Map<char, TrieNode>
docIds: Set<string> — all docs reachable from this prefix
frequency: number — for ranking

Prefix lookup is O(prefix length). After reaching the prefix node, a DFS collects all descendant words, sorted by frequency.
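
A compact version of such a trie is sketched below. It follows the node layout described above, with two assumptions that are not taken from the library: an `isWord` flag marks where a complete word ends, and `frequency` is incremented once per inserted (word, document) pair. The class and method names are hypothetical.

```typescript
class TrieNode {
  children = new Map<string, TrieNode>();
  docIds = new Set<string>(); // all docs reachable from this prefix
  frequency = 0;              // assumption: number of docs containing the word ending here
  isWord = false;             // assumption: marks the end of a complete word
}

class Trie {
  private root = new TrieNode();

  // Insert one word for one document. O(word length).
  insert(word: string, docId: string): void {
    let node = this.root;
    for (let i = 0; i < word.length; i++) {
      const ch = word[i];
      let next = node.children.get(ch);
      if (next === undefined) {
        next = new TrieNode();
        node.children.set(ch, next);
      }
      next.docIds.add(docId);
      node = next;
    }
    node.isWord = true;
    node.frequency += 1;
  }

  // Walk to the prefix node, then collect all complete words beneath it.
  withPrefix(prefix: string): { text: string; frequency: number }[] {
    let node = this.root;
    for (let i = 0; i < prefix.length; i++) {
      const next = node.children.get(prefix[i]);
      if (next === undefined) return [];
      node = next;
    }
    const found: { text: string; frequency: number }[] = [];
    const stack: [TrieNode, string][] = [[node, prefix]];
    while (stack.length > 0) {
      const [current, text] = stack.pop()!;
      if (current.isWord) found.push({ text, frequency: current.frequency });
      current.children.forEach((child, ch) => stack.push([child, text + ch]));
    }
    // Most frequent first; alphabetical order breaks ties.
    return found.sort((a, b) => b.frequency - a.frequency || a.text.localeCompare(b.text));
  }
}

const trie = new Trie();
["1", "2", "15"].forEach((id) => trie.insert("running", id));
trie.insert("runway", "3");
trie.insert("nike", "1");

console.log(trie.withPrefix("run"));
// → [{ text: "running", frequency: 3 }, { text: "runway", frequency: 1 }]
console.log(trie.withPrefix("xyz")); // → []
```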

Query Pipeline

query string
    ↓ tokenize (lowercase, split, strip stopwords)
    ↓ per-token lookup: exact → prefix → fuzzy
    ↓ BM25 score accumulation per field
    ↓ exact phrase boost (if multi-token)
    ↓ normalise scores to [0, 1]
    ↓ filter (AND/OR/NOT evaluation)
    ↓ sort DESC by score
    ↓ paginate
    ↓ build highlights
    ↓ SearchResult

License

MIT
