@benkhz/context-manager

Licence: MIT · Version: 2.0.2 · Deps: 0 · Size: 29 kB · Vulns: 0 · Weekly: 149

A vanilla JS class that manages LLM conversation context end-to-end. Zero runtime dependencies.

Installation

npm install @benkhz/context-manager

Quick start

import { AIContextManager, openaiPreset } from '@benkhz/context-manager'

const mgr = new AIContextManager({
  endpoint: 'https://api.openai.com/v1/chat/completions',
  model:    'gpt-4o',
  headers:  { Authorization: `Bearer ${process.env.OPENAI_KEY}` },
  hooks: {
    ...openaiPreset,
  },
})

const reply = await mgr.send('What is the capital of France?')
console.log(reply.content) // "The capital of France is Paris."

Design decisions

| Concern | Decision | Rationale |
| --- | --- | --- |
| Provider agnosticism | Caller supplies formatRequest + parseResponse hooks | No lock-in; presets ship for OpenAI and Anthropic as reference |
| Context compaction | Manager POSTs to the same endpoint with a summarise prompt | Automatic; onCompact hook overrides for custom logic |
| Reactive state | setState / getState / subscribe; callbacks fire synchronously | Familiar, zero magic, easy to test |
| Tool loop cap | 10 iterations max | Guards against infinite loops without blocking legitimate multi-step reasoning |
| Context sizing | Character count approximation | Token counting requires a tokenizer dep; chars are close enough for limit-triggering |

Constructor

new AIContextManager(config)

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| endpoint | string | required | URL to POST every LLM request to |
| hooks | object | required | See Hooks |
| model | string | | Passed through to formatRequest |
| maxTokens | number | | Passed through to formatRequest |
| contextLimit | number | 80_000 | Char count that triggers auto-compaction |
| compactKeepLast | number | 6 | Messages preserved verbatim after compaction |
| injectSummary | boolean | true | Auto-prepend the latest summary as a system message on every LLM request. Set false to place it yourself via context.summary in formatRequest. |
| headers | object | {} | Extra HTTP headers on every fetch call |

Hooks

All hooks live in config.hooks. Only formatRequest and parseResponse are required.

Required

formatRequest(context, config) → RequestBody

Converts the internal context snapshot into the HTTP request body.

// context shape
{
  messages: [{ role, content, toolCalls?, toolCallId? }],
  tools:    [{ name, schema: { description, parameters } }],
  summary:  string | null,
}

// config shape (your constructor options + per-call system override)
{ endpoint, model, maxTokens, headers, system? }

parseResponse(rawJson) → ParsedResponse

Converts the raw HTTP response JSON into the internal shape.

// must return
{
  content:    string,                              // assistant text
  stopReason: string,                              // e.g. 'stop', 'tool_use'
  toolCalls?: [{ id, name, args }],               // present when model calls tools
}
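
To make the two required shapes concrete, here is a minimal sketch of a hook pair for an OpenAI-style chat completions endpoint. It is not the package's openaiPreset; the wire field names (tool_calls, finish_reason, max_tokens) follow the OpenAI chat format, and the mapping choices are assumptions. It relies on the default injectSummary: true, so it does not read context.summary itself.

```javascript
// Hypothetical hook pair for an OpenAI-style endpoint (not the shipped preset).

// Internal context snapshot -> HTTP request body.
function formatRequest(context, config) {
  const messages = context.messages.map((m) => {
    const out = { role: m.role, content: m.content }
    if (m.toolCallId) out.tool_call_id = m.toolCallId
    if (m.toolCalls) {
      out.tool_calls = m.toolCalls.map((c) => ({
        id: c.id,
        type: 'function',
        function: { name: c.name, arguments: JSON.stringify(c.args) },
      }))
    }
    return out
  })

  // A per-call system override goes first when present.
  if (config.system) messages.unshift({ role: 'system', content: config.system })

  const body = { model: config.model, messages }
  if (config.maxTokens) body.max_tokens = config.maxTokens
  if (context.tools.length > 0) {
    body.tools = context.tools.map((t) => ({
      type: 'function',
      function: { name: t.name, ...t.schema },
    }))
  }
  return body
}

// Raw response JSON -> { content, stopReason, toolCalls? }.
function parseResponse(rawJson) {
  const choice = rawJson.choices[0]
  const calls = choice.message.tool_calls || []
  const parsed = {
    content: choice.message.content || '',
    stopReason: choice.finish_reason,
  }
  if (calls.length > 0) {
    parsed.toolCalls = calls.map((c) => ({
      id: c.id,
      name: c.function.name,
      args: JSON.parse(c.function.arguments),
    }))
  }
  return parsed
}
```

Both functions are pure, so they can be unit-tested without a network call before being passed in as config.hooks.
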

Optional lifecycle hooks

| Hook | Signature | Description |
| --- | --- | --- |
| beforeSend | (messages[]) → messages[] | Transform or filter the message array before each POST. Return the array. |
| afterReceive | (parsed) → parsed | Transform the parsed response before tool/message processing. |
| onCompact | (overflow[], currentSummary) → string \| void | Return a summary string to override the LLM-generated one. |
| onToolCall | (name, args) → args \| void | Intercept before a tool runs. Return new args to override. |
| onToolResult | (name, result) → result | Transform a tool result before appending to context. |
| onError | (error, phase) → void | Called on any error. phase is 'send', 'compact', or 'tool'. |
| onStateChange | (key, oldVal, newVal) → void | Observe every state mutation globally. |
| onContextLimit | (charCount, limit) → 'compact' \| 'truncate' \| 'error' | Choose what happens when the context limit is hit. Defaults to 'compact'. |
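
As an illustration of the signatures above, the following sketch defines a few lifecycle hooks as plain functions. The redaction pattern, the getWeather normalisation, and the 2x threshold are assumptions chosen for the example, not behaviour the package prescribes.

```javascript
// Illustrative lifecycle hooks; all policies below are assumptions for this sketch.
const lifecycleHooks = {
  // Mask anything that looks like an API key before each POST.
  beforeSend: (messages) =>
    messages.map((m) => ({
      ...m,
      content: typeof m.content === 'string'
        ? m.content.replace(/sk-[A-Za-z0-9]+/g, '[redacted]')
        : m.content,
    })),

  // Normalise arguments for one tool; returning undefined keeps the originals.
  onToolCall: (name, args) => {
    if (name === 'getWeather') return { ...args, city: args.city.trim() }
  },

  // Compact by default, but truncate when the window is far over the limit.
  onContextLimit: (charCount, limit) =>
    charCount > limit * 2 ? 'truncate' : 'compact',

  // Log failures together with the phase ('send', 'compact', or 'tool').
  onError: (error, phase) => console.error(`[${phase}] ${error.message}`),
}
```

These would typically be spread into config.hooks next to a preset, for example hooks: { ...openaiPreset, ...lifecycleHooks }.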

Messaging API

const reply = await mgr.send('your message')
// → { role: 'assistant', content: string }

await mgr.send('follow-up', { system: 'You are a pirate.' })

await mgr.compact()

Tool API

mgr.addTool(
  'getWeather',
  {
    description: 'Get current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string', description: 'City name' } },
      required: ['city'],
    },
  },
  async ({ city }) => ({ temp: 72, unit: 'F', condition: 'sunny' })
)

mgr.removeTool('getWeather')
mgr.getTools()   // → [{ name, schema }]
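
The design table caps the tool loop at 10 iterations. The sketch below shows roughly what a capped loop of that kind looks like. It is not the package's internals: callModel, the 'tool' role, and throwing once the cap is reached are all assumptions.

```javascript
// Sketch of a capped tool loop (hypothetical, not the package's implementation).
const MAX_TOOL_ITERATIONS = 10 // the cap stated in the design table

// callModel: async (messages) => { content, stopReason, toolCalls? }  (assumed)
// tools:     Map of tool name -> async handler(args)
async function runToolLoop(callModel, tools, messages) {
  for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) {
    const reply = await callModel(messages)
    messages.push({ role: 'assistant', content: reply.content, toolCalls: reply.toolCalls })

    // No tool calls means the model has produced its final answer.
    if (!reply.toolCalls || reply.toolCalls.length === 0) return reply

    // Run each requested tool and append its result for the next iteration.
    for (const call of reply.toolCalls) {
      const handler = tools.get(call.name)
      if (!handler) throw new Error(`Unknown tool: ${call.name}`)
      const result = await handler(call.args)
      messages.push({ role: 'tool', toolCallId: call.id, content: JSON.stringify(result) })
    }
  }
  // Assumption: exceeding the cap is reported as an error.
  throw new Error(`Tool loop exceeded ${MAX_TOOL_ITERATIONS} iterations`)
}
```

A model that keeps requesting tools forever hits the cap and stops, while an ordinary two- or three-step chain finishes well inside it.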

Event bus

mgr.on('message:sent',      ({ message }) => { /* ... */ })
mgr.on('message:received',  ({ message }) => { /* ... */ })
mgr.on('tool:call',         ({ name, args }) => { /* ... */ })
mgr.on('tool:result',       ({ name, result }) => { /* ... */ })
mgr.on('context:compact',   ({ messageCount }) => { /* ... */ })
mgr.on('context:compacted', ({ summary }) => { /* ... */ })
mgr.on('state:change',      ({ key, oldValue, newValue }) => { /* ... */ })
mgr.on('error',             ({ error, phase }) => { /* ... */ })

mgr.off('message:received', handler)
mgr.once('message:received', handler)
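
To show the semantics of on / off / once, here is a minimal emitter with the same method names. It is a sketch for illustration only, not the package's event bus; TinyEmitter and its emit method are hypothetical names.

```javascript
// Minimal emitter mirroring on / off / once (illustrative, not the package's code).
class TinyEmitter {
  constructor() {
    this.handlers = new Map() // event name -> Set of handler functions
  }

  on(event, handler) {
    if (!this.handlers.has(event)) this.handlers.set(event, new Set())
    this.handlers.get(event).add(handler)
    return this
  }

  // Removal works by function identity, so pass the same reference given to on().
  off(event, handler) {
    const set = this.handlers.get(event)
    if (set) set.delete(handler)
    return this
  }

  // Wrap the handler so it unregisters itself before its first call.
  once(event, handler) {
    const wrapper = (payload) => {
      this.off(event, wrapper)
      handler(payload)
    }
    return this.on(event, wrapper)
  }

  emit(event, payload) {
    // Iterate over a copy so handlers may unsubscribe during dispatch.
    for (const h of [...(this.handlers.get(event) || [])]) h(payload)
  }
}
```

In an API of this shape, off() can only remove a handler it can identify, so keep a reference to the function rather than registering an inline arrow function you later want to remove.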

Reactive state

mgr.setState('userId', 'u_123')
mgr.getState('userId')              // → 'u_123'

const unsub = mgr.subscribe('userId', (newVal, oldVal) => {
  console.log(`userId changed: ${oldVal} → ${newVal}`)
})
unsub()
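
The synchronous behaviour noted in the design table can be pictured with a small store like the one below. TinyStore is a hypothetical stand-in, not the package's implementation; in particular, firing subscribers even when the value is unchanged is an assumption of this sketch.

```javascript
// Minimal keyed store with synchronous subscribers (illustrative only).
class TinyStore {
  constructor() {
    this.state = new Map() // key -> current value
    this.subs = new Map()  // key -> Set of callbacks
  }

  getState(key) {
    return this.state.get(key)
  }

  // Callbacks run before setState returns, in (newVal, oldVal) order.
  setState(key, value) {
    const old = this.state.get(key)
    this.state.set(key, value)
    for (const cb of [...(this.subs.get(key) || [])]) cb(value, old)
  }

  // Returns an unsubscribe function, matching the unsub() pattern above.
  subscribe(key, cb) {
    if (!this.subs.has(key)) this.subs.set(key, new Set())
    this.subs.get(key).add(cb)
    return () => this.subs.get(key).delete(cb)
  }
}
```

Because callbacks fire synchronously, a test can assert on their side effects on the line directly after setState, with no awaiting or timers.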

Introspection

mgr.getMessages()        // → Message[] — full, never-pruned turn history
mgr.getActiveMessages()  // → Message[] — current LLM-facing window (post-compaction)
mgr.getSummary()         // → string | null — latest summary
mgr.getSummaries()       // → string[] — every summary ever produced, oldest first
mgr.getTools()           // → [{ name, schema }]
mgr.getContext()         // → { messages, activeMessages, summary, summaries, tools }
mgr.reset()              // clear all message/summary state — returns this

getMessages() always returns every turn ever sent or received, even after compaction has shrunk the LLM-facing window — useful for rendering a full conversation transcript in a UI. getActiveMessages() returns what's actually being sent to the model right now.


Presets

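import { AIContextManager, openaiPreset, anthropicPreset } from '@benkhz/context-manager'

// OpenAI / Azure / Ollama / LM Studio
// (model and auth headers omitted for brevity; see Quick start)
const openaiMgr = new AIContextManager({
  endpoint: 'https://api.openai.com/v1/chat/completions',
  hooks: {
    ...openaiPreset,
    beforeSend: msgs => msgs.filter(m => m.content),
  },
})

// Anthropic Messages API
const anthropicMgr = new AIContextManager({
  endpoint: 'https://api.anthropic.com/v1/messages',
  hooks: { ...anthropicPreset },
})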

Context compaction

The manager tracks two parallel message lists: the full history (everything ever sent or received, exposed via getMessages()) and the active window (getActiveMessages()) — the slice actually sent to the LLM, which compaction and truncation shrink. History is never pruned.

The limit check runs at the start of every send() call and again between tool-call iterations within a single turn. A request that triggers several tool calls in a row can grow the active window past contextLimit well before the turn finishes, so compaction can kick in mid-turn rather than waiting for the next send().

When the character count of the active window exceeds contextLimit:

  1. onContextLimit hook is called — returns 'compact' (default), 'truncate', or 'error'
  2. If compact: the overflow messages are sent to the LLM with a summarise prompt
  3. The summary is stored (and appended to the summary history); the last compactKeepLast active messages are kept verbatim
  4. On the next request, the latest summary is auto-prepended as a system message ahead of the active window — unless injectSummary: false, in which case you place it yourself via context.summary in formatRequest

The onCompact hook can return a string to bypass the LLM call entirely.

Both compaction and truncation snap their cut point to avoid splitting a tool-call/tool-result pair across the boundary — the kept window never starts with an orphaned tool message.
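
The size check and the cut-point snapping described above can be sketched as two small functions. This is an illustration under stated assumptions, not the package's code: it assumes tool results carry role 'tool', and it snaps by moving the cut earlier (keeping more messages) rather than later.

```javascript
// Sketch of the limit check and cut-point snapping (assumptions noted inline).

// Approximate context size as the total character count of message content.
function countChars(messages) {
  return messages.reduce((sum, m) => sum + (m.content ? m.content.length : 0), 0)
}

// Split the active window into overflow (to be summarised) and kept (verbatim).
// Assumes tool results use role 'tool'. The cut moves earlier until the kept
// window no longer starts with a tool result, so no call/result pair is split.
function splitForCompaction(messages, keepLast) {
  let cut = Math.max(0, messages.length - keepLast)
  while (cut > 0 && cut < messages.length && messages[cut].role === 'tool') cut--
  return { overflow: messages.slice(0, cut), kept: messages.slice(cut) }
}
```

With this strategy the kept window may end up slightly longer than compactKeepLast, which trades a few extra characters for never sending a tool result without the call that produced it.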


Open decisions

  • Token counting: currently approximated via character count. A future tokenizer option could accept a (messages) => number fn for more accurate limiting.
  • Streaming: send() is request/response only.
  • Persistence: getContext() returns a serialisable snapshot. A future loadContext(snapshot) method would complete the persistence story.
  • Multi-modal: content is currently assumed to be string.
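
For the token-counting item, a function of the proposed (messages) => number shape might look like the sketch below. The tokenizer option does not exist yet, and the four-characters-per-token ratio is only a rough rule of thumb for English text, so treat both the name and the constant as assumptions.

```javascript
// Hypothetical (messages) => number estimator for a future tokenizer option.
// chars / 4 is a rough heuristic for English text, not an exact token count.
const CHARS_PER_TOKEN = 4

function estimateTokens(messages) {
  const chars = messages.reduce(
    (sum, m) => sum + (typeof m.content === 'string' ? m.content.length : 0),
    0
  )
  return Math.ceil(chars / CHARS_PER_TOKEN)
}
```

A real tokenizer could later replace the body without changing the signature, which is the main appeal of accepting a function rather than a fixed strategy.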
