@benkhz/context-manager
A vanilla JS class that manages LLM conversation context end-to-end. Zero runtime dependencies.
Installation
npm install @benkhz/context-manager
Quick start
import { AIContextManager, openaiPreset } from '@benkhz/context-manager'
const mgr = new AIContextManager({
endpoint: 'https://api.openai.com/v1/chat/completions',
model: 'gpt-4o',
headers: { Authorization: `Bearer ${process.env.OPENAI_KEY}` },
hooks: {
...openaiPreset,
},
})
const reply = await mgr.send('What is the capital of France?')
console.log(reply.content) // "The capital of France is Paris."
Design decisions
| Concern | Decision | Rationale |
|---|---|---|
| Provider agnosticism | Caller supplies formatRequest + parseResponse hooks | No lock-in; presets ship for OpenAI and Anthropic as reference |
| Context compaction | Manager POSTs to the same endpoint with a summarise prompt | Automatic; onCompact hook overrides for custom logic |
| Reactive state | setState / getState / subscribe; callbacks fire synchronously | Familiar, zero magic, easy to test |
| Tool loop cap | 10 iterations max | Guards against infinite loops without blocking legitimate multi-step reasoning |
| Context sizing | Character count approximation | Token counting requires a tokenizer dep; chars are close enough for limit-triggering |
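The character-count approximation in the last row can be illustrated with a minimal sketch. The helper name countChars and the choice to count serialised tool-call arguments are assumptions made for illustration; the library's internal accounting may differ.

```javascript
// Hypothetical helper: approximates context size by summing characters.
// Not part of the library's public API.
function countChars(messages) {
  return messages.reduce((total, msg) => {
    const text = typeof msg.content === 'string' ? msg.content : ''
    // Assumption: tool-call arguments also occupy context, so count them too.
    const calls = msg.toolCalls ? JSON.stringify(msg.toolCalls) : ''
    return total + text.length + calls.length
  }, 0)
}

const sample = [
  { role: 'user', content: 'What is the capital of France?' },
  { role: 'assistant', content: 'The capital of France is Paris.' },
]

const contextLimit = 80_000 // the documented default
console.log(countChars(sample) > contextLimit) // false
```

A count like this is cheap enough to run before every request, which is the trade-off the table describes: less precise than a tokenizer, but dependency-free.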
Constructor
new AIContextManager(config)
| Option | Type | Default | Description |
|---|---|---|---|
| endpoint | string | required | URL to POST every LLM request to |
| hooks | object | required | See Hooks |
| model | string | - | Passed through to formatRequest |
| maxTokens | number | - | Passed through to formatRequest |
| contextLimit | number | 80_000 | Char count that triggers auto-compaction |
| compactKeepLast | number | 6 | Messages preserved verbatim after compaction |
| injectSummary | boolean | true | Auto-prepend the latest summary as a system message on every LLM request. Set false to place it yourself via context.summary in formatRequest. |
| headers | object | {} | Extra HTTP headers on every fetch call |
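With injectSummary set to false, formatRequest becomes responsible for placing the summary. The sketch below folds context.summary and the per-call system override into a single system message. The body shape (model, max_tokens, messages) follows the OpenAI chat completions format; the function name and the "Conversation summary so far:" wording are illustrative assumptions, not the behaviour of the shipped presets.

```javascript
// Sketch of a formatRequest hook for use with injectSummary: false.
// Assumes an OpenAI-style chat completions request body.
function formatRequestWithSummary(context, config) {
  const systemParts = []
  if (config.system) systemParts.push(config.system)
  if (context.summary) {
    systemParts.push(`Conversation summary so far:\n${context.summary}`)
  }

  // Tool-call fields are omitted here for brevity.
  const messages = context.messages.map(({ role, content }) => ({ role, content }))
  if (systemParts.length > 0) {
    messages.unshift({ role: 'system', content: systemParts.join('\n\n') })
  }

  const body = { model: config.model, messages }
  if (config.maxTokens != null) body.max_tokens = config.maxTokens
  return body
}
```

A function like this would be passed as hooks.formatRequest together with injectSummary: false, so the summary appears exactly once in each request.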
Hooks
All hooks live in config.hooks. Only formatRequest and parseResponse are required.
Required
formatRequest(context, config) → RequestBody
Converts the internal context snapshot into the HTTP request body.
// context shape
{
messages: [{ role, content, toolCalls?, toolCallId? }],
tools: [{ name, schema: { description, parameters } }],
summary: string | null,
}
// config shape (your constructor options + per-call system override)
{ endpoint, model, maxTokens, headers, system? }
parseResponse(rawJson) → ParsedResponse
Converts the raw HTTP response JSON into the internal shape.
// must return
{
content: string, // assistant text
stopReason: string, // e.g. 'stop', 'tool_use'
toolCalls?: [{ id, name, args }], // present when model calls tools
}
Optional lifecycle hooks
| Hook | Signature | Description |
|---|---|---|
| beforeSend | (messages[]) → messages[] | Transform or filter the message array before each POST. Return the array. |
| afterReceive | (parsed) → parsed | Transform the parsed response before tool/message processing. |
| onCompact | (overflow[], currentSummary) → string \| void | Return a summary string to override the LLM-generated one. |
| onToolCall | (name, args) → args \| void | Intercept before a tool runs. Return new args to override. |
| onToolResult | (name, result) → result | Transform a tool result before appending to context. |
| onError | (error, phase) → void | Called on any error. phase is 'send', 'compact', or 'tool'. |
| onStateChange | (key, oldVal, newVal) → void | Observe every state mutation globally. |
| onContextLimit | (charCount, limit) → 'compact' \| 'truncate' \| 'error' | Choose what happens when the context limit is hit. Defaults to 'compact'. |
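The optional hooks in the table can be collected in one object and spread into config.hooks alongside a preset. The policies below (truncate on a large overshoot, clamp a numeric argument, shorten large results) and the tool name searchDocs are hypothetical, chosen only to show the return conventions listed in the table.

```javascript
// Illustrative lifecycle hooks; the specific policies are assumptions.
const lifecycleHooks = {
  // Pick a strategy when the active window exceeds the limit.
  onContextLimit(charCount, limit) {
    // Far past the limit: shrink by truncation instead of summarising.
    return charCount > limit * 2 ? 'truncate' : 'compact'
  },

  // Return new args to override, or nothing to keep the originals.
  onToolCall(name, args) {
    if (name === 'searchDocs' && args.limit > 20) {
      return { ...args, limit: 20 }
    }
  },

  // Shorten very large tool results before they enter the context.
  onToolResult(name, result) {
    const text = typeof result === 'string' ? result : JSON.stringify(result)
    return text.length > 2000 ? text.slice(0, 2000) + ' [truncated]' : result
  },

  // Log the failing phase ('send', 'compact', or 'tool').
  onError(error, phase) {
    console.error(`[${phase}]`, error.message)
  },
}
```

Because each hook is a plain function, it can be unit-tested by calling it directly, without constructing a manager.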
Messaging API
const reply = await mgr.send('your message')
// → { role: 'assistant', content: string }
await mgr.send('follow-up', { system: 'You are a pirate.' })
await mgr.compact()
Tool API
mgr.addTool(
'getWeather',
{
description: 'Get current weather for a city',
parameters: {
type: 'object',
properties: { city: { type: 'string', description: 'City name' } },
required: ['city'],
},
},
async ({ city }) => ({ temp: 72, unit: 'F', condition: 'sunny' })
)
mgr.removeTool('getWeather')
mgr.getTools() // → [{ name, schema }]
Event bus
mgr.on('message:sent', ({ message }) => ...)
mgr.on('message:received', ({ message }) => ...)
mgr.on('tool:call', ({ name, args }) => ...)
mgr.on('tool:result', ({ name, result }) => ...)
mgr.on('context:compact', ({ messageCount }) => ...)
mgr.on('context:compacted', ({ summary }) => ...)
mgr.on('state:change', ({ key, oldValue, newValue }) => ...)
mgr.on('error', ({ error, phase }) => ...)
mgr.off('message:received', handler)
mgr.once('message:received', handler)
Reactive state
mgr.setState('userId', 'u_123')
mgr.getState('userId') // → 'u_123'
const unsub = mgr.subscribe('userId', (newVal, oldVal) => {
console.log(`userId changed: ${oldVal} → ${newVal}`)
})
unsub()
Introspection
mgr.getMessages() // → Message[] — full, never-pruned turn history
mgr.getActiveMessages() // → Message[] — current LLM-facing window (post-compaction)
mgr.getSummary() // → string | null — latest summary
mgr.getSummaries() // → string[] — every summary ever produced, oldest first
mgr.getTools() // → [{ name, schema }]
mgr.getContext() // → { messages, activeMessages, summary, summaries, tools }
mgr.reset() // clear all message/summary state; returns this
getMessages() always returns every turn ever sent or received, even after compaction has shrunk the
LLM-facing window, which is useful for rendering a full conversation transcript in a UI. getActiveMessages()
returns what is actually being sent to the model right now.
Presets
import { AIContextManager, openaiPreset, anthropicPreset } from '@benkhz/context-manager'
// endpoint, model, and headers are omitted below for brevity
// OpenAI / Azure / Ollama / LM Studio
const openaiMgr = new AIContextManager({
hooks: {
...openaiPreset,
beforeSend: msgs => msgs.filter(m => m.content),
},
})
// Anthropic Messages API
const anthropicMgr = new AIContextManager({
hooks: { ...anthropicPreset },
})
})
Context compaction
The manager tracks two parallel message lists: the full history (everything ever sent or
received, exposed via getMessages()) and the active window (getActiveMessages()), which is the
slice actually sent to the LLM and the only list that compaction and truncation shrink. History is never pruned.
The context-limit check runs at the start of every send() call and also between tool-call iterations
within a single turn. A request that triggers several tool calls in a row can grow the active
window past contextLimit well before the turn finishes, so compaction can kick in mid-turn
rather than waiting for the next send().
When the character count of the active window exceeds contextLimit:
- The onContextLimit hook is called; it returns 'compact' (default), 'truncate', or 'error'.
- If the result is 'compact', the overflow messages are sent to the LLM with a summarise prompt.
- The summary is stored (and appended to the summary history); the last compactKeepLast active messages are kept verbatim.
- On the next request, the latest summary is auto-prepended as a system message ahead of the active window, unless injectSummary: false, in which case you place it yourself via context.summary in formatRequest.
The onCompact hook can return a string to bypass the LLM call entirely.
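The flow above can be sketched as a synchronous planning step. planContextLimit, its return shape, and the inlined character count are hypothetical; the sketch only decides what would happen, leaves the summarise request itself out, and does not adjust the cut for tool-call pairs.

```javascript
// Hypothetical planner mirroring the documented steps; not the library's code.
function planContextLimit(active, { contextLimit = 80_000, compactKeepLast = 6 } = {}, hooks = {}) {
  const charCount = active.reduce((n, m) => n + (m.content || '').length, 0)
  if (charCount <= contextLimit) {
    return { action: 'none', kept: active, overflow: [] }
  }

  // Ask the hook for a strategy; 'compact' is the documented default.
  const action =
    (hooks.onContextLimit && hooks.onContextLimit(charCount, contextLimit)) || 'compact'
  if (action === 'error') {
    throw new Error(`Context limit exceeded: ${charCount} > ${contextLimit}`)
  }

  // Everything before the last compactKeepLast messages is overflow.
  const cut = Math.max(0, active.length - compactKeepLast)
  return {
    action, // 'compact': summarise the overflow; 'truncate': drop it
    kept: active.slice(cut),
    overflow: active.slice(0, cut),
  }
}
```

Separating the decision from the network call keeps the limit logic easy to test: the caller can inspect overflow and decide whether to summarise it or discard it.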
Both compaction and truncation snap their cut point to avoid splitting a tool-call/tool-result
pair across the boundary — the kept window never starts with an orphaned tool message.
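One way to snap the cut point is to move it backwards until the kept window no longer begins with a tool result. The function name and the assumption that tool results carry role: 'tool' (with the calling assistant message directly before them) are illustrative; the library's internal roles and direction of adjustment may differ.

```javascript
// Hypothetical helper: returns an index such that messages.slice(index)
// does not start with a tool-result message (unless the list itself does).
function snapCutPoint(messages, cut) {
  let index = Math.min(Math.max(cut, 0), messages.length)
  // Walk back past tool results so the assistant message that called them is kept too.
  while (index > 0 && index < messages.length && messages[index].role === 'tool') {
    index -= 1
  }
  return index
}

const turn = [
  { role: 'user', content: 'Weather in Paris?' },
  { role: 'assistant', content: '', toolCalls: [{ id: 'c1', name: 'getWeather', args: { city: 'Paris' } }] },
  { role: 'tool', toolCallId: 'c1', content: '{"temp":72}' },
  { role: 'assistant', content: 'It is 72F and sunny in Paris.' },
]

console.log(snapCutPoint(turn, 2)) // 1
console.log(snapCutPoint(turn, 3)) // 3
```

Moving the cut backwards keeps slightly more than compactKeepLast messages in some cases, which trades a little context space for a request the provider will accept.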
Open decisions
- Token counting: currently approximated via character count. A future tokenizer option could accept a (messages) => number fn for more accurate limiting.
- Streaming: send() is request/response only.
- Persistence: getContext() returns a serialisable snapshot. A future loadContext(snapshot) method would complete the persistence story.
- Multi-modal: content is currently assumed to be string.