0.3.0 • Published 3d ago • CLI

@jsilvanus/seedeer

Licence: MIT · Version: 0.3.0 · Size: 624 kB

seedeer

seedeer Logo: a deer with vector numbers between antlers. Logo generated by ChatGPT. Public Domain.

A Node.js vision-model toolkit — the vision counterpart to @jsilvanus/embedeer (text embeddings) and @jsilvanus/chattydeer (LLM chat). Import it directly with no HTTP server in the loop; local or remote model execution is a configuration detail, not an API difference.

Status: all roadmap phases are complete. Image embeddings (JointEmbedder, VisualEmbedder), captioning (Captioner), VQA (VqaAssistant, with local and remote backends), and detect/track/zone-trigger (Detector, Tracker, ZoneTrigger, TrackingSession) are implemented and working across all four execution modes (process/thread/socket/grpc) where applicable. Each pillar has a benchmark script (npm run bench), and the network modes support multi-server load balancing. See docs/ROADMAP.md for the roadmap and its ordering, and docs/ARCHITECTURE.md for the design principles behind it.

Functionality

Pillar	What it does	Doc
Image embeddings	Joint image-text (CLIP-class) and vision-only (DINOv3-class) embeddings	docs/features/embeddings.md
Captioning	Cheap, fast, generic image descriptions	docs/features/captioning.md
Visual Question Answering	Question-driven answers about an image, local or delegated to a remote OpenAI-compatible vision endpoint	docs/features/vqa.md
Detect + Track + Zone-trigger	Real-time person detection, cross-frame tracking, and named-zone enter/exit events for production-assistant use cases	docs/features/detection-tracking.md

OCR and image redaction/pixel mutation are explicit non-goals — see docs/ARCHITECTURE.md.
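
The zone enter/exit events listed under Detect + Track + Zone-trigger can be pictured with a small sketch. The code below is not seedeer's API: `ZoneWatcher`, `Zone`, `ZoneEvent`, and `insidePolygon` are hypothetical names, and it assumes a tracker has already assigned a stable id and a single 2D position to each person in every frame. It shows one plausible way to turn per-frame positions into events, using a ray-casting point-in-polygon test plus a per-track memory of which zones were occupied on the previous frame.

```typescript
// Hypothetical types for illustration only; these are not seedeer's API.
type Point = { x: number; y: number };
type Zone = { name: string; polygon: Point[] };
type TrackedPosition = { id: number; position: Point };
type ZoneEvent = { trackId: number; zone: string; kind: "enter" | "exit" };

// Ray-casting test: count how many polygon edges a horizontal ray from p crosses.
function insidePolygon(p: Point, polygon: Point[]): boolean {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const a = polygon[i];
    const b = polygon[j];
    const crosses =
      (a.y > p.y) !== (b.y > p.y) &&
      p.x < ((b.x - a.x) * (p.y - a.y)) / (b.y - a.y) + a.x;
    if (crosses) inside = !inside;
  }
  return inside;
}

class ZoneWatcher {
  private zones: Zone[];
  // Zones each track occupied on the previous frame, keyed by track id.
  private occupancy = new Map<number, Set<string>>();

  constructor(zones: Zone[]) {
    this.zones = zones;
  }

  // Takes one frame of tracked positions and returns the enter/exit events it implies.
  update(tracks: TrackedPosition[]): ZoneEvent[] {
    const events: ZoneEvent[] = [];
    for (const track of tracks) {
      const before = this.occupancy.get(track.id) ?? new Set<string>();
      const now = new Set<string>();
      for (const zone of this.zones) {
        if (insidePolygon(track.position, zone.polygon)) now.add(zone.name);
      }
      now.forEach((name) => {
        if (!before.has(name)) events.push({ trackId: track.id, zone: name, kind: "enter" });
      });
      before.forEach((name) => {
        if (!now.has(name)) events.push({ trackId: track.id, zone: name, kind: "exit" });
      });
      this.occupancy.set(track.id, now);
    }
    return events;
  }
}
```

A track that starts inside a zone, stays there, and then leaves would yield one enter event, no events while it remains inside, and one exit event. Tracks that vanish between frames produce no exit event in this sketch, so a real system would need a policy for lost tracks.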

Design principles (short version)

No HTTP in the consuming product — import and call a method.
Every pillar supports local-or-remote execution (process / thread / socket / grpc), including the real-time detection/tracking path.
Where model strength matters more than latency (VQA), the backend is pluggable: a small local model by default, or delegation to a configured remote endpoint — the call shape never changes.
Reuses embedeer's proven worker-pool/provider-loader/server patterns rather than reinventing them for vision inputs.
No tight coupling to any specific downstream consumer.
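
The pluggable-backend principle above (a local model by default, or delegation to a remote endpoint, with the same call shape) can be sketched as a small interface with two implementations. None of the names below come from seedeer: `VqaBackend`, `LocalStubBackend`, `RemoteBackend`, `Transport`, and `Assistant` are hypothetical, the local backend is a stub rather than a real model, and the remote transport is injected as a function so that no particular endpoint or wire format is assumed.

```typescript
// Hypothetical names for illustration only; these are not seedeer's API.
interface VqaBackend {
  answer(image: Uint8Array, question: string): Promise<string>;
}

// Stand-in for a small local model: it only echoes its inputs.
class LocalStubBackend implements VqaBackend {
  async answer(image: Uint8Array, question: string): Promise<string> {
    return `local(${image.length} bytes): ${question}`;
  }
}

// How a request reaches the remote side is left to the caller (assumption).
type Transport = (image: Uint8Array, question: string) => Promise<string>;

// Delegates every question to the injected transport.
class RemoteBackend implements VqaBackend {
  private send: Transport;

  constructor(send: Transport) {
    this.send = send;
  }

  answer(image: Uint8Array, question: string): Promise<string> {
    return this.send(image, question);
  }
}

// The consumer-facing object: backend choice is configuration, ask() is unchanged.
class Assistant {
  private backend: VqaBackend;

  constructor(backend: VqaBackend = new LocalStubBackend()) {
    this.backend = backend;
  }

  ask(image: Uint8Array, question: string): Promise<string> {
    return this.backend.answer(image, question);
  }
}
```

Under these assumptions, `new Assistant()` and `new Assistant(new RemoteBackend(send))` are called identically through `ask(image, question)`; only the constructor argument differs, which is the property the principle describes.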

Full rationale in docs/ARCHITECTURE.md.

Installation

npm install @jsilvanus/seedeer

License

MIT

Keywords

vision computer-vision object-detection object-tracking vqa visual-question-answering image-captioning image-embeddings clip large-language-models nodejs huggingface-transformers