Keyword: multimodal

@mux/ai
Released
5d ago
Version
0.25.0
AI library for Mux
mux video ai llm openai anthropic +12
mixpeek
Released
4d ago
Version
0.81.20
Official Mixpeek TypeScript/JavaScript SDK for multimodal data processing and retrieval
mixpeek multimodal search retrieval vector ai +3
trvl-mcp
Released
14h ago
Version
1.18.0
Travel MCP server + CLI for flights, hotels, trains, cars, ferries, and door-to-door multimodal trips — no API keys required, single Go binary, works with any MCP client (Claude, Cursor, Windsurf, Codex). 1 smart travel MCP tool, natural language; legacy…
mcp mcp-server travel flights flight-search hotels +14
@hmanlab/hl-plugins
Released
2d ago
Version
0.5.3
One-command installer for curated OpenCode plugins. Adds image, video, music, and speech generation to your coding agent without leaving the chat. MIT, no telemetry.
opencode opencode-plugin plugin cli installer multimodal +7
@neta-art/generation
Released
2h ago
Version
0.1.10
A lightweight multimodal generation SDK with built-in model presets and adapter-based provider calls.
ai generation generative-ai multimodal image-generation video-generation +2
acptoapi
Released
16h ago
Version
1.0.125
Anthropic SDK to multi-provider streaming bridge - converts Anthropic message format and tool calls to Gemini, OpenAI-compatible APIs
anthropic gemini google ai streaming proxy +8
@sogni-ai/sogni-client
Released
yesterday
Version
5.0.0
Sogni SDK - AI image, video & audio generation plus LLM chat with vision via the Sogni Supernet (Stable Diffusion, Flux, WAN 2.2, LTX-2, Seedance, Qwen VLM)
ai image-generation video-generation stable-diffusion flux wan +14
@fre4x/gemini
Released
yesterday
Version
1.1.6
A Gemini MCP server providing multimodal analysis and image/video generation.
mcp gemini google-ai mcp-server multimodal imagen +1
@space3-npm/cybersoul-client
Released
5d ago
Version
1.4.29
Cyber-Soul multimodal character interaction SDK by Space3 Digital Media Tech Studio
cybersoul sdk ai multimodal
opencode-multimodal
Released
4d ago
Version
0.1.6
Give every opencode model multimodal capabilities by routing attachments to a fallback multimodal model. Configure everything via the /multimodal command.
opencode opencode-plugin multimodal vision pdf audio +1
otto-agent-cli
Released
3d ago
Version
3.260626.2
The first multimodal, multi-model AI coding agent for your terminal — pair Claude + Codex on every turn and take the same session to WhatsApp or Telegram. Local-first and open source.
ai ai-agent coding-agent cli ai-cli multi-model +14
@mixpeek/n8n-nodes-mixpeek
Released
4d ago
Version
1.0.27
n8n community node for Mixpeek - multimodal data processing and semantic search API
n8n-community-node-package mixpeek multimodal semantic-search ai embeddings +2
@vedmalex/ai-connect
Released
yesterday
Version
0.7.0
Bun-first TypeScript library for unified AI provider access across browser and local runtimes.
ai openai anthropic gemini acp bun +2
pi-vision-tool
Released
6d ago
Version
1.3.7
Pi Agent extension that adds a describe_image tool, letting non-multimodal models delegate image analysis to a vision-capable model (like Qwen VL)
pi-package pi-extension vision multimodal image-analysis
@joezm/seed-viz
Released
yesterday
Version
0.3.0
Vision analysis CLI + MCP server backed by Seed 2.0 via Volcano Ark or any OpenAI-compatible endpoint
seed seed-2 vision multimodal cli mcp +3
n8n-nodes-openrouter-clean
Released
yesterday
Version
1.0.3
n8n community nodes for OpenRouter — full API support including reasoning control, multimodal content (images, video URLs), provider preferences, plugins (web search, web fetch), structured outputs, TTS, STT, and model listing.
n8n-community-node-package openrouter llm ai chat multimodal +1
n8n-nodes-siliconflow-ai
Released
1h ago
Version
0.6.0
n8n community node for SiliconFlow (硅基流动). Zero runtime dependencies. Provides a SiliconFlow action node (Chat / Vision / Embeddings / Image / Rerank / Audio TTS+ASR / Video) and a LangChain-compatible Chat Model node for AI Agents. Installs cleanly witho…
n8n n8n-community-node-package siliconflow silicon-flow 硅基流动 ai +14
@daguito/sdk
Released
9h ago
Version
0.3.11
Official TypeScript SDK for the Daguito conversational AI platform — text, voice, image, and multimodal flows over webhooks and the embeddable widget.
daguito ai agent chatbot voice websocket +3
visual-explanation-engine
Released
yesterday
Version
0.1.1
A Codex skill that turns explanations into visual, interactive, multimodal learning experiences.
codex-skill visual-explanation interactive-learning diagrams education instructional-design +2
@miragari/ai-media-router
Released
3d ago
Version
0.2.1
Unified AI media generation SDK with built-in providers and one API for image, video, audio, and 3D workflows.
ai-media-router media-router sdk ai multimodal multi-provider +3
@llm-ports/adapter-google
Released
4d ago
Version
0.1.0-alpha.24
Google Gemini adapter for llm-ports — native @google/genai SDK integration with bundled pricing, content-block translation, validation repair, and image-size + URL validation.
llm ai typescript google gemini google-genai +5
@mastra/voyageai
Released
2d ago
Version
0.3.0
VoyageAI embeddings integration for Mastra - text, multimodal, and contextualized chunk embeddings
mastra voyage voyageai embeddings multimodal ai +2
openaijsonwrapper
Released
5d ago
Version
0.2.1
JSON wrapper for OpenAI-style multi-modal API with structured output
openai json wrapper multimodal vision
@smoose/pi-vision
Released
5d ago
Version
0.1.2
Pi image recognition tool: lets text-only models see images via local CLI providers (Codex or Agy).
pi-package pi-extension pi codex agy image-recognition +3
@ajmalaksar/multimodal-mcp
Released
4d ago
Version
0.1.5
MCP server and CLI exposing Google's multimodal models (Gemini AI Studio and Vertex AI) for image, video, and audio analysis plus image generation — for Claude Code and other agents.
mcp gemini vertex nano-banana multimodal image-generation +1