Travel MCP server + CLI for flights, hotels, trains, cars, ferries, and door-to-door multimodal trips — no API keys required, single Go binary, works with any MCP client (Claude, Cursor, Windsurf, Codex). 1 smart travel MCP tool, natural language; legacy
One-command installer for curated OpenCode plugins. Adds image, video, music, and speech generation to your coding agent without leaving the chat. MIT, no telemetry.
A lightweight multimodal generation SDK with built-in model presets and adapter-based provider calls.
Sogni SDK - AI image, video & audio generation plus LLM chat with vision via the Sogni Supernet (Stable Diffusion, Flux, WAN 2.2, LTX-2, Seedance, Qwen VLM)
A Gemini MCP server providing multimodal analysis and image/video generation.
Cyber-Soul multimodal character interaction SDK by Space3 Digital Media Tech Studio
Give every opencode model multimodal capabilities by routing attachments to a fallback multimodal model. Configure everything via the /multimodal command.
The first multimodal, multi-model AI coding agent for your terminal — pair Claude + Codex on every turn and take the same session to WhatsApp or Telegram. Local-first and open source.
n8n community node for Mixpeek, a multimodal data processing and semantic search API
Pi Agent extension that adds a describe_image tool, letting non-multimodal models delegate image analysis to a vision-capable model (like Qwen VL)
Vision analysis CLI + MCP server backed by Seed 2.0 via Volcano Ark or any OpenAI-compatible endpoint
n8n community nodes for OpenRouter — full API support including reasoning control, multimodal content (images, video URLs), provider preferences, plugins (web search, web fetch), structured outputs, TTS, STT, and model listing.
n8n community node for SiliconFlow (硅基流动). Zero runtime dependencies. Provides a SiliconFlow action node (Chat / Vision / Embeddings / Image / Rerank / Audio TTS+ASR / Video) and a LangChain-compatible Chat Model node for AI Agents. Installs cleanly witho
A Codex skill that turns explanations into visual, interactive, multimodal learning experiences.
Unified AI media generation SDK with built-in providers and one API for image, video, audio, and 3D workflows.
Google Gemini adapter for llm-ports — native @google/genai SDK integration with bundled pricing, content-block translation, validation repair, and image-size + URL validation.
VoyageAI embeddings integration for Mastra - text, multimodal, and contextualized chunk embeddings
JSON wrapper for OpenAI-style multimodal APIs with structured output
Pi image recognition tool: lets text-only models see images via local CLI providers (Codex or Agy).
MCP server and CLI exposing Google's multimodal models (Gemini AI Studio and Vertex AI) for image, video, and audio analysis plus image generation — for Claude Code and other agents.