MCP Video Recognition Server
An MCP (Model Context Protocol) server that provides tools for image, audio, and video recognition using Google's Gemini AI.
Features
- Image Recognition: Analyze and describe images using Google Gemini AI
- Audio Recognition: Analyze and transcribe audio using Google Gemini AI
- Video Recognition: Analyze and describe videos using Google Gemini AI
Prerequisites
- Node.js 18 or higher
- Google Gemini API key
Installation
Manual Installation
- Clone the repository:
  git clone https://github.com/yourusername/mcp-video-recognition.git
  cd mcp-video-recognition
- Install dependencies:
  npm install
- Build the project:
  npm run build
Installing in FLUJO
- Click Add Server
- Copy and paste the GitHub URL into FLUJO
- Click Parse, Clone, Install, Build, and Save.
Installing via Configuration Files
To integrate this MCP server with Cline or other MCP clients via configuration files:
- Open your Cline settings:
  - In VS Code, go to File -> Preferences -> Settings
  - Search for "Cline MCP Settings"
  - Click "Edit in settings.json"
- Add the server configuration to the mcpServers object:
  {
    "mcpServers": {
      "video-recognition": {
        "command": "node",
        "args": [
          "/path/to/mcp-video-recognition/dist/index.js"
        ],
        "disabled": false,
        "autoApprove": []
      }
    }
  }
- Replace /path/to/mcp-video-recognition/dist/index.js with the actual path to the index.js file in your project directory. On Windows, use forward slashes (/) or double backslashes (\\) in the path.
- Save the settings file. Cline should automatically connect to the server.
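If you generate this configuration from a script rather than editing it by hand, the path advice above can be applied programmatically. The sketch below is illustrative only: buildClineConfig and toForwardSlashes are hypothetical helpers, not part of this project. It converts Windows backslashes to forward slashes and also shows that JSON.stringify already doubles any backslashes you keep.

```typescript
// Hypothetical helpers (not part of this project) that build the
// "mcpServers" entry shown above.
interface McpServerEntry {
  command: string;
  args: string[];
  disabled: boolean;
  autoApprove: string[];
}

// Convert Windows-style separators to forward slashes, which Node accepts
// on Windows and which need no escaping inside JSON.
function toForwardSlashes(p: string): string {
  return p.replace(/\\/g, "/");
}

function buildClineConfig(indexJsPath: string): { mcpServers: Record<string, McpServerEntry> } {
  return {
    mcpServers: {
      "video-recognition": {
        command: "node",
        args: [toForwardSlashes(indexJsPath)],
        disabled: false,
        autoApprove: [],
      },
    },
  };
}

const config = buildClineConfig("C:\\Users\\me\\mcp-video-recognition\\dist\\index.js");
console.log(JSON.stringify(config, null, 2));

// If you keep backslashes instead, JSON.stringify emits the required escaping:
console.log(JSON.stringify("C:\\mcp\\index.js")); // prints "C:\\mcp\\index.js"
```

Normalizing to forward slashes avoids the double-backslash escaping entirely, which is the less error-prone of the two options when paths are written by hand.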
Configuration
The server is configured using environment variables:
- GOOGLE_API_KEY (required): Your Google Gemini API key
- TRANSPORT_TYPE: Transport type to use (stdio or sse; defaults to stdio)
- PORT: Port number for SSE transport (defaults to 3000)
- LOG_LEVEL: Logging level (verbose, debug, info, warn, error; defaults to info)
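To make the defaults and the required/optional split concrete, here is a minimal sketch of how these variables could be read and validated. It is not the server's actual source: loadConfig, the type guards, and the validation rules (for example, rejecting ports outside 1 to 65535) are assumptions based only on the list above.

```typescript
// Hypothetical sketch: NOT the server's actual source. loadConfig and its
// validation rules are assumptions based only on the documented variables.
type TransportType = "stdio" | "sse";
type LogLevel = "verbose" | "debug" | "info" | "warn" | "error";

interface ServerConfig {
  googleApiKey: string;
  transportType: TransportType;
  port: number;
  logLevel: LogLevel;
}

const LOG_LEVELS: readonly LogLevel[] = ["verbose", "debug", "info", "warn", "error"];

function isTransportType(value: string): value is TransportType {
  return value === "stdio" || value === "sse";
}

function isLogLevel(value: string): value is LogLevel {
  return (LOG_LEVELS as readonly string[]).includes(value);
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const googleApiKey = env.GOOGLE_API_KEY;
  if (!googleApiKey) {
    throw new Error("GOOGLE_API_KEY is required");
  }

  // Fall back to the documented defaults when a variable is unset.
  const transportType = env.TRANSPORT_TYPE ?? "stdio";
  if (!isTransportType(transportType)) {
    throw new Error(`Unsupported TRANSPORT_TYPE: ${transportType}`);
  }

  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  const logLevel = env.LOG_LEVEL ?? "info";
  if (!isLogLevel(logLevel)) {
    throw new Error(`Invalid LOG_LEVEL: ${logLevel}`);
  }

  return { googleApiKey, transportType, port, logLevel };
}

// In Node you would pass process.env; a plain object keeps the example standalone.
const cfg = loadConfig({ GOOGLE_API_KEY: "your_api_key", TRANSPORT_TYPE: "sse" });
console.log(cfg.transportType, cfg.port, cfg.logLevel); // sse 3000 info
```

Failing fast on a missing API key or an unknown transport surfaces misconfiguration at startup rather than on the first tool call.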
Usage
Starting the Server
With stdio Transport (Default)
GOOGLE_API_KEY=your_api_key npm start
With SSE Transport
GOOGLE_API_KEY=your_api_key TRANSPORT_TYPE=sse PORT=3000 npm start
Using the Tools
The server provides three tools that can be called by MCP clients:
Image Recognition
{
"name": "image_recognition",
"arguments": {
"filepath": "/path/to/image.jpg",
"prompt": "Describe this image in detail",
"modelname": "gemini-2.0-flash"
}
}
Audio Recognition
{
"name": "audio_recognition",
"arguments": {
"filepath": "/path/to/audio.mp3",
"prompt": "Transcribe this audio",
"modelname": "gemini-2.0-flash"
}
}
Video Recognition
{
"name": "video_recognition",
"arguments": {
"filepath": "/path/to/video.mp4",
"prompt": "Describe what happens in this video",
"modelname": "gemini-2.0-flash"
}
}
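The name/arguments objects above correspond to the params of an MCP "tools/call" request, which is a JSON-RPC 2.0 message. Over the stdio transport, MCP messages are newline-delimited and must not contain embedded newlines. The sketch below shows that framing, assuming a client that writes raw JSON-RPC to the server's stdin. frameToolCall is a hypothetical helper, and a real client (usually built on an MCP SDK) must complete the "initialize" handshake before sending any tool call.

```typescript
// Hypothetical helper: wraps a tool call in a JSON-RPC 2.0 "tools/call"
// request framed as a single newline-terminated line for stdio.
// A real client must complete the MCP "initialize" handshake first.
interface ToolCall {
  name: "image_recognition" | "audio_recognition" | "video_recognition";
  arguments: { filepath: string; prompt?: string; modelname?: string };
}

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: ToolCall;
}

function frameToolCall(id: number, call: ToolCall): string {
  const request: JsonRpcRequest = { jsonrpc: "2.0", id, method: "tools/call", params: call };
  // JSON.stringify without indentation emits no raw newlines (newlines inside
  // strings are escaped as \n), so one message always occupies one line.
  return JSON.stringify(request) + "\n";
}

const line = frameToolCall(1, {
  name: "video_recognition",
  arguments: { filepath: "/path/to/video.mp4", prompt: "Describe what happens in this video" },
});
// A client would write `line` to the server's stdin.
console.log(JSON.parse(line).method); // tools/call
```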
Tool Parameters
All tools accept the following parameters:
- filepath (required): Path to the media file to analyze
- prompt (optional): Custom prompt for the recognition (defaults to "Describe this content")
- modelname (optional): Gemini model to use for recognition (defaults to "gemini-2.0-flash")
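The defaulting rules above can be expressed as a small resolution step. This is a minimal sketch, assuming arguments arrive as untyped JSON; resolveToolArgs and the choice to treat empty strings as missing are illustrative assumptions, not the server's actual behavior.

```typescript
// Hypothetical sketch of applying the documented parameter defaults.
// resolveToolArgs is illustrative, not the server's actual function.
interface RecognitionArgs {
  filepath?: unknown;
  prompt?: unknown;
  modelname?: unknown;
}

interface ResolvedArgs {
  filepath: string;
  prompt: string;
  modelname: string;
}

const DEFAULT_PROMPT = "Describe this content";
const DEFAULT_MODEL = "gemini-2.0-flash";

function resolveToolArgs(args: RecognitionArgs): ResolvedArgs {
  const filepath = args.filepath;
  if (typeof filepath !== "string" || filepath.length === 0) {
    throw new Error("filepath is required");
  }
  // Optional fields fall back to the defaults when missing, empty, or not strings.
  const prompt =
    typeof args.prompt === "string" && args.prompt.length > 0 ? args.prompt : DEFAULT_PROMPT;
  const modelname =
    typeof args.modelname === "string" && args.modelname.length > 0 ? args.modelname : DEFAULT_MODEL;
  return { filepath, prompt, modelname };
}

const resolved = resolveToolArgs({ filepath: "/path/to/audio.mp3", prompt: "Transcribe this audio" });
console.log(resolved.prompt, resolved.modelname); // Transcribe this audio gemini-2.0-flash
```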
Development
Running in Development Mode
GOOGLE_API_KEY=your_api_key npm run dev
Project Structure
- src/index.ts: Entry point
- src/server.ts: MCP server implementation
- src/tools/: Tool implementations
- src/services/: Service implementations (Gemini API)
- src/types/: Type definitions
- src/utils/: Utility functions
License
MIT