MCP Video Recognition Server
An MCP (Model Context Protocol) server that provides tools for image, audio, and video recognition using Google's Gemini AI.
Features
- Image Recognition: Analyze and describe images using Google Gemini AI
- Audio Recognition: Analyze and transcribe audio using Google Gemini AI
- Video Recognition: Analyze and describe videos using Google Gemini AI
Prerequisites
- Node.js 18 or higher
- Google Gemini API key
Installation
Manual Installation
- Clone the repository:
  git clone https://github.com/yourusername/mcp-video-recognition.git
  cd mcp-video-recognition
- Install dependencies:
  npm install
- Build the project:
  npm run build
Installing in FLUJO
- Click Add Server
- Copy and paste the GitHub URL of this repository into FLUJO
- Click Parse, Clone, Install, Build and Save.
Installing via Configuration Files
To integrate this MCP server with Cline or other MCP clients via configuration files:
- Open your Cline settings:
  - In VS Code, go to File -> Preferences -> Settings
  - Search for "Cline MCP Settings"
  - Click "Edit in settings.json"
- Add the server configuration to the mcpServers object:
  {
    "mcpServers": {
      "video-recognition": {
        "command": "node",
        "args": [
          "/path/to/mcp-video-recognition/dist/index.js"
        ],
        "disabled": false,
        "autoApprove": []
      }
    }
  }
- Replace /path/to/mcp-video-recognition/dist/index.js with the actual path to the index.js file in your project directory. On Windows, use forward slashes (/) or double backslashes (\\) in the path.
- Save the settings file. Cline should automatically connect to the server.
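The configuration entry described above can also be generated programmatically, which avoids hand-escaping Windows paths. The following is a minimal sketch only: `McpServerEntry`, `toForwardSlashes`, `buildServerEntry`, and the example Windows path are hypothetical names and values chosen for illustration, not part of this project or of Cline.

```typescript
// Shape of one entry in the "mcpServers" object, mirroring the fields shown above.
// The interface name is an assumption made for this sketch.
interface McpServerEntry {
  command: string;
  args: string[];
  disabled: boolean;
  autoApprove: string[];
}

// Convert backslashes to forward slashes so the path needs no escaping in JSON.
function toForwardSlashes(path: string): string {
  return path.replace(/\\/g, "/");
}

// Build the entry for this server from the path to the compiled dist/index.js.
function buildServerEntry(indexJsPath: string): McpServerEntry {
  return {
    command: "node",
    args: [toForwardSlashes(indexJsPath)],
    disabled: false,
    autoApprove: [],
  };
}

// Hypothetical Windows install location, used only as an example input.
const exampleEntry = buildServerEntry(
  "C:\\Users\\me\\mcp-video-recognition\\dist\\index.js"
);

// JSON text that could be pasted into the Cline MCP settings file.
const exampleSettingsJson = JSON.stringify(
  { mcpServers: { "video-recognition": exampleEntry } },
  null,
  2
);
```

Here `exampleEntry.args[0]` comes out with forward slashes only, so the serialized JSON contains no backslash escapes.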
Configuration
The server is configured using environment variables:
- GOOGLE_API_KEY (required): Your Google Gemini API key
- TRANSPORT_TYPE: Transport type to use (stdio or sse, defaults to stdio)
- PORT: Port number for SSE transport (defaults to 3000)
- LOG_LEVEL: Logging level (verbose, debug, info, warn, error, defaults to info)
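To make the defaults and allowed values concrete, here is a minimal sketch of how these environment variables might be resolved into a typed configuration object. It is not the project's actual code: `ServerConfig` and `loadConfig` are hypothetical names, and the validation rules (for example, rejecting ports outside 1 to 65535) are assumptions of this sketch.

```typescript
type TransportType = "stdio" | "sse";
type LogLevel = "verbose" | "debug" | "info" | "warn" | "error";

// Hypothetical config shape; field names are chosen for this sketch only.
interface ServerConfig {
  googleApiKey: string;
  transportType: TransportType;
  port: number;
  logLevel: LogLevel;
}

const TRANSPORT_TYPES: string[] = ["stdio", "sse"];
const LOG_LEVELS: string[] = ["verbose", "debug", "info", "warn", "error"];

// Resolve the documented variables, applying the documented defaults.
// Takes the environment as a plain record so it can be tested without Node globals.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const googleApiKey = env["GOOGLE_API_KEY"];
  if (!googleApiKey) {
    throw new Error("GOOGLE_API_KEY is required");
  }

  const transportType = env["TRANSPORT_TYPE"] ?? "stdio";
  if (TRANSPORT_TYPES.indexOf(transportType) === -1) {
    throw new Error(`Unsupported TRANSPORT_TYPE: ${transportType}`);
  }

  // Assumption: a valid port is an integer between 1 and 65535.
  const port = Number(env["PORT"] ?? "3000");
  if (port !== Math.floor(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }

  const logLevel = env["LOG_LEVEL"] ?? "info";
  if (LOG_LEVELS.indexOf(logLevel) === -1) {
    throw new Error(`Unsupported LOG_LEVEL: ${logLevel}`);
  }

  return {
    googleApiKey,
    transportType: transportType as TransportType,
    port,
    logLevel: logLevel as LogLevel,
  };
}
```

In a Node.js process this could be called as `loadConfig(process.env)`. With only `GOOGLE_API_KEY` set, the result uses stdio transport, port 3000, and the info log level.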
Usage
Starting the Server
With stdio Transport (Default)
GOOGLE_API_KEY=your_api_key npm start
With SSE Transport
GOOGLE_API_KEY=your_api_key TRANSPORT_TYPE=sse PORT=3000 npm start
Using the Tools
The server provides three tools that can be called by MCP clients:
Image Recognition
{
"name": "image_recognition",
"arguments": {
"filepath": "/path/to/image.jpg",
"prompt": "Describe this image in detail",
"modelname": "gemini-2.0-flash"
}
}
Audio Recognition
{
"name": "audio_recognition",
"arguments": {
"filepath": "/path/to/audio.mp3",
"prompt": "Transcribe this audio",
"modelname": "gemini-2.0-flash"
}
}
Video Recognition
{
"name": "video_recognition",
"arguments": {
"filepath": "/path/to/video.mp4",
"prompt": "Describe what happens in this video",
"modelname": "gemini-2.0-flash"
}
}
Tool Parameters
All tools accept the following parameters:
- filepath (required): Path to the media file to analyze
- prompt (optional): Custom prompt for the recognition (defaults to "Describe this content")
- modelname (optional): Gemini model to use for recognition (defaults to "gemini-2.0-flash")
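As an illustration of how these parameters fit into an MCP request, the sketch below builds a JSON-RPC `tools/call` message for any of the three tools and fills in the documented defaults. The helper names (`RecognitionArgs`, `withDefaults`, `buildToolCall`) are hypothetical, and filling defaults on the client side is only for demonstration: per the parameter list above, the optional fields can simply be omitted.

```typescript
type ToolName = "image_recognition" | "audio_recognition" | "video_recognition";

// Arguments accepted by all three tools, as listed above.
interface RecognitionArgs {
  filepath: string;
  prompt?: string;
  modelname?: string;
}

// Defaults as documented in this README.
const DEFAULT_PROMPT = "Describe this content";
const DEFAULT_MODEL = "gemini-2.0-flash";

// Return a copy of the arguments with every optional field resolved.
function withDefaults(args: RecognitionArgs): Required<RecognitionArgs> {
  if (!args.filepath) {
    throw new Error("filepath is required");
  }
  return {
    filepath: args.filepath,
    prompt: args.prompt ?? DEFAULT_PROMPT,
    modelname: args.modelname ?? DEFAULT_MODEL,
  };
}

// Wrap a tool invocation in the JSON-RPC 2.0 envelope that MCP uses for tools/call.
function buildToolCall(id: number, name: ToolName, args: RecognitionArgs) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: withDefaults(args) },
  };
}

// Example: only filepath is given, so prompt and modelname fall back to the defaults.
const exampleRequest = buildToolCall(1, "video_recognition", {
  filepath: "/path/to/video.mp4",
});
```

MCP client libraries normally construct this envelope for you; the sketch only shows where `name` and `arguments` from the examples above end up in the request.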
Development
Running in Development Mode
GOOGLE_API_KEY=your_api_key npm run dev
Project Structure
- src/index.ts: Entry point
- src/server.ts: MCP server implementation
- src/tools/: Tool implementations
- src/services/: Service implementations (Gemini API)
- src/types/: Type definitions
- src/utils/: Utility functions
License
MIT