gemini-vision-art-studio
A powerful MCP server leveraging Google's Gemini AI for advanced image generation and transformation. This studio offers two specialized tools: a 3D cartoon generator and an image processing transformer, both powered by the cutting-edge Gemini 2.0 Flash model.
🎨 Gemini Vision Art Studio
✨ Features
1. 3D Cartoon Generator
- Generate high-quality 3D cartoon images from text descriptions
- Child-friendly designs with vibrant colors and engaging visuals
- Perfect for children's books, educational materials, and creative projects
2. Image Transformer
- Transform existing images using Gemini AI's vision capabilities
- Apply various artistic styles and modifications
- Enhance, modify, or completely reimagine your images
Additional Features
- 🖼️ Automatic preview generation
- 🌐 Browser-based image viewing
- 💾 Local storage with organized output
- 🔄 Real-time processing
- 📱 Cross-platform support
🚀 Quick Start
Installation
# Clone the repository
git clone https://github.com/falahgs/gemini-vision-art-studio.git
# Install dependencies
cd gemini-vision-art-studio
npm install
Configuration
- Project Configuration:
Create a .env file in the root directory:
GEMINI_API_KEY=your_api_key_here
# Set to true if running in a remote environment (no browser preview)
IS_REMOTE=true
- Claude Desktop Configuration:
Add the server configuration to your Claude Desktop config file at %AppData%\Claude\claude_desktop_config.json:
{
  "mcpServers": {
    "gemini-vision-art-studio": {
      "command": "node",
      "args": [
        "PATH_TO_YOUR_PROJECT\\build\\src\\index.js"
      ],
      "env": {
        "GEMINI_API_KEY": "your_gemini_api_key_here",
        "IS_REMOTE": "true"
      }
    }
  }
}
Replace:
- PATH_TO_YOUR_PROJECT with your actual project path
- your_gemini_api_key_here with your Gemini API key
💡 Note: On Windows, the config file is typically located at:
C:\Users\YourUsername\AppData\Roaming\Claude\claude_desktop_config.json
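Because the config file is JSON, every backslash in a Windows path has to be escaped. The following is a minimal sketch of generating the entry programmatically so the escaping is handled by `JSON.stringify`. The helper names (`buildServerEntry`, `buildClaudeConfig`) and the example project path are hypothetical and not part of this project.

```typescript
// Hypothetical helpers for illustration; not part of the project's code.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

function buildServerEntry(
  projectPath: string,
  apiKey: string,
  isRemote: boolean
): McpServerEntry {
  return {
    command: "node",
    // Windows-style path to the compiled entry point named in the config.
    args: [`${projectPath}\\build\\src\\index.js`],
    env: {
      GEMINI_API_KEY: apiKey,
      // Environment variables are strings, so the flag is stringified.
      IS_REMOTE: String(isRemote),
    },
  };
}

function buildClaudeConfig(entry: McpServerEntry): string {
  // JSON.stringify escapes each backslash, which is why the file on disk
  // shows "\\" wherever the real path has a single "\".
  return JSON.stringify(
    { mcpServers: { "gemini-vision-art-studio": entry } },
    null,
    2
  );
}

// Assumed example path; substitute your own project location.
const configText = buildClaudeConfig(
  buildServerEntry(
    "C:\\projects\\gemini-vision-art-studio",
    "your_gemini_api_key_here",
    true
  )
);
console.log(configText);
```

The printed text can be pasted into claude_desktop_config.json, merging it with any existing "mcpServers" entries rather than overwriting them.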
Remote Usage
When running the server remotely:
- Set IS_REMOTE=true in your environment or Claude Desktop configuration.
- The server will:
  - Create necessary directories automatically:
    - /app/output: For generated images and previews
    - /app/temp: For temporary processing files
  - Skip browser preview attempts
  - Save all files to the /app/output directory
  - Return absolute file paths in the response
- Directory Structure in Remote Mode:
  /app/
  ├── output/                  # Generated images and previews
  │   ├── image1.png
  │   └── image1_preview.html
  └── temp/                    # Temporary processing files
- Troubleshooting Remote Usage:
  - Ensure the /app directory exists and is writable
  - Check the console output for directory creation messages
  - Look for "Image saved to:" messages in the logs
  - File paths in the response will be absolute paths
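The remote-mode behavior described above can be sketched as a small path resolver. This is an illustration under stated assumptions, not the server's actual code: the names `readSettings` and `resolveOutputPaths` are hypothetical, only the literal string "true" is assumed to enable remote mode, and local mode is assumed to use the same layout under the project root.

```typescript
// Illustrative names only; the project's real implementation may differ.
interface StudioSettings {
  apiKey: string;
  isRemote: boolean;
}

// Reads the two variables described in the Configuration section.
function readSettings(env: Record<string, string | undefined>): StudioSettings {
  const apiKey = env.GEMINI_API_KEY;
  if (!apiKey) {
    throw new Error("GEMINI_API_KEY is not set");
  }
  // Assumption: only the literal string "true" enables remote mode.
  return { apiKey, isRemote: env.IS_REMOTE === "true" };
}

interface OutputPaths {
  outputDir: string;
  tempDir: string;
  imagePath: string;
  previewPath: string;
}

// In remote mode everything lives under /app, as in the tree above.
// Assumption: local mode mirrors that layout under the project root.
function resolveOutputPaths(
  settings: StudioSettings,
  projectRoot: string,
  fileName: string
): OutputPaths {
  const base = settings.isRemote ? "/app" : projectRoot;
  const outputDir = `${base}/output`;
  return {
    outputDir,
    tempDir: `${base}/temp`,
    imagePath: `${outputDir}/${fileName}.png`,
    previewPath: `${outputDir}/${fileName}_preview.html`,
  };
}

const remotePaths = resolveOutputPaths(
  readSettings({ GEMINI_API_KEY: "your_api_key_here", IS_REMOTE: "true" }),
  "/home/user/gemini-vision-art-studio", // assumed example location
  "image1"
);
console.log(remotePaths.imagePath);   // /app/output/image1.png
console.log(remotePaths.previewPath); // /app/output/image1_preview.html
```

Passing the environment in as a parameter, rather than reading `process.env` directly, keeps the sketch easy to test with different settings.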
Running the Server
- Build the project:
npm run build
- The server becomes available in Claude Desktop automatically. To confirm:
- Open Claude Desktop
- Start a new conversation
- Check that the tools appear in the available tools list
🛠️ Available Tools
1. Generate 3D Cartoon (generate_3d_cartoon)
Creates a 3D-style cartoon image from your text description.
{
  "name": "generate_3d_cartoon",
  "arguments": {
    "prompt": "A friendly dragon teaching math to forest animals",
    "fileName": "dragon_teacher"
  }
}
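Claude Desktop sends this call on your behalf, so you normally never write it by hand. For readers curious about the wire format, here is a rough sketch of how the name and arguments above would be wrapped in an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The helper `makeToolCall` and the request id are hypothetical.

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper for illustration; an MCP client builds this internally.
function makeToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

const cartoonRequest = makeToolCall(1, "generate_3d_cartoon", {
  prompt: "A friendly dragon teaching math to forest animals",
  fileName: "dragon_teacher",
});
console.log(JSON.stringify(cartoonRequest, null, 2));
```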
2. Process Image (process_image)
Transforms existing images according to your instructions.
{
  "name": "process_image",
  "arguments": {
    "imagePath": "input/photo.jpg",
    "prompt": "Transform this into a watercolor painting with autumn colors",
    "outputFileName": "watercolor_autumn"
  }
}
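A server receiving these arguments would typically check them before touching the file system. The sketch below shows one way to do that with a type guard. It assumes all three fields are required, non-empty strings, which the example above suggests but the README does not state; `parseProcessImageArgs` is a hypothetical name.

```typescript
interface ProcessImageArgs {
  imagePath: string;
  prompt: string;
  outputFileName: string;
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

// Assumption: every field is a required, non-empty string.
function parseProcessImageArgs(raw: unknown): ProcessImageArgs {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("arguments must be an object");
  }
  const { imagePath, prompt, outputFileName } = raw as Record<string, unknown>;
  if (!isNonEmptyString(imagePath)) {
    throw new Error("imagePath must be a non-empty string");
  }
  if (!isNonEmptyString(prompt)) {
    throw new Error("prompt must be a non-empty string");
  }
  if (!isNonEmptyString(outputFileName)) {
    throw new Error("outputFileName must be a non-empty string");
  }
  return { imagePath, prompt, outputFileName };
}

const imageArgs = parseProcessImageArgs({
  imagePath: "input/photo.jpg",
  prompt: "Transform this into a watercolor painting with autumn colors",
  outputFileName: "watercolor_autumn",
});
console.log(imageArgs.outputFileName);
```

Rejecting malformed arguments early gives the caller a clear error message instead of a failure deep inside image processing.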
📂 Directory Structure
gemini-vision-art-studio/
├── src/ # Source code
├── build/ # Compiled code
├── input/ # Input images
├── output/ # Generated images and previews
├── temp/ # Temporary processing files
└── examples/ # Example usage and images
🔧 Technical Details
- Runtime: Node.js v14+
- Language: TypeScript 5.8.3
- AI Model: Gemini 2.0 Flash
- Framework: Model Context Protocol (MCP) SDK
- Image Processing: Google Generative AI
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
👨💻 Author
Falah G. Salieh
- Copyright © 2025
- GitHub: @falahgs
🙏 Acknowledgments
- Google Gemini AI team for the powerful image generation model
- The MCP SDK team for the excellent tooling
- All contributors and users of this project
Made with ❤️ by Falah G. Salieh