gemini-vision-art-studio
A powerful MCP server leveraging Google's Gemini AI for advanced image generation and transformation. This studio offers two specialized tools: a 3D cartoon generator and an image processing transformer, both powered by the cutting-edge Gemini 2.0 Flash model.
🎨 Gemini Vision Art Studio
✨ Features
1. 3D Cartoon Generator
- Generate high-quality 3D cartoon images from text descriptions
- Child-friendly designs with vibrant colors and engaging visuals
- Perfect for children's books, educational materials, and creative projects
2. Image Transformer
- Transform existing images using Gemini AI's vision capabilities
- Apply various artistic styles and modifications
- Enhance, modify, or completely reimagine your images
Additional Features
- 🖼️ Automatic preview generation
- 🌐 Browser-based image viewing
- 💾 Local storage with organized output
- 🔄 Real-time processing
- 📱 Cross-platform support
🚀 Quick Start
Installation
# Clone the repository
git clone https://github.com/falahgs/gemini-vision-art-studio.git
# Install dependencies
cd gemini-vision-art-studio
npm install
Configuration
- Project Configuration:
Create a .env file in the root directory:
GEMINI_API_KEY=your_api_key_here
# Set to true if running in a remote environment (no browser preview)
IS_REMOTE=true
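As a rough illustration of how these two variables might be consumed, here is a minimal sketch. The `loadConfig` helper and the `StudioConfig` type are hypothetical names, not taken from the project, and treating only the string "true" as enabling remote mode is an assumption.

```typescript
// Hypothetical helper: the server's real config-loading code is not shown in this README.
interface StudioConfig {
  apiKey: string;
  isRemote: boolean;
}

function loadConfig(env: Record<string, string | undefined>): StudioConfig {
  const apiKey = (env["GEMINI_API_KEY"] ?? "").trim();
  if (apiKey === "") {
    // Fail early: without a key no Gemini request can succeed.
    throw new Error("GEMINI_API_KEY is not set");
  }
  // Assumption: only the string "true" (case-insensitive) enables remote mode.
  const isRemote = (env["IS_REMOTE"] ?? "").trim().toLowerCase() === "true";
  return { apiKey, isRemote };
}

// Example with literal values; a real Node.js process would pass process.env.
const demoConfig = loadConfig({ GEMINI_API_KEY: "your_api_key_here", IS_REMOTE: "true" });
console.log(demoConfig.isRemote);
```

Taking the environment as a parameter, instead of reading `process.env` directly, keeps the function easy to test.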
- Claude Desktop Configuration:
Add the server configuration to your Claude Desktop config file at %AppData%\Claude\claude_desktop_config.json:
{
  "mcpServers": {
    "gemini-vision-art-studio": {
      "command": "node",
      "args": [
        "PATH_TO_YOUR_PROJECT\\build\\src\\index.js"
      ],
      "env": {
        "GEMINI_API_KEY": "your_gemini_api_key_here",
        "IS_REMOTE": "true"
      }
    }
  }
}
Replace:
- PATH_TO_YOUR_PROJECT with your actual project path
- your_gemini_api_key_here with your Gemini API key
💡 Note: On Windows, the config file is typically located at:
C:\Users\YourUsername\AppData\Roaming\Claude\claude_desktop_config.json
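Windows paths are easy to get wrong inside JSON, because every backslash in the path must be written as two backslashes in the file. The sketch below builds the same server entry in code and lets `JSON.stringify` do the escaping. `buildClaudeConfig` is a hypothetical helper for illustration and is not part of the project; the example project path is also an assumption.

```typescript
// Hypothetical helper for illustration; it is not part of the project.
function buildClaudeConfig(projectPath: string, apiKey: string, isRemote: boolean): string {
  const config = {
    mcpServers: {
      "gemini-vision-art-studio": {
        command: "node",
        // Entry point as given in this README: <project>\build\src\index.js
        args: [`${projectPath}\\build\\src\\index.js`],
        env: {
          GEMINI_API_KEY: apiKey,
          // Environment values are strings, so the flag becomes "true" or "false".
          IS_REMOTE: String(isRemote),
        },
      },
    },
  };
  // JSON.stringify escapes each backslash, so the path appears doubled in the output.
  return JSON.stringify(config, null, 2);
}

const demoJson = buildClaudeConfig(
  "C:\\Users\\YourUsername\\gemini-vision-art-studio", // assumed example location
  "your_gemini_api_key_here",
  true,
);
console.log(demoJson);
```

The printed text can be pasted into claude_desktop_config.json, merging it with any `mcpServers` entries that are already there.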
Remote Usage
When running the server remotely:
- Set IS_REMOTE=true in your environment or Claude Desktop configuration.
- The server will:
  - Create necessary directories automatically:
    - /app/output: For generated images and previews
    - /app/temp: For temporary processing files
  - Skip browser preview attempts
  - Save all files to the /app/output directory
  - Return absolute file paths in the response
- Directory Structure in Remote Mode:
/app/
├── output/ # Generated images and previews
│ ├── image1.png
│ └── image1_preview.html
└── temp/ # Temporary processing files
- Troubleshooting Remote Usage:
  - Ensure the /app directory exists and is writable
  - Check the console output for directory creation messages
  - Look for "Image saved to:" messages in the logs
  - File paths in the response will be absolute paths
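The remote-mode behaviour described above can be pictured with a small path-resolution sketch. This is not the server's actual code: `outputPaths` and `shouldOpenBrowser` are hypothetical names, and the local-mode layout (an output folder under the project root, with the same file naming as remote mode) is an assumption based on the directory listings in this README.

```typescript
// Hypothetical sketch; the server's real path logic is not shown in this README.
interface OutputPaths {
  image: string;
  preview: string;
}

function outputPaths(isRemote: boolean, projectRoot: string, fileName: string): OutputPaths {
  // Remote mode always writes under /app/output.
  // Assumption: local mode writes to <projectRoot>/output (trailing slashes stripped first).
  const outputDir = isRemote
    ? "/app/output"
    : `${projectRoot.replace(/[\\\/]+$/, "")}/output`;
  return {
    image: `${outputDir}/${fileName}.png`,
    preview: `${outputDir}/${fileName}_preview.html`,
  };
}

// Remote environments have no desktop browser, so the preview is only written to disk.
function shouldOpenBrowser(isRemote: boolean): boolean {
  return !isRemote;
}

const demoPaths = outputPaths(true, "/ignored/in/remote/mode", "image1");
console.log(demoPaths.image, demoPaths.preview, shouldOpenBrowser(true));
```

When a path in a response looks wrong, comparing it with this layout is a quick first check before digging into the logs.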
Running the Server
- Build the project:
npm run build
- Open Claude Desktop and start a new conversation. The server is then available automatically, and its tools appear in the available tools list.
🛠️ Available Tools
1. Generate 3D Cartoon (generate_3d_cartoon)
Creates a 3D-style cartoon image from your text description.
{
  "name": "generate_3d_cartoon",
  "arguments": {
    "prompt": "A friendly dragon teaching math to forest animals",
    "fileName": "dragon_teacher"
  }
}
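Claude Desktop sends this call for you, but it can help to see what goes over the wire. In the Model Context Protocol, a tool invocation is a JSON-RPC 2.0 request with the method `tools/call`, carrying the tool name and arguments in `params`. The sketch below builds such a request; `toolCall` and `ToolCallRequest` are hypothetical names used only for this example, and the `id` value is arbitrary.

```typescript
// Shape of an MCP tool invocation as a JSON-RPC 2.0 request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments in the request envelope.
function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const demoRequest = toolCall(1, "generate_3d_cartoon", {
  prompt: "A friendly dragon teaching math to forest animals",
  fileName: "dragon_teacher",
});
console.log(JSON.stringify(demoRequest, null, 2));
```

The same envelope works for any other tool on the server by changing the name and arguments.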
2. Process Image (process_image)
Transforms existing images according to your instructions.
{
  "name": "process_image",
  "arguments": {
    "imagePath": "input/photo.jpg",
    "prompt": "Transform this into a watercolor painting with autumn colors",
    "outputFileName": "watercolor_autumn"
  }
}
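If you call this tool from your own client code, checking the arguments before sending them can save a round trip. The following is a minimal sketch under the assumption that all three fields are required, non-empty strings; the README does not state which fields are optional, and `parseProcessImageArgs` is a hypothetical helper, not the server's own validation.

```typescript
interface ProcessImageArgs {
  imagePath: string;
  prompt: string;
  outputFileName: string;
}

// Assumption: every field is a required, non-empty string.
function parseProcessImageArgs(raw: unknown): ProcessImageArgs {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("arguments must be an object");
  }
  const record = raw as Record<string, unknown>;
  // Read one field and reject anything that is not a non-empty string.
  const read = (key: keyof ProcessImageArgs): string => {
    const value = record[key];
    if (typeof value !== "string" || value.trim() === "") {
      throw new Error(`"${key}" must be a non-empty string`);
    }
    return value;
  };
  return {
    imagePath: read("imagePath"),
    prompt: read("prompt"),
    outputFileName: read("outputFileName"),
  };
}

const demoArgs = parseProcessImageArgs({
  imagePath: "input/photo.jpg",
  prompt: "Transform this into a watercolor painting with autumn colors",
  outputFileName: "watercolor_autumn",
});
console.log(demoArgs.outputFileName);
```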
📂 Directory Structure
gemini-vision-art-studio/
├── src/ # Source code
├── build/ # Compiled code
├── input/ # Input images
├── output/ # Generated images and previews
├── temp/ # Temporary processing files
└── examples/ # Example usage and images
🔧 Technical Details
- Runtime: Node.js v14+
- Language: TypeScript 5.8.3
- AI Model: Gemini 2.0 Flash
- Framework: Model Context Protocol (MCP) SDK
- Image Processing: Google Generative AI
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
📝 License
This project is licensed under the MIT License - see the LICENSE file for details.
👨‍💻 Author
Falah G. Salieh
- Copyright © 2025
- GitHub: @falahgs
🙏 Acknowledgments
- Google Gemini AI team for the powerful image generation model
- The MCP SDK team for the excellent tooling
- All contributors and users of this project
Made with ❤️ by Falah G. Salieh