gemini-vision-art-studio

@falahgs

An MCP server that uses Google's Gemini AI for image generation and transformation. It provides two tools, a 3D cartoon generator and an image transformer, both backed by the Gemini 2.0 Flash model.

🎨 Gemini Vision Art Studio

✨ Features

1. 3D Cartoon Generator

Generate high-quality 3D cartoon images from text descriptions
Child-friendly designs with vibrant colors and engaging visuals
Perfect for children's books, educational materials, and creative projects

2. Image Transformer

Transform existing images using Gemini AI's vision capabilities
Apply various artistic styles and modifications
Enhance, modify, or completely reimagine your images

Additional Features

🖼️ Automatic preview generation
🌐 Browser-based image viewing
💾 Local storage with organized output
🔄 Real-time processing
📱 Cross-platform support

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/falahgs/gemini-vision-art-studio.git

# Install dependencies
cd gemini-vision-art-studio
npm install

Configuration

Project Configuration: Create a .env file in the root directory:

GEMINI_API_KEY=your_api_key_here
# Set to true if running in a remote environment (no browser preview)
IS_REMOTE=true
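
A minimal sketch of how these two settings might be read at startup, assuming they arrive as a plain key/value map (for example `process.env` under Node.js). `loadConfig` and `StudioConfig` are hypothetical names used for illustration; they are not taken from the project's source, and the real server may parse these values differently.

```typescript
// Hypothetical config shape; not taken from the project source.
interface StudioConfig {
  apiKey: string;
  isRemote: boolean;
}

// Reads the two variables described above from an environment-like map.
// Pass `process.env` when running under Node.js.
function loadConfig(env: Record<string, string | undefined>): StudioConfig {
  const apiKey = (env["GEMINI_API_KEY"] ?? "").trim();
  if (apiKey === "") {
    throw new Error("GEMINI_API_KEY is not set");
  }
  // Assumption: only the string "true" (case-insensitive) enables remote mode.
  const isRemote = (env["IS_REMOTE"] ?? "").trim().toLowerCase() === "true";
  return { apiKey, isRemote };
}

const config = loadConfig({ GEMINI_API_KEY: "your_api_key_here", IS_REMOTE: "true" });
console.log(config.isRemote);
```

Failing fast on a missing key keeps the error close to its cause instead of surfacing later as a rejected API request.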

Claude Desktop Configuration: Add the server configuration to your Claude Desktop config file at %AppData%\Claude\claude_desktop_config.json:

{
  "mcpServers": {
    "gemini-vision-art-studio": {
      "command": "node",
      "args": [
        "PATH_TO_YOUR_PROJECT\\build\\src\\index.js"
      ],
      "env": {
        "GEMINI_API_KEY": "your_gemini_api_key_here",
        "IS_REMOTE": "true"
      }
    }
  }
}

Replace:

PATH_TO_YOUR_PROJECT with your actual project path
your_gemini_api_key_here with your Gemini API key

💡 Note: On Windows, the config file is typically located at: C:\Users\YourUsername\AppData\Roaming\Claude\claude_desktop_config.json
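
In a JSON file, each backslash of a Windows path has to be written as `\\`. One way to avoid escaping by hand is to generate the entry and let `JSON.stringify` do it. The sketch below is illustrative only: `buildClaudeConfig` is a hypothetical helper and `C:\Projects\gemini-vision-art-studio` is a placeholder path, neither of which comes from the project.

```typescript
// Builds the "mcpServers" entry described above as JSON text.
// Hypothetical helper for illustration; not part of the project.
function buildClaudeConfig(projectPath: string, apiKey: string): string {
  const entry = {
    mcpServers: {
      "gemini-vision-art-studio": {
        command: "node",
        // Windows-style path to the compiled entry point.
        args: [projectPath + "\\build\\src\\index.js"],
        env: { GEMINI_API_KEY: apiKey, IS_REMOTE: "true" },
      },
    },
  };
  // JSON.stringify writes every backslash as "\\" in the output text.
  return JSON.stringify(entry, null, 2);
}

// Placeholder path and key; replace with your own values.
const json = buildClaudeConfig("C:\\Projects\\gemini-vision-art-studio", "your_gemini_api_key_here");
console.log(json);
```

The printed text can be pasted into claude_desktop_config.json, merging it with any `mcpServers` entries already present.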

Remote Usage

When running the server remotely:

Set IS_REMOTE=true in your environment or Claude Desktop configuration
The server will:
- Create necessary directories automatically:
  - /app/output: For generated images and previews
  - /app/temp: For temporary processing files
- Skip browser preview attempts
- Save all files to the /app/output directory
- Return absolute file paths in the response

Directory Structure in Remote Mode:

/app/
├── output/           # Generated images and previews
│   ├── image1.png
│   └── image1_preview.html
└── temp/            # Temporary processing files

Troubleshooting Remote Usage:
- Ensure the /app directory exists and is writable
- Check the console output for directory creation messages
- Look for "Image saved to:" messages in the logs
- File paths in the response will be absolute paths
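
The remote-mode behaviour described above can be sketched as a small path-resolution function. This is an illustration under stated assumptions, not the server's actual code: `resolveOutputPaths` and `OutputPaths` are hypothetical names, and the `.png` extension and `_preview.html` suffix are inferred from the directory listing above.

```typescript
// Hypothetical result shape; not taken from the project source.
interface OutputPaths {
  imagePath: string;
  previewPath: string;
  openInBrowser: boolean;
}

// Assumption: images are PNG files and previews are "<name>_preview.html",
// following the file names in the directory listing above.
function resolveOutputPaths(fileName: string, isRemote: boolean, projectRoot: string): OutputPaths {
  // Remote mode writes to /app/output; otherwise use <projectRoot>/output.
  const outputDir = isRemote
    ? "/app/output"
    : projectRoot.replace(/\/+$/, "") + "/output";
  return {
    imagePath: `${outputDir}/${fileName}.png`,
    previewPath: `${outputDir}/${fileName}_preview.html`,
    // The browser preview is skipped when running remotely.
    openInBrowser: !isRemote,
  };
}

console.log(resolveOutputPaths("image1", true, "/home/user/gemini-vision-art-studio"));
```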

Running the Server

Build the project:

npm run build

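Once the project is built and the configuration above is in place, Claude Desktop starts the server automatically:
- Open Claude Desktop (restart it if it was already running so the new configuration is loaded)
- Start a new conversation
- Check that the tools appear in the available tools list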

🛠️ Available Tools

1. Generate 3D Cartoon (`generate_3d_cartoon`)

Creates a 3D-style cartoon image from your text description.

{
  "name": "generate_3d_cartoon",
  "arguments": {
    "prompt": "A friendly dragon teaching math to forest animals",
    "fileName": "dragon_teacher"
  }
}
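
For clients other than Claude Desktop, the call above travels as an MCP `tools/call` request over JSON-RPC 2.0. A minimal sketch of building that message follows; `buildToolCall` and `ToolCallRequest` are hypothetical names for illustration, and the request id is arbitrary.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments in a request.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "generate_3d_cartoon", {
  prompt: "A friendly dragon teaching math to forest animals",
  fileName: "dragon_teacher",
});
console.log(JSON.stringify(request, null, 2));
```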

2. Process Image (`process_image`)

Transforms existing images according to your instructions.

{
  "name": "process_image",
  "arguments": {
    "imagePath": "input/photo.jpg",
    "prompt": "Transform this into a watercolor painting with autumn colors",
    "outputFileName": "watercolor_autumn"
  }
}
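
Before sending a request like this, it can help to check the arguments on the client side. The sketch below assumes all three fields are required, non-empty strings, which the example suggests but the source does not state. `ProcessImageArgs` and `isProcessImageArgs` are hypothetical names, and the server's own validation may differ.

```typescript
// Argument shape taken from the example above.
interface ProcessImageArgs {
  imagePath: string;
  prompt: string;
  outputFileName: string;
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim() !== "";
}

// Assumption: every field is required and must be a non-empty string.
function isProcessImageArgs(value: unknown): value is ProcessImageArgs {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    isNonEmptyString(v["imagePath"]) &&
    isNonEmptyString(v["prompt"]) &&
    isNonEmptyString(v["outputFileName"])
  );
}

console.log(
  isProcessImageArgs({
    imagePath: "input/photo.jpg",
    prompt: "Transform this into a watercolor painting with autumn colors",
    outputFileName: "watercolor_autumn",
  })
);
```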

📂 Directory Structure

gemini-vision-art-studio/
├── src/               # Source code
├── build/            # Compiled code
├── input/            # Input images
├── output/           # Generated images and previews
├── temp/             # Temporary processing files
└── examples/         # Example usage and images

🔧 Technical Details

Runtime: Node.js v14+
Language: TypeScript 5.8.3
AI Model: Gemini 2.0 Flash
Framework: Model Context Protocol (MCP) SDK
Image Processing: Google Generative AI

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

Falah G. Salieh

Copyright © 2025
GitHub: @falahgs

🙏 Acknowledgments

Google Gemini AI team for the powerful image generation model
The MCP SDK team for the excellent tooling
All contributors and users of this project

Made with ❤️ by Falah G. Salieh

Transport: stdio

Created: 4/18/2025

Updated: 4/18/2025

Homepage: https://github.com/falahgs/gemini-vision-art-studio
