
gemini-skill

by WJZ-P · 813 · Score 52

MCP server that automates Google Gemini through browser automation for image generation, chat, and image extraction.

ai-llm · browser-automation · productivity
Forks: 121 · Open issues: 7 · Last commit: 1 mo ago · Indexed: 2d ago

Overview

Gemini-Skill is an MCP server that controls Google Gemini's web interface via the Chrome DevTools Protocol (CDP). It automates Gemini's main features: AI image generation with full-size (HD) downloads, multi-turn conversations, uploading reference images for generation, and extracting images from chat sessions. In daemon mode, the server keeps a browser instance running in the background, so later requests are fast because the browser does not have to be relaunched each time.
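Attaching to a browser over CDP usually starts at its remote-debugging HTTP endpoint: a Chromium-based browser launched with --remote-debugging-port=<port> serves /json/version, and the JSON it returns includes a webSocketDebuggerUrl that automation libraries connect to. The sketch below shows only that discovery step. It is not taken from gemini-skill's source; getBrowserWsEndpoint is a hypothetical helper, and it assumes a browser is already listening on a local port.

```javascript
// Hypothetical helper, not part of gemini-skill: resolve the CDP WebSocket URL
// of a Chromium-based browser started with --remote-debugging-port=<port>.
// Uses the global fetch() available in Node.js >= 18; fetchImpl can be swapped for testing.
async function getBrowserWsEndpoint(port, host = "127.0.0.1", fetchImpl = fetch) {
  // /json/version is the browser-level metadata endpoint of the DevTools HTTP interface.
  const res = await fetchImpl(`http://${host}:${port}/json/version`);
  if (!res.ok) {
    throw new Error(`CDP endpoint returned HTTP ${res.status}`);
  }
  const info = await res.json();
  if (typeof info.webSocketDebuggerUrl !== "string") {
    throw new Error("No webSocketDebuggerUrl in /json/version response");
  }
  return info.webSocketDebuggerUrl;
}

// Usage (assumes a browser is listening on the BROWSER_DEBUG_PORT from the .env example):
// getBrowserWsEndpoint(40821).then((url) => console.log(url));
```

Resolving the endpoint once and reusing it is what lets a long-lived process keep talking to the same browser instead of launching a new one per request.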

Try asking AI

After installing, here are 4 things you can ask your AI assistant to do:

  • Generate images from text prompts
  • Run multi-turn conversations with Gemini
  • Extract and download images from chat conversations
  • Bring Gemini's capabilities into AI agent workflows

When to choose this

Choose this MCP server when you need AI-powered image generation through Gemini within your AI agent workflow and require persistent browser automation with stealth capabilities.

When NOT to choose this

Avoid this if you need direct API access to Gemini without browser automation, or if you require multi-browser parallel processing, which is not yet supported.

Tools this server exposes

12 tools extracted from the README
  • gemini_generate_image(prompt, newSession, referenceImages, fullSize, timeout)

    Generate an image with Gemini from a text prompt

  • gemini_new_chat()

    Start a new blank conversation with Gemini

  • gemini_temp_chat()

    Enter Gemini's temporary conversation mode

  • gemini_switch_model(model)

    Switch between Gemini models

  • gemini_send_message(message, timeout)

    Send a text message to Gemini and wait for the reply

  • gemini_upload_images(images)

    Upload images to Gemini as references for image generation

  • gemini_get_images()

    Retrieve metadata for all images in the current conversation

  • gemini_extract_image(imageUrl)

    Extract an image as base64 and save it locally

  • gemini_download_full_size_image(index)

    Download the full-size, high-resolution version of an image

  • gemini_share_latest_image(index, timeout, copyToClipboard, closeDialog)

    Create a public share link for the latest image

  • gemini_get_all_text_responses()

    Get all text responses from the current conversation

  • gemini_get_latest_text_response()

    Get the latest text response from Gemini
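MCP clients invoke these tools with a JSON-RPC 2.0 "tools/call" request whose params carry the tool name and an arguments object. The sketch below builds such requests for one plausible image round trip. The argument names come from the signatures above, but their value types (boolean flags, an array of local file paths, a timeout in milliseconds, a zero-based image index) are assumptions, as are the prompt text and the hypothetical buildToolCall helper; check the project README for the actual parameter schemas.

```javascript
// Hypothetical helper: build a JSON-RPC 2.0 "tools/call" request as an MCP client sends it.
function buildToolCall(id, name, args = {}) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// A plausible sequence for one image-generation round trip.
// Argument types below are assumptions, not the server's documented schema.
const workflow = [
  buildToolCall(1, "gemini_new_chat"),
  buildToolCall(2, "gemini_generate_image", {
    prompt: "A watercolor lighthouse at dusk",
    newSession: false,   // assumed boolean: reuse the chat opened above
    referenceImages: [], // assumed array of local image paths
    fullSize: true,      // assumed boolean: request the full-size download
    timeout: 120000,     // assumed milliseconds
  }),
  buildToolCall(3, "gemini_get_images"),
  buildToolCall(4, "gemini_download_full_size_image", { index: 0 }), // assumed zero-based index
];

// Over the stdio transport, each request is written as one line of JSON.
for (const req of workflow) {
  console.log(JSON.stringify(req));
}
```

In practice the AI assistant issues these calls itself; writing them out by hand is mainly useful when debugging the server outside of a chat client.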

Comparable tools

google-gemini-api-mcp · browser-mcp · puppeteer-mcp

Installation

Prerequisites

  • Node.js ≥ 18
  • Chrome, Edge, or Chromium with a Google account signed in
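A setup script can fail early when the Node.js prerequisite is not met. The sketch below compares the major version in process.versions.node against 18. meetsNodeRequirement is a hypothetical helper; the project itself may not perform this check.

```javascript
// Hypothetical prerequisite check: is this Node.js at least the required major version?
function meetsNodeRequirement(version = process.versions.node, minMajor = 18) {
  // process.versions.node looks like "18.19.0" (no leading "v").
  const major = Number.parseInt(version.split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

if (!meetsNodeRequirement()) {
  console.error(`Node.js >= 18 required, found ${process.versions.node}`);
  process.exitCode = 1; // signal failure without throwing
}
```

Setting process.exitCode instead of calling process.exit() lets any pending output flush before the process ends.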

Steps

git clone https://github.com/WJZ-P/gemini-skill.git
cd gemini-skill
npm install

Configuration

Create a .env file in the project root with your configuration:

BROWSER_DEBUG_PORT=40821
BROWSER_HEADLESS=false
DAEMON_TTL_MS=1800000
OUTPUT_DIR=./gemini-image
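These settings are plain KEY=value lines. The sketch below shows how such a file could be read into typed values, using the numbers above as fallbacks, and converts the daemon TTL: 1800000 ms ÷ 60000 ms per minute = 30 minutes. parseEnvConfig is a hypothetical helper; the project may load its .env differently (for example with a library such as dotenv), and reading DAEMON_TTL_MS as the background daemon's lifetime is an assumption based on the variable name.

```javascript
// Hypothetical loader for the .env settings shown above; not the project's own code.
const DEFAULTS = {
  BROWSER_DEBUG_PORT: "40821",
  BROWSER_HEADLESS: "false",
  DAEMON_TTL_MS: "1800000",
  OUTPUT_DIR: "./gemini-image",
};

function parseEnvConfig(text) {
  const raw = { ...DEFAULTS };
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue; // skip blanks and comments
    const eq = trimmed.indexOf("=");
    if (eq <= 0) continue; // ignore lines without a KEY=value shape
    raw[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  // Convert the string values into the types the settings imply.
  return {
    debugPort: Number(raw.BROWSER_DEBUG_PORT),
    headless: raw.BROWSER_HEADLESS.toLowerCase() === "true",
    daemonTtlMs: Number(raw.DAEMON_TTL_MS),
    outputDir: raw.OUTPUT_DIR,
  };
}

const cfg = parseEnvConfig(
  "BROWSER_DEBUG_PORT=40821\nBROWSER_HEADLESS=false\nDAEMON_TTL_MS=1800000\nOUTPUT_DIR=./gemini-image\n"
);
console.log(cfg.daemonTtlMs / 60000); // 30 (minutes)
```

Keeping BROWSER_HEADLESS=false is the safer starting point, since a visible window makes it easy to confirm the Google account is signed in.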

Claude Desktop Configuration

Add to Claude Desktop's claude_desktop_config.json:

{
  "mcpServers": {
    "gemini": {
      "command": "node",
      "args": ["<absolute_path_to_gemini-skill>/src/mcp-server.js"]
    }
  }
}
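If claude_desktop_config.json already lists other servers, the gemini entry has to be merged into the existing mcpServers object rather than replacing the file. A minimal sketch, assuming the config has already been parsed into an object; addGeminiServer is a hypothetical helper, and the other server and install paths in the example are placeholders. Like the snippet above, it writes an absolute path to src/mcp-server.js.

```javascript
// Hypothetical helper: add (or overwrite) the "gemini" entry in a parsed
// claude_desktop_config.json object without touching other configured servers.
function addGeminiServer(config, repoDir) {
  const root = repoDir.replace(/[\\\/]+$/, ""); // drop trailing slashes or backslashes
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      gemini: {
        command: "node",
        args: [`${root}/src/mcp-server.js`],
      },
    },
  };
}

// Example: an existing config that already has another (placeholder) server.
const existing = { mcpServers: { other: { command: "npx", args: ["some-server"] } } };
const updated = addGeminiServer(existing, "/home/me/gemini-skill/");
console.log(JSON.stringify(updated, null, 2));
```

The helper returns a new object instead of mutating its input, so the original config stays available if the write back to disk fails.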


Last updated · Auto-generated from public README + GitHub signals.