gemini-skill

Name: gemini-skill
Rating: 2.6 (813 reviews)
Author: WJZ-P

by WJZ-P · ★ 813 · Score 52

MCP server that automates Google Gemini through browser automation for image generation, chat, and image extraction.

ai-llm · browser-automation · productivity

Forks: 121

Open issues: (value not shown)

Last commit: 3 mo ago

Indexed: 56d ago

Overview

Gemini-Skill is an MCP server that controls Google Gemini's web interface via the Chrome DevTools Protocol (CDP). It automates Gemini's features: AI image generation with HD downloads, multi-turn conversations, image uploads as references for generation, and extraction of images from chat sessions. In daemon mode the server keeps a browser instance running in the background, so subsequent requests are fast and do not relaunch the browser.
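The daemon idea can be illustrated with a small sketch. It is not taken from the project's source: the `BrowserDaemon` class, its `launch` callback, and the idle-timeout rule are all assumptions made for illustration. The sketch shows how one background instance might be reused across requests and replaced after a quiet period.

```javascript
// Hypothetical sketch: reuse one browser handle until it has been idle too long.
// Nothing here is gemini-skill's actual code; `launch` stands in for whatever
// starts the browser and returns a handle.
class BrowserDaemon {
  constructor(launch, ttlMs, now = () => Date.now()) {
    this.launch = launch; // function that creates a new browser handle
    this.ttlMs = ttlMs;   // idle time after which the handle is discarded
    this.now = now;       // injectable clock, which keeps the logic testable
    this.handle = null;
    this.lastUsed = 0;
  }

  // Return the live handle, launching a new one only when none exists
  // or the previous one has been idle for longer than ttlMs.
  acquire() {
    const t = this.now();
    const expired = this.handle !== null && t - this.lastUsed > this.ttlMs;
    if (this.handle === null || expired) {
      this.handle = this.launch();
    }
    this.lastUsed = t;
    return this.handle;
  }
}
```

With this shape, two calls to `acquire()` made within the idle window return the same handle, which is the property that makes follow-up requests cheap.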

Try asking AI

After installing, here are 4 things you can ask your AI assistant:

you: Generate images through text prompts in AI assistants

you: Automate multi-turn conversations with Gemini

you: Extract and download images from chat conversations automatically

you: Integrate Gemini capabilities into AI agent workflows

When to choose this

Choose this MCP server when you need AI-powered image generation through Gemini within your AI agent workflow and require persistent browser automation with stealth capabilities.

When NOT to choose this

Avoid this if you need direct API access to Gemini without browser automation, or if you require parallel processing across multiple browsers, which is not yet supported.

Tools this server exposes

12 tools extracted from the README

gemini_generate_image(prompt, newSession, referenceImages, fullSize, timeout)
Generate an image with Gemini from a text prompt

gemini_new_chat()
Start a new blank conversation with Gemini

gemini_temp_chat()
Enter temporary conversation mode with Gemini

gemini_switch_model(model)
Switch between different Gemini models

gemini_send_message(message, timeout)
Send a text message to Gemini and wait for a reply

gemini_upload_images(images)
Upload images to Gemini as reference for image generation

gemini_get_images()
Retrieve all image metadata from the current conversation

gemini_extract_image(imageUrl)
Extract an image as base64 and save it locally

gemini_download_full_size_image(index)
Download the full-size high-resolution version of an image

gemini_share_latest_image(index, timeout, copyToClipboard, closeDialog)
Create a public share link for the latest image

gemini_get_all_text_responses()
Get all text responses from the current conversation

gemini_get_latest_text_response()
Get the latest text response from Gemini
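
As a rough illustration of how an MCP client would invoke one of the tools above, the sketch below builds a JSON-RPC 2.0 `tools/call` request, which is the message shape MCP uses for tool invocation. The `buildToolCall` helper is hypothetical, and the argument values and types (a string prompt, boolean flags, a timeout in milliseconds) are assumptions; only the parameter names come from the signature listed above.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request, the message an MCP client
// sends to invoke a server tool. `buildToolCall` is a hypothetical helper.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument values and types are assumptions for illustration only.
const request = buildToolCall(1, "gemini_generate_image", {
  prompt: "a watercolor fox in a snowy forest",
  newSession: true,
  fullSize: true,
  timeout: 120000, // assumed to be milliseconds
});

console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client such as Claude Desktop constructs this message itself; the sketch only makes the wire format visible.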

Comparable tools

google-gemini-api-mcp · browser-mcp · puppeteer-mcp

Installation

Prerequisites

Node.js ≥ 18
Chrome, Edge, or Chromium browser signed in to a Google account

Steps

git clone https://github.com/WJZ-P/gemini-skill.git
cd gemini-skill
npm install

Configuration

Create a .env file in the project root with your configuration:

BROWSER_DEBUG_PORT=40821
BROWSER_HEADLESS=false
DAEMON_TTL_MS=1800000
OUTPUT_DIR=./gemini-image
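
To make the values above concrete, here is a minimal sketch of reading such a file into typed settings. The `parseEnv` and `toSettings` helpers are hypothetical and are not the project's own loader; the interpretation of `DAEMON_TTL_MS` as a duration in milliseconds is inferred from its name.

```javascript
// Parse KEY=VALUE lines into an object, skipping blank lines and # comments.
function parseEnv(text) {
  const out = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue; // not a KEY=VALUE line
    out[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return out;
}

// Convert the raw strings into typed settings (field names are illustrative).
function toSettings(env) {
  return {
    debugPort: Number.parseInt(env.BROWSER_DEBUG_PORT, 10),
    headless: env.BROWSER_HEADLESS === "true",
    daemonTtlMs: Number.parseInt(env.DAEMON_TTL_MS, 10),
    outputDir: env.OUTPUT_DIR,
  };
}

const sample = [
  "BROWSER_DEBUG_PORT=40821",
  "BROWSER_HEADLESS=false",
  "DAEMON_TTL_MS=1800000",
  "OUTPUT_DIR=./gemini-image",
].join("\n");

const settings = toSettings(parseEnv(sample));
console.log(settings.daemonTtlMs / 60000); // prints 30 (minutes)
```

Read this way, the sample value 1800000 corresponds to 30 minutes.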

Claude Desktop Configuration

Add to Claude Desktop's claude_desktop_config.json:

{
  "mcpServers": {
    "gemini": {
      "command": "node",
      "args": ["<absolute_path_to_gemini-skill>/src/mcp-server.js"]
    }
  }
}
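
Because the `args` entry must be an absolute path, it can help to generate the snippet instead of typing the path by hand. The sketch below is a hypothetical helper, not part of gemini-skill; it assumes it is given the directory where the repository was cloned and uses Node's built-in `path.resolve` to produce the absolute path.

```javascript
const path = require("node:path");

// Build the Claude Desktop entry for a given clone directory.
// `buildClaudeConfig` is a hypothetical helper for illustration.
function buildClaudeConfig(repoDir) {
  return {
    mcpServers: {
      gemini: {
        command: "node",
        // path.resolve turns a relative directory into an absolute path
        args: [path.resolve(repoDir, "src", "mcp-server.js")],
      },
    },
  };
}

// Run from inside the cloned repository to print a ready-to-paste snippet.
console.log(JSON.stringify(buildClaudeConfig(process.cwd()), null, 2));
```

The printed JSON can then be merged into any existing `mcpServers` section of claude_desktop_config.json.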


Last updated 2026-05-17 · Auto-generated from public README + GitHub signals.