mcp-apache-spark-history-server

Name: mcp-apache-spark-history-server
Rating: 2.5 (170 reviews)
Author: kubeflow

by kubeflow · ★ 170 · Score 50

MCP Server connecting AI agents to Apache Spark History Server for job analysis and performance monitoring.

Tags: developer-tools, monitoring, ai-llm

Last commit: 2 mo ago

Indexed: 56d ago

Overview

This toolkit offers two interfaces to Spark History Server data: an MCP Server for AI agents and a CLI tool for engineers. The MCP Server exposes 21 tools for investigating Spark applications, covering job analysis, stage metrics, executor information, and SQL query analysis. It can be configured with multiple Spark History Servers and includes tools for comparing runs and detecting performance bottlenecks. The project is maintained under the Kubeflow organization and ships with documentation and integration examples.
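
The data these tools work with comes from the Spark History Server's monitoring REST API, which is served under `/api/v1`. As a rough illustration of the kind of requests involved, the sketch below builds URLs for listing applications and an application's jobs. The helper names `applications_url` and `jobs_url` are made up for this example and are not part of the MCP server; only the endpoint paths and the `status` and `limit` query parameters come from Spark's REST API.

```python
from urllib.parse import quote, urlencode


def applications_url(base_url, status=None, limit=None):
    """Build the Spark History Server URL that lists applications.

    Hypothetical helper for illustration; not part of the MCP server.
    """
    params = {}
    if status is not None:
        params["status"] = status  # Spark accepts "completed" or "running"
    if limit is not None:
        params["limit"] = limit
    url = base_url.rstrip("/") + "/api/v1/applications"
    return url + ("?" + urlencode(params) if params else "")


def jobs_url(base_url, app_id, status=None):
    """Build the URL that lists the jobs of one application."""
    url = base_url.rstrip("/") + "/api/v1/applications/" + quote(app_id, safe="") + "/jobs"
    return url + ("?" + urlencode({"status": status}) if status else "")


# The host below is the placeholder used in this listing's config example.
print(applications_url("http://your-spark-history-server:18080", status="completed", limit=10))
print(jobs_url("http://your-spark-history-server:18080", "app-123", status="failed"))
```

Fetching one of these URLs with any HTTP client returns JSON that an agent or script can then inspect.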

Try asking AI

After installing, here are 5 things you can ask your AI assistant:

you: Why did my Spark application fail or run slowly?

you: Compare the performance metrics of two runs of the same Spark job and point out any regressions.

you: How can I automate Spark job monitoring and alerting with an AI agent?

you: What's the difference between the MCP Server and the CLI tool?

you: Does this support AWS EMR and Glue?

When to choose this

Choose this MCP server if you're using Apache Spark and want AI agents to analyze cluster performance, debug applications, or compare job runs through natural language queries.

When NOT to choose this

Don't choose this if you're not using Apache Spark, if you need real-time streaming analytics, or if you need tools from outside the Hadoop ecosystem.

Tools this server exposes

12 of the server's 21 tools, as extracted from the README:

list_applications: List applications with optional status, date, and limit filters
get_application: Get application detail: status, resources, duration, attempts
list_jobs: List jobs with status filtering
list_slowest_jobs: Top N slowest jobs
list_stages: List stages with status filtering
get_stage: Stage detail with attempt and summary metrics
get_executor: Executor detail: resources, task stats, performance
get_sql_execution: SQL execution detail with optional plan and node metrics
compare_job_performance: Diff performance metrics between two applications
get_job_bottlenecks: Identify bottlenecks across stages, tasks, and executors
aws_analyze_spark_workload: One-shot root cause analysis of failed/slow Spark workloads
list_slowest_sql_queries: Top N slowest SQL executions with metrics
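
To make "top N slowest jobs" concrete, here is a minimal sketch of how such a ranking could be computed from job records shaped like those in Spark's REST API, which carry `submissionTime` and `completionTime` fields. This is an assumption about the general approach, not the server's actual implementation; the function names and the sample data are invented for illustration.

```python
from datetime import datetime

# Timestamp layout used by the Spark REST API, e.g. "2024-01-01T10:00:30.000GMT".
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fGMT"


def job_duration_seconds(job):
    """Return how long a finished job ran, in seconds."""
    start = datetime.strptime(job["submissionTime"], TIME_FORMAT)
    end = datetime.strptime(job["completionTime"], TIME_FORMAT)
    return (end - start).total_seconds()


def slowest_jobs(jobs, n=5):
    """Return the n longest-running jobs, slowest first.

    Jobs without both timestamps (for example, still running) are skipped.
    """
    finished = [j for j in jobs if "submissionTime" in j and "completionTime" in j]
    return sorted(finished, key=job_duration_seconds, reverse=True)[:n]


# Made-up sample records for illustration only.
sample_jobs = [
    {"jobId": 0, "submissionTime": "2024-01-01T10:00:00.000GMT",
     "completionTime": "2024-01-01T10:00:30.000GMT"},
    {"jobId": 1, "submissionTime": "2024-01-01T10:01:00.000GMT",
     "completionTime": "2024-01-01T10:05:00.000GMT"},
    {"jobId": 2, "submissionTime": "2024-01-01T10:06:00.000GMT"},  # no completion time
]

for job in slowest_jobs(sample_jobs, n=2):
    print(job["jobId"], job_duration_seconds(job))
```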

Comparable tools

spark-monitoring-tools, mcp-prometheus, mcp-grafana, custom-spark-rest-api-client

Installation

Install with pip, then start the server:

pip install mcp-apache-spark-history-server
spark-mcp

Run directly with uvx (no install needed):

uvx --from mcp-apache-spark-history-server spark-mcp

Configuration via config.yaml (supports multiple servers):

servers:
  local:
    default: true
    url: "http://your-spark-history-server:18080"
    auth:
      username: "user"
      password: "pass"
mcp:
  transports:
    - streamable-http
  port: "18888"
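
With several entries under `servers`, the `default: true` flag marks the one to use when a request does not name a server. The sketch below shows one way that selection could work on the parsed configuration. It operates on a plain dictionary mirroring the YAML above rather than reading the file; the `default_server` function, the `staging` entry, and the fall-back to the first server are assumptions made for illustration, not documented behavior.

```python
def default_server(config):
    """Return (name, settings) for the server flagged as default.

    Hypothetical helper; the fall-back to the first listed server is an
    assumption, not documented behavior of the MCP server.
    """
    servers = config.get("servers") or {}
    if not servers:
        raise ValueError("config defines no servers")
    for name, settings in servers.items():
        if settings.get("default"):
            return name, settings
    first = next(iter(servers))
    return first, servers[first]


# Dictionary equivalent of config.yaml; "staging" is an invented second entry.
config = {
    "servers": {
        "local": {"default": True, "url": "http://your-spark-history-server:18080"},
        "staging": {"url": "http://staging-history-server:18080"},
    },
    "mcp": {"transports": ["streamable-http"], "port": "18888"},
}

name, settings = default_server(config)
print(name, settings["url"])
```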

Claude Desktop configuration:

{
  "mcpServers": {
    "spark": {
      "command": "spark-mcp",
      "args": []
    }
  }
}
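
If you prefer the uvx route, the same entry can point at `uvx` instead of an installed `spark-mcp` binary. The sketch below generates both variants; the uvx arguments are taken directly from the uvx command shown earlier, while the `claude_desktop_entry` helper is invented for this example. Whether Claude Desktop can find `uvx` depends on your PATH, so treat this as a starting point.

```python
import json


def claude_desktop_entry(use_uvx=False):
    """Build the "mcpServers" block for Claude Desktop.

    Hypothetical helper: use_uvx=True runs the server through uvx,
    otherwise the pip-installed spark-mcp command is used.
    """
    if use_uvx:
        server = {
            "command": "uvx",
            "args": ["--from", "mcp-apache-spark-history-server", "spark-mcp"],
        }
    else:
        server = {"command": "spark-mcp", "args": []}
    return {"mcpServers": {"spark": server}}


# Print the uvx variant as JSON, ready to merge into the Claude Desktop config.
print(json.dumps(claude_desktop_entry(use_uvx=True), indent=2))
```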

FAQ

What's the difference between the MCP Server and the CLI tool? The MCP Server is designed for AI agents to interact with Spark History Server using natural language, while the CLI (shs) is a standalone Go binary for direct terminal access, scripts, and CI/CD pipelines.
Does this support AWS EMR and Glue? Yes, the project includes specific examples for AWS Glue and Amazon EMR integration, plus optional AWS Spark troubleshooting features for root cause analysis and code recommendations.

Last updated 2026-05-17 · Auto-generated from public README + GitHub signals.