
mcp-apache-spark-history-server
by kubeflow · ★ 170 · Score 50
MCP Server connecting AI agents to Apache Spark History Server for job analysis and performance monitoring.
Overview
This is a dual-interface toolkit that provides both an MCP Server for AI agents and a CLI tool for engineers to interact with Spark History Server data. The MCP Server exposes 21 tools for Spark application investigation, including job analysis, stage metrics, executor information, and SQL query analysis. It supports multiple Spark History Server configurations and provides features for comparative analysis and performance bottleneck detection. The project is actively maintained by Kubeflow with regular updates and comprehensive documentation.
When to choose this
Choose this MCP server if you're using Apache Spark and want AI agents to analyze cluster performance, debug applications, or compare job runs through natural language queries.
When NOT to choose this
Don't choose this if you're not using Apache Spark, if you need real-time streaming analytics, or if you need to work with tools outside the Hadoop ecosystem.
Tools this server exposes
12 tools extracted from the README:
- list_applications - List applications with optional status, date, and limit filters
- get_application - Get application detail: status, resources, duration, attempts
- list_jobs - List jobs with status filtering
- list_slowest_jobs - Top N slowest jobs
- list_stages - List stages with status filtering
- get_stage - Stage detail with attempt and summary metrics
- get_executor - Executor detail: resources, task stats, performance
- get_sql_execution - SQL execution detail with optional plan and node metrics
- compare_job_performance - Diff performance metrics between two applications
- get_job_bottlenecks - Identify bottlenecks across stages, tasks, and executors
- aws_analyze_spark_workload - One-shot root cause analysis of failed/slow Spark workloads
- list_slowest_sql_queries - Top N slowest SQL executions with metrics
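For readers curious what an agent actually sends when it invokes one of these tools, the sketch below builds a JSON-RPC 2.0 request for the MCP `tools/call` method. The tool name comes from the list above; the argument names `app_id` and `n` and the application ID are assumptions made for illustration and may not match the server's real parameter names. In practice an MCP client library would build and send this message for you.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Return a JSON-RPC 2.0 request dict for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical arguments: `app_id` and `n` are assumed parameter names,
# and "spark-0000" is a placeholder application ID.
request = build_tool_call(1, "list_slowest_jobs", {"app_id": "spark-0000", "n": 5})

# Serialize to the JSON text that would travel over the MCP transport.
payload = json.dumps(request)
print(payload)
```

The same helper works for any tool in the list; only the tool name and the arguments dict change.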
Installation
Install with pip:
    pip install mcp-apache-spark-history-server
    spark-mcp
Run directly with uvx (no install needed):
    uvx --from mcp-apache-spark-history-server spark-mcp
Configuration via config.yaml (supports multiple servers):
    servers:
      local:
        default: true
        url: "http://your-spark-history-server:18080"
        auth:
          username: "user"
          password: "pass"
    mcp:
      transports:
        - streamable-http
      port: "18888"
Claude Desktop configuration:
    {
      "mcpServers": {
        "spark": {
          "command": "spark-mcp",
          "args": []
        }
      }
    }
FAQ
- What's the difference between the MCP Server and the CLI tool?
  The MCP Server is designed for AI agents to interact with Spark History Server using natural language, while the CLI (shs) is a standalone Go binary for direct terminal access, scripts, and CI/CD pipelines.
- Does this support AWS EMR and Glue?
  Yes, the project includes specific examples for AWS Glue and Amazon EMR integration, plus optional AWS Spark troubleshooting features for root cause analysis and code recommendations.
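Since the config.yaml under Installation supports multiple servers with one marked `default: true`, it may help to see how such a layout could be resolved. The sketch below mirrors that config as a plain Python dict and picks either a named server or the default one. The `pick_server` helper, the second `staging` entry, and its URL are hypothetical and are not taken from the project; the real server may resolve defaults differently.

```python
def pick_server(config, name=None):
    """Return (name, settings) for a named server, or for the default one."""
    servers = config.get("servers", {})
    if name is not None:
        return name, servers[name]
    # No name given: fall back to the first entry flagged as default.
    for server_name, settings in servers.items():
        if settings.get("default"):
            return server_name, settings
    raise ValueError("no server is marked 'default: true'")


# Dict form of the config.yaml layout; the "staging" entry is a made-up
# second server added only to illustrate the multi-server case.
config = {
    "servers": {
        "local": {
            "default": True,
            "url": "http://your-spark-history-server:18080",
        },
        "staging": {"url": "http://staging-history-server:18080"},
    },
    "mcp": {"transports": ["streamable-http"], "port": "18888"},
}

name, settings = pick_server(config)
print(name, settings["url"])
```

Passing a name, as in `pick_server(config, "staging")`, selects that entry explicitly instead of the default.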
Last updated · Auto-generated from public README + GitHub signals.