mcp-apache-spark-history-server

by kubeflow · 170 · Score 50

MCP Server connecting AI agents to Apache Spark History Server for job analysis and performance monitoring.

developer-tools · monitoring · ai-llm

Forks: 60
Open issues: 24
Last commit: this month
Indexed: 2d ago

Overview

This is a dual-interface toolkit that provides both an MCP Server for AI agents and a CLI tool for engineers to interact with Spark History Server data. The MCP Server exposes 21 tools for Spark application investigation, including job analysis, stage metrics, executor information, and SQL query analysis. It supports multiple Spark History Server configurations and provides features for comparative analysis and performance bottleneck detection. The project is actively maintained by Kubeflow with regular updates and comprehensive documentation.

Try asking AI

After installing, here are 5 things you can ask your AI assistant:

you: Investigate why a Spark application failed or ran slowly.
you: Compare performance metrics between two runs of a Spark job to identify regressions.
you: Help me automate Spark job monitoring and alerting.
you: What's the difference between the MCP Server and the CLI tool?
you: Does this support AWS EMR and Glue?

When to choose this

Choose this MCP server if you're using Apache Spark and want AI agents to analyze cluster performance, debug applications, or compare job runs through natural language queries.

When NOT to choose this

Don't choose this if you're not using Apache Spark, need real-time streaming analytics, or require access to non-Hadoop ecosystem tools.

Tools this server exposes

12 tools extracted from the README (the overview above cites 21 tools in total)
  • list_applications

    List applications with optional status, date, and limit filters

  • get_application

    Get application detail: status, resources, duration, attempts

  • list_jobs

    List jobs with status filtering

  • list_slowest_jobs

    Top N slowest jobs

  • list_stages

    List stages with status filtering

  • get_stage

    Stage detail with attempt and summary metrics

  • get_executor

    Executor detail: resources, task stats, performance

  • get_sql_execution

    SQL execution detail with optional plan and node metrics

  • compare_job_performance

    Diff performance metrics between two applications

  • get_job_bottlenecks

    Identify bottlenecks across stages, tasks, and executors

  • aws_analyze_spark_workload

    One-shot root cause analysis of failed/slow Spark workloads

  • list_slowest_sql_queries

    Top N slowest SQL executions with metrics
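
Tools such as list_applications and list_slowest_jobs describe filtering and ranking over Spark History Server data. The sketch below illustrates that idea only; it is not this project's code. It assumes the data comes from the Spark monitoring REST API endpoint /api/v1/applications with its status, minDate, and limit query parameters. The helper names applications_url and slowest_jobs, and the duration_ms field, are hypothetical and exist only for this example.

```python
from urllib.parse import urlencode


def applications_url(base_url, status=None, min_date=None, limit=None):
    """Build a Spark History Server REST URL that lists applications.

    Only filters that were actually supplied are added to the query string.
    """
    params = {}
    if status is not None:
        params["status"] = status      # e.g. "completed" or "running"
    if min_date is not None:
        params["minDate"] = min_date   # e.g. "2024-01-01"
    if limit is not None:
        params["limit"] = limit
    url = base_url.rstrip("/") + "/api/v1/applications"
    query = urlencode(params)
    return url + "?" + query if query else url


def slowest_jobs(jobs, n=3):
    """Return the n jobs with the largest duration, slowest first.

    'duration_ms' is a hypothetical, precomputed field used for illustration.
    """
    return sorted(jobs, key=lambda job: job["duration_ms"], reverse=True)[:n]


if __name__ == "__main__":
    print(applications_url("http://your-spark-history-server:18080",
                           status="completed", limit=5))
    sample = [
        {"jobId": 0, "duration_ms": 1200},
        {"jobId": 1, "duration_ms": 98000},
        {"jobId": 2, "duration_ms": 4300},
    ]
    print([job["jobId"] for job in slowest_jobs(sample, n=2)])
```

In practice the AI assistant calls the MCP tools directly, so a helper like this is mainly useful for understanding what kind of request and ranking sits behind a tool call.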

Comparable tools

spark-monitoring-tools · mcp-prometheus · mcp-grafana · custom-spark-rest-api-client

Installation

Install with pip, then start the server:

pip install mcp-apache-spark-history-server
spark-mcp

Run directly with uvx (no install needed):

uvx --from mcp-apache-spark-history-server spark-mcp

Configuration via config.yaml (supports multiple servers):

servers:
  local:
    default: true
    url: "http://your-spark-history-server:18080"
    auth:
      username: "user"
      password: "pass"
mcp:
  transports:
    - streamable-http
  port: "18888"
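
Because the configuration can list several servers, something has to decide which one is used when a request does not name a server. The sketch below shows one plausible rule: pick the entry marked default: true. How the project actually resolves the default is not stated here, so treat this as an assumption. The dictionary mirrors the YAML above as it would look after parsing, and default_server is a hypothetical helper.

```python
# The config.yaml above, written as the dict a YAML parser would produce.
config = {
    "servers": {
        "local": {
            "default": True,
            "url": "http://your-spark-history-server:18080",
            "auth": {"username": "user", "password": "pass"},
        },
    },
    "mcp": {"transports": ["streamable-http"], "port": "18888"},
}


def default_server(cfg):
    """Return (name, entry) for the first server marked default: true.

    Assumed selection rule for illustration; raises if no default is set.
    """
    for name, entry in cfg.get("servers", {}).items():
        if entry.get("default"):
            return name, entry
    raise ValueError("no server is marked 'default: true'")


if __name__ == "__main__":
    name, entry = default_server(config)
    print(name, entry["url"])
```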

Claude Desktop configuration:

{
  "mcpServers": {
    "spark": {
      "command": "spark-mcp",
      "args": []
    }
  }
}
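
If a Claude Desktop configuration already lists other MCP servers, the "spark" entry has to be merged in rather than pasted over the file. A minimal sketch of that merge, using only the standard json module, follows. add_spark_server is a hypothetical helper, and the "other" entry in the usage example is made up for illustration. Reading and writing the real configuration file is left out because its location depends on the platform.

```python
import json


def add_spark_server(existing, name="spark", command="spark-mcp", args=None):
    """Return a copy of a Claude Desktop config with the Spark entry added.

    Existing mcpServers entries are preserved; the input dict is not mutated.
    """
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args or [])}
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Hypothetical pre-existing config with one unrelated server.
    current = {"mcpServers": {"other": {"command": "other-mcp", "args": []}}}
    print(json.dumps(add_spark_server(current), indent=2))
```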

FAQ

What's the difference between the MCP Server and the CLI tool?
The MCP Server is designed for AI agents to interact with Spark History Server using natural language, while the CLI (shs) is a standalone Go binary for direct terminal access, scripts, and CI/CD pipelines.
Does this support AWS EMR and Glue?
Yes, the project includes specific examples for AWS Glue and Amazon EMR integration, plus optional AWS Spark troubleshooting features for root cause analysis and code recommendations.

Last updated · Auto-generated from public README + GitHub signals.