MCP-PDF-Extractor-server

Name: MCP-PDF-Extractor-server
Rating: 1.6 (1 review)
Author: RayenMalouche

★ 0 · Score 33

Java-based MCP server using Apache Tika to extract content and metadata from PDFs, DOCX and other documents.

file-system · developer-tools · ai-llm

Last commit: 11 mo ago · Indexed: 56d ago

Overview

The Tika MCP Extractor Server is a Java implementation that provides MCP-compliant tools for document extraction. It supports multiple formats, including PDF, DOCX, TXT, HTML, and images, and converts content either to HTML with embedded CSS or to plain text. The server exposes four tools: extract-to-html, extract-text, list-available-files, and get-file-metadata, with error handling and logging throughout. Built with Spring Boot and Jetty, it offers both an MCP interface and REST endpoints for testing and integration.
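As a rough illustration of what an MCP client sends when it invokes one of these tools, the sketch below builds a JSON-RPC 2.0 `tools/call` message using only the JDK. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification; the class name `ToolCallRequest`, the argument key `filename`, and the file `report.pdf` are hypothetical, since the listing does not document the tools' input schemas.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ToolCallRequest {

    // Escapes the characters that must not appear raw inside a JSON string.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Builds a single-line JSON-RPC 2.0 "tools/call" message with string-valued arguments.
    static String build(int id, String toolName, Map<String, String> arguments) {
        String args = arguments.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"tools/call\""
                + ",\"params\":{\"name\":" + quote(toolName)
                + ",\"arguments\":" + args + "}}";
    }

    public static void main(String[] args) {
        Map<String, String> arguments = new LinkedHashMap<>();
        arguments.put("filename", "report.pdf"); // hypothetical argument name and file
        System.out.println(build(1, "extract-text", arguments));
    }
}
```

A real client would first complete the MCP `initialize` handshake and would normally read the actual argument names from the server's `tools/list` response rather than hard-coding them.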

Try asking AI

After installing, here are 6 things you can ask your AI assistant about:

- Processing and extracting content from local documents in secure environments without internet access
- Integrating document extraction capabilities into MCP-enabled AI assistants like Claude Desktop
- Providing a REST API for web applications to serve styled HTML content from document files
- What file formats are supported?
- Can I use this server without internet access?
- How do I add custom Tika configurations?

When to choose this

Choose this server for local document processing workflows where you need to extract content and metadata without exposing documents to external services.

When NOT to choose this

Avoid it if you need cloud-based processing or if your existing infrastructure is built in another language, such as Python.

Tools this server exposes

4 tools extracted from the README

- extract-to-html: Converts file content to HTML with embedded CSS styling
- extract-text: Extracts plain text content from files
- list-available-files: Lists files in the extraction directory with details
- get-file-metadata: Retrieves detailed metadata from files, such as title, author, and creation date
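To make the idea behind list-available-files concrete, here is a minimal JDK-only sketch of listing a directory's files with a few details. It is not the server's implementation: the class `FileLister`, the `FileInfo` fields, and the choice of details (name, size, last-modified time) are assumptions, and the default directory name `files-to-extract` is borrowed from the setup steps on the assumption that it is the extraction directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileLister {

    // Details reported for one regular file (hypothetical selection of fields).
    static final class FileInfo {
        final String name;
        final long sizeBytes;
        final String lastModified;

        FileInfo(String name, long sizeBytes, String lastModified) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.lastModified = lastModified;
        }
    }

    // Lists regular files directly inside dir, sorted by path; subdirectories are skipped.
    static List<FileInfo> list(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> entries = Files.list(dir)) {
            paths = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<FileInfo> result = new ArrayList<>();
        for (Path p : paths) {
            result.add(new FileInfo(
                    p.getFileName().toString(),
                    Files.size(p),
                    Files.getLastModifiedTime(p).toString()));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Directory name assumed from the setup steps; pass another path as the first argument.
        Path dir = Path.of(args.length > 0 ? args[0] : "files-to-extract");
        for (FileInfo f : list(dir)) {
            System.out.printf("%s\t%d bytes\t%s%n", f.name, f.sizeBytes, f.lastModified);
        }
    }
}
```

The stream returned by `Files.list` is closed with try-with-resources because it holds an open directory handle.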

Comparable tools

file-mcp · document-extractor-server · mcp-server-tika

Installation

**Prerequisites**:

- Java 23+
- Maven 3.6+

**Clone and Setup**:

```bash
git clone https://github.com/RayenMalouche/MCP-PDF-Extractor-server.git
cd MCP-PDF-Extractor-server
mkdir files-to-extract
mvn clean install
```

**Configure**:

Edit `src/main/resources/application.properties` if needed.

**Run**:

```bash
# HTTP/SSE mode
mvn spring-boot:run

# STDIO mode
mvn spring-boot:run -- --stdio
```

**Configure Claude Desktop** (for MCP usage):

Add to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "tika-extractor": {
      "command": "java",
      "args": ["-jar", "path/to/your/target/TikaExtractorMCPServer-1.0.0.jar", "--stdio"]
    }
  }
}
```
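The `path/to/your/target/...` placeholder has to be replaced with the real location of the built jar, and an absolute path is usually the safer choice because the working directory that the desktop client launches from may differ from the project directory. The sketch below, run from the project root, prints the same configuration entry with the jar path resolved to an absolute one. The class `ClaudeConfigSnippet` and its methods are hypothetical helpers, and the assumption that the jar ends up under `target/` with the name shown in the configuration is taken from that snippet rather than verified against the build.

```java
import java.nio.file.Path;

public class ClaudeConfigSnippet {

    // Escapes backslashes and quotes so Windows paths stay valid inside JSON.
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders the mcpServers entry with the given server name and jar path.
    static String render(String serverName, String jarPath) {
        return String.join("\n",
                "{",
                "  \"mcpServers\": {",
                "    " + jsonString(serverName) + ": {",
                "      \"command\": \"java\",",
                "      \"args\": [\"-jar\", " + jsonString(jarPath) + ", \"--stdio\"]",
                "    }",
                "  }",
                "}");
    }

    public static void main(String[] args) {
        // Jar name copied from the configuration entry; the target/ location is assumed.
        Path jar = Path.of("target", "TikaExtractorMCPServer-1.0.0.jar")
                .toAbsolutePath()
                .normalize();
        System.out.println(render("tika-extractor", jar.toString()));
    }
}
```

The printed JSON can be pasted into `claude_desktop_config.json`, merging it with any `mcpServers` entries that are already there.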

FAQ

What file formats are supported? The server supports PDF, DOCX, TXT, HTML, images, and many other formats through Apache Tika's type detection.
Can I use this server without internet access? Yes. All operations are local and do not require internet access, which makes the server suitable for secure document processing workflows.
How do I add custom Tika configurations? You can modify Tika settings in the `application.properties` file, or extend the `ConfigLoader` class for more complex customizations.


Last updated 2026-05-17 · Auto-generated from public README + GitHub signals.