MCP-PDF-Extractor-server
by RayenMalouche
Java-based MCP server using Apache Tika to extract content and metadata from PDFs, DOCX and other documents.
Overview
The Tika MCP Extractor Server is a Java implementation that provides MCP-compliant tools for document extraction. It supports PDF, DOCX, TXT, HTML, images and other formats, and converts their content either to HTML with embedded CSS or to plain text. The server exposes four tools: extract-to-html, extract-text, list-available-files and get-file-metadata, with error handling and logging throughout. Built with Spring Boot and Jetty, it implements the MCP protocol and also offers REST endpoints for testing and integration.
When to choose this
Choose this server for local document processing workflows where you need to extract content and metadata without exposing documents to external services.
When NOT to choose this
Avoid it if you need cloud-based processing, or if your existing document-processing infrastructure is already built in another language such as Python.
Tools this server exposes
- `extract-to-html`: Converts file content to HTML with embedded CSS styling
- `extract-text`: Extracts plain text content from files
- `list-available-files`: Lists files in the extraction directory with details
- `get-file-metadata`: Retrieves detailed metadata from files, such as title, author and creation date
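MCP clients call these tools with a JSON-RPC 2.0 `tools/call` request whose `params` hold the tool `name` and an `arguments` object. The sketch below builds such a request for `extract-text` using only the JDK. The README does not document the tools' input schemas, so the argument name `fileName` is hypothetical; the real names come from the server's `tools/list` response. The class name `ToolCallSketch` is likewise illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ToolCallSketch {
    // Minimal JSON string quoting: escapes quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    // Builds an MCP "tools/call" JSON-RPC request; every argument value is a string here.
    static String toolsCall(int id, String tool, Map<String, String> arguments) {
        StringBuilder args = new StringBuilder("{");
        for (Map.Entry<String, String> e : arguments.entrySet()) {
            if (args.length() > 1) {
                args.append(',');
            }
            args.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        args.append('}');
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"tools/call\",\"params\":{\"name\":" + quote(tool)
                + ",\"arguments\":" + args + "}}";
    }

    public static void main(String[] argv) {
        Map<String, String> arguments = new LinkedHashMap<>();
        arguments.put("fileName", "report.pdf"); // hypothetical argument name; check tools/list
        System.out.println(toolsCall(1, "extract-text", arguments));
    }
}
```

Running `java ToolCallSketch.java` prints `{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"extract-text","arguments":{"fileName":"report.pdf"}}}`. A real client would normally use a JSON library and an MCP SDK instead of hand-built strings; the point here is only the shape of the request.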
Installation
- **Prerequisites**:
  - Java 23+
  - Maven 3.6+
- **Clone and Setup**:
```bash
git clone https://github.com/RayenMalouche/MCP-PDF-Extractor-server.git
cd MCP-PDF-Extractor-server
mkdir files-to-extract
mvn clean install
```
- **Configure**:
Edit `src/main/resources/application.properties` if needed.
- **Run**:
```bash
# HTTP/SSE mode
mvn spring-boot:run

# STDIO mode (application arguments are passed through the plugin property)
mvn spring-boot:run -Dspring-boot.run.arguments=--stdio
```
- **Configure Claude Desktop** (for MCP usage):
Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "tika-extractor": {
      "command": "java",
      "args": ["-jar", "path/to/your/target/TikaExtractorMCPServer-1.0.0.jar", "--stdio"]
    }
  }
}
```
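With this configuration Claude Desktop launches the jar with `--stdio` and exchanges JSON-RPC messages over the process's stdin and stdout; the MCP stdio transport puts one message per line. The sketch below reproduces that launch from plain Java and sends an `initialize` request. It assumes the jar path is passed as the first argument, that the server writes nothing except JSON-RPC to stdout in this mode (a Spring Boot banner or log lines on stdout would be read instead of the reply), and that it accepts protocol version `2024-11-05`. The class name and client name are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class StdioLaunchSketch {
    // Same command line as the claude_desktop_config.json entry.
    static List<String> command(String jarPath) {
        return List.of("java", "-jar", jarPath, "--stdio");
    }

    // The stdio transport is newline-delimited, so a message must not contain raw newlines.
    static String frame(String json) {
        if (json.indexOf('\n') >= 0 || json.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("stdio messages must not contain embedded newlines");
        }
        return json + "\n";
    }

    // First request an MCP client sends; protocol version and client info are assumptions.
    static String initializeRequest(int id) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"initialize\",\"params\":{"
                + "\"protocolVersion\":\"2024-11-05\","
                + "\"capabilities\":{},"
                + "\"clientInfo\":{\"name\":\"sketch-client\",\"version\":\"0.1\"}}}";
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No jar given: just show the framed request.
            System.out.print(frame(initializeRequest(1)));
            return;
        }
        Process server = new ProcessBuilder(command(args[0]))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // server logs on stderr stay visible
                .start();
        try (Writer toServer = new OutputStreamWriter(server.getOutputStream(), StandardCharsets.UTF_8);
             BufferedReader fromServer = new BufferedReader(
                     new InputStreamReader(server.getInputStream(), StandardCharsets.UTF_8))) {
            toServer.write(frame(initializeRequest(1)));
            toServer.flush();
            // Blocks until the server writes one line; no timeout in this sketch.
            System.out.println(fromServer.readLine());
        } finally {
            server.destroy();
        }
    }
}
```

For example, `java StdioLaunchSketch.java path/to/your/target/TikaExtractorMCPServer-1.0.0.jar` should print the server's `initialize` response if the assumptions above hold.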
FAQ
- What file formats are supported?
- The server supports PDF, DOCX, TXT, HTML, images and many other formats through Apache Tika's comprehensive type detection system.
- Can I use this server without internet access?
- Yes, all operations are local and don't require internet access, making it suitable for secure document processing workflows.
- How do I add custom Tika configurations?
- You can modify Tika settings in the `application.properties` file or extend the `ConfigLoader` class for more complex customizations.