pdf-mcp

Name: pdf-mcp
Rating: 2.4 (35 reviews)
Author: jztan

by jztan · ★ 35 · Score 47

MCP server enabling AI agents to read, search, and extract content from large PDFs with chunked reading, hybrid search, OCR, and SQLite caching.

ai-llm · file-system · developer-tools

Last commit: 2 mo ago

Indexed: 56d ago

Overview

pdf-mcp is an MCP server built with Python and PyMuPDF that gives AI agents efficient access to PDF content. It works around context window limits by letting agents read specific pages or page ranges instead of loading entire documents. The server implements hybrid search that combines BM25 keyword search and semantic search via Reciprocal Rank Fusion (RRF), OCR for scanned documents, structured extraction of tables and images, and persistent SQLite-based caching. For security, URL fetching is HTTPS-only and includes SSRF protection.
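To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not taken from pdf-mcp's source: the function name, the example page numbers, and the constant `k=60` (a value commonly used for RRF) are assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one ranking.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value
    confirmed for pdf-mcp.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical page numbers returned by each retriever, best match first.
keyword_hits = [3, 7, 12]    # e.g. from BM25
semantic_hits = [7, 21, 3]   # e.g. from embedding similarity

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Pages that rank well in both lists rise to the top, while a page found by only one retriever still appears in the fused result.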

Try asking AI

After installing, here are 5 things you can ask your AI assistant:

you: Summarize this large PDF (an annual report or research paper) without overflowing the context window.

you: Search this technical documentation or legal contract and extract the specific information I need.

you: Run OCR on this scanned document so its text can be searched and extracted.

you: How does pdf-mcp handle large PDFs?

you: What search capabilities are available?

When to choose this

Choose pdf-mcp when you need comprehensive PDF processing capabilities for AI agents, especially for large documents requiring chunked reading, hybrid search, and OCR support.

When NOT to choose this

Don't choose pdf-mcp if you need encrypted PDF support, real-time collaborative PDF editing, or advanced multimedia processing beyond images.

Tools this server exposes

8 tools extracted from the README

pdf_info: Page count, metadata, TOC summary, scanned-page detection. Call first.
pdf_get_toc: Full table of contents for documents with >50 bookmarks
pdf_read_pages: Read specific pages or ranges; OCR-on-demand; embedded images + tables
pdf_read_all: Read entire document in one call (byte-capped for safety)
pdf_render_pages: Render pages as PNG for vision models (diagrams, handwriting, scans)
pdf_search: Hybrid RRF search (keyword + semantic), page or section granularity
pdf_cache_stats: Per-document cache breakdown + total size
pdf_cache_clear: Clear expired or all cache entries
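
The two cache tools suggest a store with per-document statistics and entry expiry. The sketch below shows one way such a cache could look using Python's standard `sqlite3` module. The table layout, function names, and TTL handling are all hypothetical; pdf-mcp's actual schema is not described in this listing.

```python
import sqlite3
import time


def open_cache(path=":memory:"):
    """Open (or create) a page-text cache. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_text ("
        " doc_id TEXT, page INTEGER, text TEXT, expires_at REAL,"
        " PRIMARY KEY (doc_id, page))"
    )
    return conn


def put_page(conn, doc_id, page, text, ttl_seconds=3600.0, now=None):
    """Store extracted text for one page with an expiry timestamp."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO page_text VALUES (?, ?, ?, ?)",
        (doc_id, page, text, now + ttl_seconds),
    )


def get_page(conn, doc_id, page, now=None):
    """Return cached text, or None if the entry is missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT text FROM page_text"
        " WHERE doc_id = ? AND page = ? AND expires_at > ?",
        (doc_id, page, now),
    ).fetchone()
    return row[0] if row else None


def clear_expired(conn, now=None):
    """Delete expired entries and return how many rows were removed."""
    now = time.time() if now is None else now
    cur = conn.execute("DELETE FROM page_text WHERE expires_at <= ?", (now,))
    return cur.rowcount


def stats(conn):
    """Per-document breakdown: cached page count and total characters."""
    rows = conn.execute(
        "SELECT doc_id, COUNT(*), SUM(LENGTH(text))"
        " FROM page_text GROUP BY doc_id"
    ).fetchall()
    return {doc_id: {"pages": n, "chars": size} for doc_id, n, size in rows}
```

Passing `now` explicitly keeps the expiry logic easy to test; in normal use it defaults to the current time.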

Comparable tools

file-system-mcp · document-extraction-mcp · pdf2txt · pymupdf

Installation

pip install pdf-mcp

For Claude Desktop, add to claude_desktop_config.json:

{
  "mcpServers": {
    "pdf-mcp": {
      "command": "pdf-mcp"
    }
  }
}

FAQ

How does pdf-mcp handle large PDFs? pdf-mcp uses chunked reading: AI agents read specific pages or page ranges rather than loading entire documents, which prevents context overflow.
What search capabilities are available? pdf-mcp provides hybrid search that uses Reciprocal Rank Fusion to combine BM25 keyword search with semantic search, so results reflect both exact term matches and semantic similarity.
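
As an illustration of chunked, byte-capped reading, the sketch below returns as many consecutive pages as fit under a size limit and reports where to resume. It is a simplified stand-in, not pdf-mcp's implementation: `read_pages_capped` is a hypothetical helper, and the `pages` list represents text that a library such as PyMuPDF would already have extracted.

```python
def read_pages_capped(pages, start, end, max_bytes):
    """Read 1-based inclusive pages start..end under a UTF-8 byte cap.

    Returns (text, next_page). next_page is the page to resume from,
    or None when the requested range was read completely. The first
    page of a call is always returned, even if it alone exceeds the
    cap, so that repeated calls always make progress.
    """
    parts = []
    used = 0
    for number in range(start, min(end, len(pages)) + 1):
        chunk = f"[page {number}]\n{pages[number - 1]}\n"
        size = len(chunk.encode("utf-8"))
        if parts and used + size > max_bytes:
            return "".join(parts), number
        parts.append(chunk)
        used += size
    return "".join(parts), None


# Hypothetical pre-extracted page texts standing in for a real PDF.
pages = ["a" * 10, "b" * 10, "c" * 10]

page = 1
while page is not None:
    text, page = read_pages_capped(pages, page, len(pages), max_bytes=45)
    print(len(text.encode("utf-8")), "bytes read; resume at", page)
```

An agent following this pattern only ever holds one bounded chunk in context at a time, which is the point of reading by page range.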


Last updated 2026-05-17 · Auto-generated from public README + GitHub signals.