MCP Catalogs

pdf-mcp

by jztan · 35 · Score 47

MCP server enabling AI agents to read, search, and extract content from large PDFs with chunked reading, hybrid search, OCR, and SQLite caching.

Tags: ai-llm · file-system · developer-tools
Forks: 5
Open issues: 0
Last commit: this month
Indexed: 2d ago

Overview

pdf-mcp is an MCP server built with Python and PyMuPDF that gives AI agents efficient access to PDF content. It works around context-window limits by letting agents read specific pages or page ranges instead of loading entire documents. The server implements hybrid search that combines BM25 keyword search and semantic search via Reciprocal Rank Fusion, OCR for scanned documents, structured extraction of tables and images, and persistent SQLite-based caching. For remote documents, it restricts URL fetching to HTTPS and includes SSRF protection.
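As an illustration of what HTTPS-only fetching with SSRF protection commonly involves, here is a minimal Python sketch. It is not pdf-mcp's actual code: the function name is_safe_pdf_url and its resolve parameter are hypothetical, and the server's real checks are not described on this page.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_pdf_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Hypothetical check: accept only https URLs whose host resolves
    exclusively to public (globally routable) addresses."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        infos = resolve(parsed.hostname, parsed.port or 443, proto=socket.IPPROTO_TCP)
    except OSError:
        return False  # unresolvable hosts are rejected
    for info in infos:
        # info[4][0] is the resolved address; drop any IPv6 zone suffix.
        address = ipaddress.ip_address(info[4][0].split("%")[0])
        if not address.is_global:
            return False  # loopback, private, link-local, and similar ranges
    return bool(infos)
```

A production implementation would also need to connect to the exact address it validated and re-check any redirects, since a second DNS lookup could return a different address.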

Try asking AI

After installing, here are 5 things you can ask your AI assistant:

you: Summarize this large PDF (an annual report or research paper) without overflowing the context window
you: Search this technical documentation or legal contract and extract specific information
you: Run OCR on this scanned document so its full text can be searched and extracted
you: How does pdf-mcp handle large PDFs?
you: What search capabilities are available?

When to choose this

Choose pdf-mcp when you need comprehensive PDF processing capabilities for AI agents, especially for large documents requiring chunked reading, hybrid search, and OCR support.

When NOT to choose this

Don't choose pdf-mcp if you need encrypted PDF support, real-time collaborative PDF editing, or advanced multimedia processing beyond images.

Tools this server exposes

8 tools extracted from the README
  • pdf_info

    Page count, metadata, TOC summary, scanned-page detection. Call first.

  • pdf_get_toc

    Full table of contents for documents with >50 bookmarks

  • pdf_read_pages

    Read specific pages or ranges; OCR-on-demand; embedded images + tables

  • pdf_read_all

    Read entire document in one call (byte-capped for safety)

  • pdf_render_pages

    Render pages as PNG for vision models — diagrams, handwriting, scans

  • pdf_search

    Hybrid RRF search (keyword + semantic), page or section granularity

  • pdf_cache_stats

    Per-document cache breakdown + total size

  • pdf_cache_clear

    Clear expired or all cache entries
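The tool list suggests a workflow of calling pdf_info first to get the page count, then reading the document in bounded ranges with pdf_read_pages. The Python sketch below shows only the range-planning step. The helper page_chunks is hypothetical, and 1-based inclusive page numbers are an assumption; check the README for the numbering and argument names pdf_read_pages actually uses.

```python
from typing import Iterator


def page_chunks(page_count: int, chunk_size: int = 10) -> Iterator[tuple[int, int]]:
    """Yield inclusive (start, end) page ranges that cover the document.

    Assumes 1-based page numbers; chunk_size is an illustrative default.
    """
    if page_count < 0 or chunk_size < 1:
        raise ValueError("page_count must be >= 0 and chunk_size >= 1")
    for start in range(1, page_count + 1, chunk_size):
        yield start, min(start + chunk_size - 1, page_count)


# A 25-page document read 10 pages at a time:
ranges = list(page_chunks(25, chunk_size=10))
# ranges == [(1, 10), (11, 20), (21, 25)]
```

An agent could request each range in turn and stop early once it has found what it needs, which keeps any single response well below the context limit.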

Comparable tools

file-system-mcp · document-extraction-mcp · pdf2txt · pymupdf

Installation

pip install pdf-mcp

For Claude Desktop, add to claude_desktop_config.json:

{
  "mcpServers": {
    "pdf-mcp": {
      "command": "pdf-mcp"
    }
  }
}
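If claude_desktop_config.json already lists other servers, the entry has to be merged in rather than pasted over the file. A minimal Python sketch of that merge follows. The helper add_pdf_mcp is hypothetical, and the location of the config file varies by operating system, so the path is left to the caller.

```python
import json
from pathlib import Path


def add_pdf_mcp(config_path: Path) -> dict:
    """Add the pdf-mcp entry to a Claude Desktop config, keeping other servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    # setdefault leaves an existing "pdf-mcp" entry untouched.
    servers.setdefault("pdf-mcp", {"command": "pdf-mcp"})
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the path to your own config file, then restart Claude Desktop so the new server is picked up.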

FAQ

How does pdf-mcp handle large PDFs?
pdf-mcp uses chunked reading, allowing AI agents to read specific pages or page ranges rather than loading entire documents, preventing context overflow issues.
What search capabilities are available?
pdf-mcp provides hybrid search using Reciprocal Rank Fusion to combine BM25 keyword search and semantic search, enabling more comprehensive document queries.
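To make the fusion step concrete, here is a minimal Python sketch of Reciprocal Rank Fusion over two ranked lists of page numbers. It is a generic illustration, not pdf-mcp's implementation: the function name is hypothetical, and k=60 is the constant commonly used with RRF, not a value confirmed for this server.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1; higher fused scores come first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return [item for item, _ in sorted(scores.items(), key=lambda kv: -kv[1])]


# Page numbers ranked by a keyword (BM25) pass and by a semantic pass:
bm25_pages = [3, 7, 12]
semantic_pages = [7, 12, 3]
fused = reciprocal_rank_fusion([bm25_pages, semantic_pages])
# fused == [7, 3, 12]
```

Because RRF uses only rank positions, it can combine BM25 scores and embedding similarities without having to normalize them onto a common scale.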


Last updated · Auto-generated from public README + GitHub signals.