mcp-gp-pdf-reader

@gpetraroli31

An MCP server to extract and search text from PDF files

pdf

text-extraction

metadata

MCP PDF Reader Enhanced

A comprehensive Model Context Protocol (MCP) server that provides advanced PDF text extraction, search, and analysis functionality.

Features

Core Functionality

✅ Text Extraction: Extract text content from PDF files with customizable options
✅ Text Search: Search for specific text within PDFs with advanced options
✅ Metadata Extraction: Retrieve comprehensive PDF metadata
✅ Page-specific Processing: Extract content from specific page ranges
✅ Text Cleaning: Normalize and clean extracted text
✅ File Size Limits: Protection against overly large files (50MB limit)
✅ Async Processing: Non-blocking file operations
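
The README does not say what text cleaning actually does. As a rough illustration only, one plausible normalization pass is sketched below. The function name `cleanText` and every rule in it are assumptions, not the server's actual implementation.

```javascript
// Hypothetical sketch of a text-cleaning pass for extracted PDF text.
// None of these rules are confirmed by the project; they are common choices.
function cleanText(raw) {
  return raw
    .normalize("NFKC")                        // fold ligatures and compatibility forms
    .replace(/\r\n?/g, "\n")                  // unify line endings
    .replace(/(\p{L})-\n(\p{L})/gu, "$1$2")   // rejoin words hyphenated across lines
    .replace(/[ \t]+/g, " ")                  // collapse runs of spaces and tabs
    .replace(/ ?\n ?/g, "\n")                 // drop spaces around line breaks
    .replace(/\n{3,}/g, "\n\n")               // keep at most one blank line
    .trim();
}
```

Rejoining hyphenated words is a heuristic: it also merges genuine compounds that happen to break at a line end, so a real implementation might make that step optional.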

Advanced Features

🔄 Multiple Tools: 3 specialized tools for different PDF operations
🔍 Smart Search: Case-sensitive, whole-word, and regex search options
📊 Rich Metadata: Extract author, title, creation date, keywords, and more
⚡ Performance: Efficient processing with size limits and error handling
🛡️ Security: File validation and path sanitization
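
To make the size limit and path validation concrete, here is a minimal sketch of a pre-flight check. The name `validatePdfRequest`, the reading of 50MB as 50 × 1024 × 1024 bytes, and the extension check are all assumptions; the README does not describe how the server performs these checks.

```javascript
const path = require("node:path");

// Assumption: "50MB" is interpreted as 50 MiB.
const MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024;

// Hypothetical check: returns the resolved absolute path or throws.
function validatePdfRequest(filePath, sizeBytes) {
  if (typeof filePath !== "string" || filePath.includes("\0")) {
    throw new Error("Invalid file path");
  }
  // Resolve relative segments such as ".." into one absolute path.
  const resolved = path.resolve(filePath);
  if (path.extname(resolved).toLowerCase() !== ".pdf") {
    throw new Error("Only .pdf files are supported");
  }
  if (!Number.isFinite(sizeBytes) || sizeBytes < 0) {
    throw new Error("Unknown file size");
  }
  if (sizeBytes > MAX_FILE_SIZE_BYTES) {
    throw new Error(`File exceeds the ${MAX_FILE_SIZE_BYTES}-byte limit`);
  }
  return resolved;
}
```

In practice the size would come from a non-blocking call such as `(await fs.promises.stat(filePath)).size`, made before the file is read into memory.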

Installation

npm install

Tools Available

1. `read-pdf` - Enhanced PDF Reading

Extract text from PDF files with customizable options.

Parameters:

file (string, required): Path to the PDF file
pages (string, optional): Page range (e.g., '1-5', '1,3,5', 'all'). Default: 'all'
include_metadata (boolean, optional): Include PDF metadata. Default: true
clean_text (boolean, optional): Clean and normalize text. Default: false

Example Usage:

// Basic extraction
{ "file": "/path/to/document.pdf" }

// Extract with clean text and no metadata
{ 
  "file": "/path/to/document.pdf", 
  "clean_text": true, 
  "include_metadata": false 
}
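
The `pages` parameter accepts forms such as '1-5', '1,3,5', and 'all'. The sketch below shows one way such a string could be expanded into page numbers. `parsePageRange` and its `totalPages` argument are hypothetical, and the server's real parsing rules (for example, how it treats out-of-range pages) may differ.

```javascript
// Hypothetical helper: expand a `pages` string into sorted, unique,
// 1-based page numbers. Throws on malformed or out-of-range input.
function parsePageRange(spec, totalPages) {
  if (spec === undefined || spec.trim().toLowerCase() === "all") {
    return Array.from({ length: totalPages }, (_, i) => i + 1);
  }
  const pages = new Set();
  for (const part of spec.split(",")) {
    const token = part.trim();
    // Accept a single page ("3") or an inclusive range ("1-5").
    const match = token.match(/^(\d+)(?:\s*-\s*(\d+))?$/);
    if (!match) throw new Error(`Invalid page range: "${token}"`);
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start || end > totalPages) {
      throw new Error(`Page range out of bounds: "${token}"`);
    }
    for (let p = start; p <= end; p++) pages.add(p);
  }
  return [...pages].sort((a, b) => a - b);
}
```

Collecting pages in a `Set` means overlapping ranges such as '1-3,2-4' yield each page only once.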

2. `search-pdf` - Search Within PDFs

Search for specific text within PDF documents.

Parameters:

file (string, required): Path to the PDF file
query (string, required): Text to search for
case_sensitive (boolean, optional): Case sensitive search. Default: false
whole_word (boolean, optional): Match whole words only. Default: false

Example Usage:

// Case-insensitive search
{ "file": "/path/to/document.pdf", "query": "important term" }

// Whole word, case-sensitive search
{ 
  "file": "/path/to/document.pdf", 
  "query": "API", 
  "case_sensitive": true, 
  "whole_word": true 
}
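
The effect of `case_sensitive` and `whole_word` can be illustrated with a regular expression built from the query. This is a sketch under assumptions: the helper names are hypothetical, and the README does not state that the server implements search this way.

```javascript
// Escape characters that have special meaning in a regular expression,
// so the query is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical: translate the two documented options into a RegExp.
function buildSearchPattern(query, { case_sensitive = false, whole_word = false } = {}) {
  const escaped = escapeRegExp(query);
  const source = whole_word ? `\\b${escaped}\\b` : escaped;
  return new RegExp(source, case_sensitive ? "g" : "gi");
}

// Return the position and matched text of every occurrence.
function findMatches(text, query, options) {
  const pattern = buildSearchPattern(query, options);
  return [...text.matchAll(pattern)].map((m) => ({ index: m.index, match: m[0] }));
}
```

Note that `\b` only marks boundaries next to word characters, so this approach to whole-word matching behaves unexpectedly when the query starts or ends with punctuation.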

3. `pdf-metadata` - Extract Metadata Only

Get comprehensive metadata from PDF files without extracting text.

Parameters:

file (string, required): Path to the PDF file

Returns:

Filename, file size, page count
Author, title, subject, creator, producer
Creation/modification dates, keywords
Encryption status, PDF version

Configuration

Cursor Integration

Add to your Cursor settings:

{
  "mcpServers": {
    "mcp-gp-pdf-reader": {
      "command": "node",
      "args": ["/absolute/path/to/mcp_gp_pdf_reader/index.js"]
    }
  }
}

Future Enhancements

Planned Features

🔮 OCR Support: Extract text from scanned/image-based PDFs
🔮 Image Extraction: Extract images from PDF documents
🔮 Table Detection: Identify and extract tabular data
🔮 Form Data: Extract form fields and values
🔮 Password Support: Handle password-protected PDFs
🔮 Batch Processing: Process multiple PDFs simultaneously
🔮 Caching: Cache parsed results for better performance
🔮 Page-by-Page: True page-specific text extraction

Technical Improvements

🔧 Streaming: Handle very large PDFs with streaming
🔧 Progress Tracking: Progress indicators for long operations
🔧 Resource Management: Better memory usage optimization
🔧 Configuration API: Runtime configuration updates

Usage Examples

Basic Text Extraction

# Via MCP client
"Extract all text from /documents/report.pdf"

Searching PDFs

# Via MCP client  
"Search for 'quarterly results' in /documents/financial-report.pdf"

Getting Metadata

# Via MCP client
"Get metadata from /documents/contract.pdf"
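
Behind prompts like these, an MCP client sends a JSON-RPC `tools/call` request naming the tool and its arguments. The sketch below builds such a message for `search-pdf`. The helper `buildToolCall` is hypothetical, and a real client would normally rely on an MCP SDK and complete the `initialize` handshake first.

```javascript
// Hypothetical helper: build a JSON-RPC 2.0 request for an MCP tool call.
let nextId = 1;
function buildToolCall(name, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = buildToolCall("search-pdf", {
  file: "/documents/financial-report.pdf",
  query: "quarterly results",
});

// Over the stdio transport, each message is one line of JSON.
console.log(JSON.stringify(request));
```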

Development

Requirements

Node.js 18.0.0 or higher
Memory: Sufficient for PDF file size + processing overhead
Storage: Temporary space for file operations

Contributing

This MCP server is designed to be extensible. Key areas for contribution:

Integration with additional PDF processing libraries
Performance optimizations
New extraction features
Better error handling
Test coverage

License

MIT License

Transport:

stdio

Language:

Created: 6/13/2025

Updated: 5/6/2026

Homepage:

https://github.com/gpetraroli/mcp_pdf_reader
