Show HN: MCP Document Indexer – Local AI search for your documents using Ollama
Summary
The MCP Document Indexer is a Python-based MCP (Model Context Protocol) server for local document indexing and search. It uses Ollama for summarization and keyword extraction, and LanceDB for semantic search.
Why It Matters
The MCP Document Indexer keeps document search on the user's own machine. Summarization and keyword extraction run through a local LLM, so nothing is sent to a cloud service, which addresses the privacy and data-security concerns that come with hosted document search.
Key Takeaways
- Supports multiple document formats including PDF, Word, and Markdown.
- Integrates local LLMs (via Ollama) for document summarization and keyword extraction, and monitors configured folders in real time for new or modified documents.
- Utilizes LanceDB for efficient semantic search and indexing.
- Designed for decent performance on standard laptops (e.g. M1/M2 MacBook).
- Allows for incremental indexing, processing only changed files to save resources.
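
The incremental indexing mentioned in the last takeaway can be pictured with a short sketch. The summary does not say how the project detects changes, so comparing SHA-256 content digests is an assumption here, and `file_digest` and `find_changed` are hypothetical names, not part of the project's API.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_changed(paths, seen):
    """Return the paths whose digest differs from the one recorded in `seen`.

    `seen` maps str(path) -> digest and is updated in place, so a second
    call with unchanged files returns an empty list.
    (Hypothetical helper; the real project may track changes differently.)
    """
    changed = []
    for path in paths:
        digest = file_digest(path)
        if seen.get(str(path)) != digest:
            seen[str(path)] = digest
            changed.append(path)
    return changed
```

Only the paths returned by `find_changed` would need to be re-summarized and re-embedded; everything else can be skipped on that pass.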
MCP Document Indexer
A Python-based MCP (Model Context Protocol) server for local document indexing and search using LanceDB vector database and local LLMs.
Features
- Real-time Document Monitoring: Automatically indexes new and modified documents in configured folders
- Multi-format Support: Handles PDF, Word (docx/doc), text, Markdown, and RTF files
- Local LLM Integration: Uses Ollama for document summarization and keyword extraction. Nothing ever leaves your computer
- Vector Search: Semantic search using LanceDB and sentence transformers
- MCP Integration: Exposes search and catalog tools via Model Context Protocol
- Incremental Indexing: Only processes changed files to save resources
- Performance Optimized: Designed for decent performance on standard laptops (e.g. M1/M2 MacBook)
Installation
Prerequisites
- Python 3.9+ installed
- uv package manager:
    curl -LsSf https://astral.sh/uv/install.sh | sh
- Ollama (for local LLM):
    # Install Ollama
    curl -fsSL https://ollama.com/install.sh | sh
    # Pull a model (e.g., llama3.2)
    ollama pull llama3.2:3b
Install MCP Document Indexer
    # Clone the repository
    git clone https://github.com/yairwein/mcp-doc-indexer.git
    cd mcp-doc-indexer
    # Install with uv
    uv sync
    # Or install as a package
    uv add mcp-doc-indexer
Configuration
Configure the indexer using environment variables or a .env file:
    # Folders to monitor (comma-separated)
    WATCH_FOLDERS="/Users/me/Documents,/Users/me/Research"
    # LanceDB storage path
    LANCEDB_PATH="./vector_index"
    # Ollama model for summ...
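
As an illustration of how the `WATCH_FOLDERS` and `LANCEDB_PATH` variables shown above might be consumed, here is a minimal sketch. `parse_watch_folders` and `load_settings` are hypothetical helpers, not the project's own code. Treating `./vector_index` as the fallback value is an assumption taken from the example, and the sketch reads only an environment mapping, not a `.env` file.

```python
import os
from pathlib import Path


def parse_watch_folders(raw):
    """Split a comma-separated folder list into Path objects.

    Blank entries and surrounding whitespace are ignored, and a leading
    "~" is expanded to the user's home directory.
    """
    return [Path(p.strip()).expanduser() for p in raw.split(",") if p.strip()]


def load_settings(env=os.environ):
    """Build a settings dict from an environment-like mapping.

    Hypothetical helper: the fallback path below is assumed, not confirmed
    as the project's default.
    """
    return {
        "watch_folders": parse_watch_folders(env.get("WATCH_FOLDERS", "")),
        "lancedb_path": Path(env.get("LANCEDB_PATH", "./vector_index")),
    }
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check without touching the real environment, for example `load_settings({"WATCH_FOLDERS": "/Users/me/Documents,/Users/me/Research"})`.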