
Overview
The OpenZIM MCP Server is an implementation of the Model Context Protocol (MCP) that lets AI models access and search ZIM-format knowledge bases entirely offline. ZIM files contain compressed archives of websites such as Wikipedia, Stack Exchange, and Project Gutenberg, along with other educational resources, which makes large amounts of knowledge available to LLMs without internet connectivity.
What is ZIM?
ZIM (Zeno IMproved) is an open file format for storing website content for offline use. Originally created for Wikipedia, it’s now used by Kiwix and other projects to distribute educational content to areas with limited internet access.
Architecture
LLM Client ←→ MCP Protocol ←→ ZIM Server ←→ Local ZIM Files
                                   ↓
                         libzim (C++ Library)
                                   ↓
                         ┌─────────┴─────────┐
                     Full Mode          Simple Mode
                         ↓                   ↓
               15 Specialized Tools    3 Basic Tools
Design Philosophy
- Offline First: No internet required after ZIM files are downloaded
- Dual Mode: Full mode for power users, Simple mode for basic needs
- Smart Retrieval: Automatic fallback and intelligent search
- Production Ready: Comprehensive error handling and logging
Features
🎯 Dual Mode Operation
Full Mode (15 specialized tools):
- Granular control over ZIM file operations
- Advanced search and filtering
- Namespace browsing
- Metadata inspection
- Entry counting and statistics
Simple Mode (3 essential tools):
- Streamlined interface for common tasks
- Automatic ZIM file selection
- Smart search with fallback
- Perfect for basic Wikipedia queries
🔍 Comprehensive Search
- Full-Text Search: Search across entire ZIM archives
- Title Search: Find articles by title
- Namespace Filtering: Search specific content types
- Smart Fallback: Automatically try alternative search strategies
- Result Ranking: Relevance-based result ordering
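The Smart Fallback idea above can be illustrated with a minimal sketch. It assumes each search strategy is a plain function from a query string to a list of titles; `search_with_fallback`, `full_text`, and `by_title` are hypothetical names for illustration, not the server's actual API.

```python
from typing import Callable, List, Sequence, Tuple

# A strategy is a (label, function) pair; the function maps a query to result titles.
Strategy = Tuple[str, Callable[[str], List[str]]]


def search_with_fallback(query: str, strategies: Sequence[Strategy]) -> Tuple[str, List[str]]:
    """Try each strategy in order and return the first non-empty result list."""
    for label, run in strategies:
        results = run(query)
        if results:
            return label, results
    # Every strategy came back empty.
    return "none", []


# Hypothetical stand-ins for real full-text and title searches.
TITLES = ["Quantum computing", "Quantum mechanics", "Roman Empire"]


def full_text(query: str) -> List[str]:
    return []  # pretend the full-text index has no match


def by_title(query: str) -> List[str]:
    return [title for title in TITLES if query.lower() in title.lower()]


label, hits = search_with_fallback(
    "quantum", [("full_text", full_text), ("title", by_title)]
)
print(label, hits)
```

Returning the label alongside the results lets a caller report which strategy produced the answer, which can help when debugging unexpected matches.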
📚 Content Access
- Article Retrieval: Get full article content
- Metadata Access: Article titles, URLs, MIME types
- Namespace Browsing: Explore content organization
- Entry Listing: Navigate ZIM file structure
- Statistics: Get ZIM file information and counts
🛠️ Available Tools
Full Mode Tools:
- list_zim_files: List all available ZIM files
- search_zim_file: Full-text search in a specific ZIM file
- get_zim_entry: Get content of a specific entry
- get_zim_metadata: Get metadata for an entry
- list_zim_entries: List entries in a ZIM file
- get_zim_file_info: Get information about a ZIM file
- search_by_title: Search for entries by title
- get_entry_by_url: Get entry by its URL
- count_zim_entries: Count entries in a ZIM file
- browse_namespace: Browse entries in a specific namespace
- get_main_page: Get the main page of a ZIM file
- check_entry_exists: Check if an entry exists
- get_random_entry: Get a random entry
- get_illustration: Get illustration metadata
- search_suggestions: Get search suggestions
Simple Mode Tools:
- search_wikipedia: Search across all Wikipedia ZIM files
- get_article: Get a specific Wikipedia article
- list_available_zims: List all available ZIM files
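MCP clients invoke tools through JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request and checks the tool name against the active mode. The tool names come from the lists above; the helper `build_tool_call` and the `query` argument name are assumptions for illustration, since the actual parameter names of each tool are not given here.

```python
import json
from typing import Any, Dict

# Tool names as listed above, grouped by mode.
FULL_MODE_TOOLS = {
    "list_zim_files", "search_zim_file", "get_zim_entry", "get_zim_metadata",
    "list_zim_entries", "get_zim_file_info", "search_by_title",
    "get_entry_by_url", "count_zim_entries", "browse_namespace",
    "get_main_page", "check_entry_exists", "get_random_entry",
    "get_illustration", "search_suggestions",
}
SIMPLE_MODE_TOOLS = {"search_wikipedia", "get_article", "list_available_zims"}
TOOLS_BY_MODE = {"full": FULL_MODE_TOOLS, "simple": SIMPLE_MODE_TOOLS}


def build_tool_call(mode: str, tool: str, arguments: Dict[str, Any], request_id: int = 1) -> str:
    """Return a JSON-RPC 2.0 'tools/call' request for a tool available in `mode`."""
    if mode not in TOOLS_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    if tool not in TOOLS_BY_MODE[mode]:
        raise ValueError(f"tool {tool!r} is not available in {mode} mode")
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)


# "query" is a hypothetical argument name used only for this example.
print(build_tool_call("simple", "search_wikipedia", {"query": "quantum computing"}))
```

In practice an MCP client such as Claude Desktop builds these requests itself; the sketch only shows the shape of the message exchanged with the server.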
Technical Implementation
Built With Modern Python
- Python 3.8+: Modern async/await support
- libzim: Official ZIM file library (C++ with Python bindings)
- MCP SDK: Model Context Protocol implementation
- Type Safety: Full type hints with mypy
- Error Handling: Comprehensive exception handling
Performance Optimizations
- Lazy Loading: ZIM files loaded on demand
- Connection Pooling: Reuse ZIM file handles
- Smart Caching: Cache frequently accessed content
- Efficient Search: Optimized search algorithms from libzim
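Lazy loading and handle reuse can be combined in one small cache. The following is a sketch under assumptions, not the server's actual implementation: `HandleCache` is a hypothetical class, and the `opener` callable stands in for whatever opens a ZIM file (with python-libzim this might be the `Archive` class from `libzim.reader`).

```python
from pathlib import Path
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class HandleCache(Generic[T]):
    """Open each file on first use and reuse the same handle afterwards."""

    def __init__(self, opener: Callable[[Path], T]) -> None:
        self._opener = opener
        self._handles: Dict[Path, T] = {}

    def get(self, path: str) -> T:
        # Normalize so that different spellings of one path share a handle.
        key = Path(path).expanduser().resolve()
        if key not in self._handles:
            # Lazy loading: nothing is opened until a caller asks for this file.
            self._handles[key] = self._opener(key)
        return self._handles[key]

    def __len__(self) -> int:
        return len(self._handles)


# Demonstration with a fake opener that records how often it is called.
opened = []


def fake_opener(path: Path) -> dict:
    opened.append(path)
    return {"path": path}


cache = HandleCache(fake_opener)
first = cache.get("wiki.zim")
second = cache.get("wiki.zim")
print(first is second, len(opened))
```

Injecting the opener keeps the cache independent of any particular library and makes it easy to test without real ZIM files.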
Getting Started
Installation
# Install from PyPI
pip install openzim-mcp
# Or install from source
git clone https://github.com/cameronrye/openzim-mcp.git
cd openzim-mcp
pip install -e .
Download ZIM Files
# Create ZIM directory
mkdir -p ~/zim_files
# Download Wikipedia (example - choose your language)
# Visit https://download.kiwix.org/zim/wikipedia/
# Download files to ~/zim_files/
Popular ZIM files:
- Wikipedia: Full or mini versions in any language
- Stack Exchange: Programming Q&A archives
- Project Gutenberg: Classic literature
- TED Talks: Educational videos and transcripts
- Khan Academy: Educational content
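After downloading, it can help to confirm that the files landed where the server will look for them. The helper below is a hypothetical convenience script, not part of the package; it lists the `.zim` files at the top level of a directory together with their sizes.

```python
from pathlib import Path
from typing import List, Tuple


def find_zim_files(directory: str) -> List[Tuple[str, float]]:
    """Return (file name, size in MB) for each .zim file directly inside `directory`."""
    root = Path(directory).expanduser()
    if not root.is_dir():
        return []
    found = []
    for path in sorted(root.glob("*.zim")):
        if path.is_file():
            found.append((path.name, path.stat().st_size / 1_000_000))
    return found


for name, size_mb in find_zim_files("~/zim_files"):
    print(f"{name}: {size_mb:.1f} MB")
```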
Claude Desktop Integration
Add to your Claude Desktop configuration:
{
  "mcpServers": {
    "openzim": {
      "command": "python",
      "args": ["-m", "openzim_mcp"],
      "env": {
        "ZIM_FILES_PATH": "/path/to/your/zim_files",
        "ZIM_MODE": "simple"
      }
    }
  }
}
Configuration Options:
- ZIM_FILES_PATH: Directory containing ZIM files (default: ~/zim_files)
- ZIM_MODE: simple or full (default: simple)
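To make the defaults concrete, here is a minimal sketch of how these two variables could be read and validated. `Settings` and `load_settings` are hypothetical names, and the case-insensitive handling of `ZIM_MODE` is an assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

VALID_MODES = ("simple", "full")


@dataclass(frozen=True)
class Settings:
    zim_files_path: Path
    mode: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read ZIM_FILES_PATH and ZIM_MODE, applying the documented defaults."""
    env = os.environ if env is None else env
    raw_path = env.get("ZIM_FILES_PATH", "~/zim_files")
    # Assumption: accept any letter case for the mode value.
    mode = env.get("ZIM_MODE", "simple").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"ZIM_MODE must be one of {VALID_MODES}, got {mode!r}")
    return Settings(zim_files_path=Path(raw_path).expanduser(), mode=mode)


print(load_settings({"ZIM_FILES_PATH": "/data/zim", "ZIM_MODE": "full"}))
```

Passing the environment as a mapping keeps the function easy to test; calling `load_settings()` with no argument reads the real process environment.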
Example Interactions
Simple Mode
Once configured, you can ask Claude:
- “Search Wikipedia for information about quantum computing”
- “Get the Wikipedia article about the Roman Empire”
- “What ZIM files are available?”
- “Tell me about machine learning” (automatically searches Wikipedia)
Full Mode
With full mode enabled:
- “Search the English Wikipedia ZIM file for ‘artificial intelligence’”
- “Get the main page of the Wikipedia ZIM file”
- “List all entries in the ‘A’ namespace”
- “Get a random Wikipedia article”
- “Count how many articles are in the ZIM file”
- “Check if an article about ‘Python programming’ exists”
Use Cases
For Education
- Offline Learning: Access Wikipedia without internet
- Research: Search vast knowledge bases locally
- Study Aid: Quick access to reference material
- Curriculum Development: Build educational tools with AI
For Developers
- AI Applications: Build LLM apps with offline knowledge
- Testing: Test AI systems without internet dependency
- Documentation: Access technical documentation offline
- Knowledge Bases: Create custom knowledge retrieval systems
For Researchers
- Data Analysis: Analyze Wikipedia content with AI
- Historical Research: Access archived versions of websites
- Linguistic Studies: Analyze multilingual content
- Information Retrieval: Study search and retrieval algorithms
ZIM File Sources
Official Kiwix Library
Download ZIM files from download.kiwix.org:
- Wikipedia: All languages, full or mini versions
- Wiktionary: Multilingual dictionaries
- Wikivoyage: Travel guides
- Stack Exchange: Programming Q&A
- Project Gutenberg: Classic books
- TED Talks: Educational videos
- Khan Academy: Educational content
- OpenStreetMap: Offline maps
File Sizes
- Wikipedia Mini: 100MB - 500MB per language
- Wikipedia Full: 10GB - 90GB per language
- Stack Exchange: 1GB - 50GB depending on sites
- Project Gutenberg: ~30GB for full collection
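Given these sizes, a quick free-space check before starting a large download can save time. The sketch below uses the standard library's `shutil.disk_usage`; the function name `has_room_for` and the optional headroom fraction are assumptions chosen for illustration.

```python
import os
import shutil

GB = 10**9


def has_room_for(directory: str, download_bytes: int, headroom: float = 0.0) -> bool:
    """Return True if `directory`'s filesystem can hold the download.

    `headroom` is the fraction of total disk capacity to keep free afterwards.
    """
    usage = shutil.disk_usage(os.path.expanduser(directory))
    required = download_bytes + int(usage.total * headroom)
    return usage.free >= required


# Example: would a 90 GB full Wikipedia fit on the current drive?
print(has_room_for(".", 90 * GB))
```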
Development
Testing
# Run tests
pytest
# Run with coverage
pytest --cov=openzim_mcp
# Test with real ZIM files
pytest --zim-path=/path/to/zim/files
Development Setup
# Clone repository
git clone https://github.com/cameronrye/openzim-mcp.git
cd openzim-mcp
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
Security & Privacy
- Fully Offline: No data sent to external servers
- Local Processing: All searches happen on your machine
- No Tracking: No analytics or telemetry
- Open Source: Auditable code
- Sandboxed: ZIM files are read-only
Performance Considerations
Recommended Hardware
- RAM: 4GB minimum, 8GB+ recommended
- Storage: Depends on ZIM files (100MB - 100GB+)
- CPU: Any modern processor
- Disk: SSD recommended for faster searches
Optimization Tips
- Use Simple Mode for basic queries
- Keep ZIM files on SSD for faster access
- Start with mini Wikipedia versions
- Use specific ZIM files instead of searching all
Community Impact
This project enables:
- Offline AI: LLMs with knowledge in areas without internet
- Educational Access: Bring AI-powered learning to underserved areas
- Privacy: Keep knowledge queries completely private
- Sustainability: Reduce internet bandwidth usage
- Preservation: Access archived knowledge bases
Future Roadmap
- Enhanced caching strategies
- Multi-ZIM search aggregation
- Custom ZIM file creation
- Advanced filtering options
- Performance optimizations
- Web interface for ZIM management
Acknowledgments
- OpenZIM: ZIM file format specification
- Kiwix: ZIM file distribution and tools
- libzim: C++ library for ZIM file access
- Wikipedia: The free encyclopedia
Status: Production Ready
License: MIT
Maintained: Active Development