Building Model Context Protocol Servers: A Deep Dive

I’ve spent more than a decade building distributed systems, and the Model Context Protocol is the first thing in a while that’s changed how I think about connecting AI models to the outside world. I’ve built production MCP servers including gopher-mcp and openzim-mcp, and along the way I landed on a handful of patterns worth writing down.

Update (June 2025): I’ve split this guide into two shorter articles:

Gopher MCP Server: Bringing 1991’s Internet to Modern AI - Focuses on the Gopher protocol, its history, and practical applications
OpenZIM MCP Server: Offline Knowledge for AI Assistants - Covers offline Wikipedia access and ZIM format optimization

Understanding the Model Context Protocol Architecture

Before MCP, wiring an AI model up to an external data source meant a one-off integration every time. MCP replaces that with a formal contract between the model and the resource: a standard way to expose data and tools that doesn’t hand the model unchecked access to your system.

It sits between the model and whatever you’re integrating, so old protocols and modern APIs look the same from the model’s side. None of this is new. It’s the same separation-of-concerns thinking distributed systems have relied on for years, pointed at AI tooling.

Strategic Advantages of MCP Implementation

Security: capability-based, with explicit permission boundaries. The model gets exactly the access you grant it, which beats the plugin free-for-all it replaces.
One integration pattern: every resource type is reached the same way, so you’re not relearning a new integration shape for each backend.
Room to grow: adding a capability means implementing an interface, not redesigning the server.
Performance hooks built in: caching, connection pooling, and resource lifecycle management are part of the model rather than bolted on later.

Visualizing the 'Protocol Abstraction Layer' pattern, showing how the system separates high-level requests from low-level resource handling.

Architectural Patterns for Production MCP Systems

Building a couple of MCP servers surfaced the same handful of patterns each time. None of them are specific to MCP; they’re standard distributed-systems practice, applied to the job of exposing resources to a model:

1. Resource-Centric Design

The pattern looks like this: Pydantic models for the data contract, a Protocol for the behavior.

from typing import Protocol

from pydantic import BaseModel


class Resource(BaseModel):
    uri: str
    name: str
    description: str | None = None
    mime_type: str | None = None


class ResourceProvider(Protocol):
    async def list_resources(self) -> list[Resource]: ...
    async def read_resource(self, uri: str) -> bytes: ...

That’s the Strategy pattern: you can swap the backend at runtime without touching the code that uses it. Keeping discovery (list_resources) separate from access (read_resource) is also what makes caching, load balancing, and failover tractable later, since each concern has one place to live.

2. Protocol Abstraction Layer

gopher-mcp has to speak two protocols, Gopher and Gemini, which made it a good place to work out how to abstract over them instead of writing two disconnected handlers. Both clients expose the same async surface and inherit the same caching behavior, and each is registered as a FastMCP tool (condensed from server.py and the client modules):

class GopherClient(TTLCacheMixin[GopherFetchResponse]):
    """Async Gopher protocol client with caching and safety features."""

    async def fetch(self, url: str) -> GopherFetchResponse: ...


class GeminiClient(TTLCacheMixin[GeminiFetchResponse]):
    """Async Gemini protocol client with TLS, caching and safety features."""

    async def fetch(self, url: str) -> GeminiFetchResponse: ...


@mcp.tool(title="Fetch Gopher resource")
async def gopher_fetch(url: str) -> dict[str, Any]:
    client = await client_manager.get_gopher_client()
    response = await client.fetch(url)
    return response.model_dump()

This is the Open/Closed Principle in practice: open for extension, closed for modification. Adding a protocol means implementing the same client interface, not editing the core, so the parts that already work keep working.
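One way to picture "extension without modification" is a registry keyed by URL scheme. This is a sketch under assumptions, not how gopher-mcp wires its tools (it registers one tool per protocol); ClientRegistry and the two fake clients are hypothetical.

```python
import asyncio
from typing import Protocol
from urllib.parse import urlsplit


class FetchClient(Protocol):
    async def fetch(self, url: str) -> dict[str, str]: ...


class FakeGopherClient:
    """Stand-in client; a real one would open a TCP connection."""

    async def fetch(self, url: str) -> dict[str, str]:
        return {"protocol": "gopher", "url": url}


class FakeGeminiClient:
    """Stand-in client; a real one would open a TLS connection."""

    async def fetch(self, url: str) -> dict[str, str]:
        return {"protocol": "gemini", "url": url}


class ClientRegistry:
    """Maps a URL scheme to whichever client implements fetch()."""

    def __init__(self) -> None:
        self._clients: dict[str, FetchClient] = {}

    def register(self, scheme: str, client: FetchClient) -> None:
        self._clients[scheme.lower()] = client

    async def fetch(self, url: str) -> dict[str, str]:
        scheme = urlsplit(url).scheme.lower()
        try:
            client = self._clients[scheme]
        except KeyError:
            raise ValueError(f"unsupported scheme: {scheme!r}") from None
        return await client.fetch(url)
```

Supporting a third protocol is then a register() call with a new client; the dispatch code stays as it is.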

3. Async-First Architecture

A server handling concurrent requests wants sub-millisecond responses for cached resources, and a single blocking I/O call can stall everything queued behind it. So every I/O path in gopher-mcp is async end to end, on Python’s asyncio event loop. There’s a nice side effect: the event loop runs one coroutine at a time and only switches at an await, so cache methods that never await run without interruption and need no locking. Both protocol clients inherit the same TTL + LRU cache behavior from a shared mixin (condensed from cache.py):

import time
from collections import OrderedDict
from typing import Generic, TypeVar

V = TypeVar("V")


class TTLCacheMixin(Generic[V]):
    """Shared TTL + LRU cache behavior for the Gopher and Gemini clients."""

    _cache: "OrderedDict[str, _BaseCacheEntry[V]]"

    def _get_cached_response(self, url: str) -> V | None:
        entry = self._cache.get(url)
        if entry is None:
            return None
        if entry.is_expired(time.time()):
            del self._cache[url]
            return None
        self._cache.move_to_end(url)  # mark as most recently used
        return entry.value

    def _cache_response(self, url: str, response: V) -> None:
        if url in self._cache:
            self._cache.move_to_end(url)  # refreshed entry becomes most recent
        elif len(self._cache) >= self.max_cache_entries:
            self._cache.popitem(last=False)  # evict the LRU entry
        self._cache[url] = self._cache_entry_cls(
            key=url,
            value=response,
            timestamp=time.time(),
            ttl=self.cache_ttl_seconds,
        )

Abstract representation of searching within compressed data structures, relevant to the OpenZIM case study.

Case Study: OpenZIM MCP Server Architecture

openzim-mcp searches compressed knowledge bases with millions of articles and needs to return results in under a second. That’s the usual storage-versus-speed tradeoff, with a memory budget tight enough to run on modest hardware.

ZIM File Handling

The hard part sounds like it would be searching compressed data without decompressing the whole thing first. It turns out the ZIM format already solves that: each archive ships with an embedded Xapian full-text index, and libzim exposes it directly. So the server doesn’t build an index at all; it just drives the one that’s already there. The core search path (condensed from openzim-mcp’s zim/search.py):

from typing import Any

from libzim.reader import Archive
from libzim.search import Query, Searcher


class _SearchMixin:
    def _perform_search(
        self, archive: Archive, query: str, limit: int, offset: int
    ) -> dict[str, Any]:
        searcher = Searcher(archive)
        search = searcher.search(Query().set_query(query))
        total_results = search.getEstimatedMatches()

        results = []
        for entry_id in search.getResults(offset, limit):
            entry = archive.get_entry_by_path(entry_id)
            results.append({"path": entry_id, "title": entry.title})

        return {"query": query, "total": total_results, "results": results}

Performance Tricks I Discovered

Most of the speed comes from four things:

Lazy loading: articles and indexes are read on demand, so startup stays cheap and memory tracks what you actually touch.
The embedded Xapian index: search runs against the full-text index already inside the ZIM archive, reached through libzim, so there’s no separate index to build or keep in sync.
Memory-mapped I/O: the kernel’s page cache decides which parts of the archive stay in memory, so raw reads need no hand-written cache layer.
Result caching: paginated search results are cached so a repeated query doesn’t pay the full cost twice.
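The result-caching item can be sketched as a wrapper that stores whole pages. This is an illustration under assumptions: CachedSearch and the (query, limit, offset) callable signature are hypothetical, and openzim-mcp's own cache may key and evict differently.

```python
from collections import OrderedDict
from collections.abc import Callable
from typing import Any

# Hypothetical signature: (query, limit, offset) -> one page of results.
SearchFn = Callable[[str, int, int], dict[str, Any]]


class CachedSearch:
    """Caches result pages, keyed by everything that shapes the page."""

    def __init__(self, search_fn: SearchFn, max_entries: int = 128) -> None:
        self._search_fn = search_fn
        self._max_entries = max_entries
        self._pages: OrderedDict[tuple[str, int, int], dict[str, Any]] = OrderedDict()

    def search(self, query: str, limit: int, offset: int) -> dict[str, Any]:
        # Only whitespace is normalized; case is kept because query
        # operators can be case-sensitive.
        query = query.strip()
        key = (query, limit, offset)
        if key in self._pages:
            self._pages.move_to_end(key)  # cache hit: mark as recently used
            return self._pages[key]
        page = self._search_fn(query, limit, offset)
        if len(self._pages) >= self._max_entries:
            self._pages.popitem(last=False)  # evict the least recently used page
        self._pages[key] = page
        return page
```

With more than one archive open, the archive path would need to be part of the key as well, otherwise two archives could return each other's pages.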

A visual metaphor for the Gopher MCP server: wrapping 1990s internet protocols in modern server architecture.

Case Study: Gopher MCP Server Implementation

There’s something to learn from old protocols about doing less. Gopher predates most of the complexity we take for granted on the modern web, and that simplicity is exactly what makes it reliable and easy to reason about.

Protocol Implementation

The transport layer (condensed from gopher_transport.py) is about as simple as the protocol. It’s a TCP connection, a selector line, and a bounded read:

import asyncio


async def fetch_gopher(
    host: str,
    port: int,
    selector: str,
    *,
    search: str | None = None,
    max_bytes: int = DEFAULT_MAX_RESPONSE_SIZE,
    timeout: float = DEFAULT_TIMEOUT_SECONDS,
) -> bytes:
    async def _io() -> bytes:
        reader, writer = await asyncio.open_connection(host, port)
        try:
            # "selector[<TAB>search]<CR><LF>", UTF-8 encoded
            writer.write(build_request(selector, search))
            await writer.drain()

            chunks: list[bytes] = []
            total = 0
            while True:
                chunk = await reader.read(min(READ_CHUNK, max_bytes - total + 1))
                if not chunk:
                    break
                total += len(chunk)
                if total > max_bytes:
                    raise GopherProtocolError("Response exceeded size limit")
                chunks.append(chunk)
            return b"".join(chunks)
        finally:
            writer.close()
            await writer.wait_closed()

    # One deadline covers connect, send, and read
    return await asyncio.wait_for(_io(), timeout=timeout)

Content Type Detection

Gopher’s type system is a single character in front of each menu item. In gopher-mcp, the mapping from that character to a MIME type lives in mime.py as plain data:

type_mappings = {
    "0": "text/plain",                # text file
    "1": "text/gopher-menu",          # menu / directory
    "4": "application/mac-binhex40",  # BinHex file
    "7": "text/gopher-menu",          # search server
    "9": "application/octet-stream",  # binary file
    "g": "image/gif",
    "I": "image/jpeg",
    # ... more mappings
}


def guess_mime_type(gopher_type: str, selector: str = "") -> str:
    """Map a Gopher item type to a MIME type.

    The full implementation also refines ambiguous types using the
    selector's file extension (.html, .png, .pdf, ...).
    """
    return type_mappings.get(gopher_type, "application/octet-stream")
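The docstring mentions refining ambiguous types by file extension. Here is a minimal sketch of what that refinement could look like. Which types count as ambiguous, the override table, and the function name are all assumptions for illustration, not the contents of mime.py.

```python
from pathlib import PurePosixPath

TYPE_MAPPINGS = {
    "0": "text/plain",
    "1": "text/gopher-menu",
    "9": "application/octet-stream",
    "I": "image/jpeg",
}

# Assumption: only generic "binary" and "image" items are refined.
AMBIGUOUS_TYPES = frozenset({"9", "I"})

# Assumption: a small explicit table, so results do not depend on the host's MIME database.
EXTENSION_OVERRIDES = {
    ".html": "text/html",
    ".png": "image/png",
    ".pdf": "application/pdf",
}


def guess_with_extension(gopher_type: str, selector: str = "") -> str:
    """Map the item type first, then let the extension refine ambiguous types."""
    base = TYPE_MAPPINGS.get(gopher_type, "application/octet-stream")
    if gopher_type not in AMBIGUOUS_TYPES:
        return base  # the server's declared type wins when it is specific
    suffix = PurePosixPath(selector).suffix.lower()
    return EXTENSION_OVERRIDES.get(suffix, base)
```

The ordering is the design choice here: the declared item type is trusted when it is specific, and the selector's extension only breaks ties for types that say little on their own.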

Best Practices for MCP Server Development

1. Error Handling

Handle errors with context. gopher-mcp draws a line between internal exceptions and what crosses the tool boundary:

from typing import Literal

from pydantic import BaseModel


class GopherProtocolError(Exception):
    """Raised internally when a Gopher request cannot be completed."""


class ErrorResult(BaseModel):
    """Structured error payload returned to the model."""

    kind: Literal["error"] = "error"
    error: dict[str, str]

The split matters: transport failures raise GopherProtocolError inside the client, but the tool boundary converts everything into a structured ErrorResult. An exception that escapes an MCP tool is an opaque failure to the calling model; a structured error is something it can reason about and recover from.
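The conversion at the tool boundary can be sketched as a decorator. This is an assumption about shape, not gopher-mcp's actual code: structured_errors and the "code"/"message" keys are hypothetical, and a plain dict stands in for ErrorResult so the sketch runs without Pydantic.

```python
import asyncio
import functools
from collections.abc import Awaitable, Callable
from typing import Any


class GopherProtocolError(Exception):
    """Raised internally when a Gopher request cannot be completed."""


ToolFn = Callable[..., Awaitable[dict[str, Any]]]


def structured_errors(tool: ToolFn) -> ToolFn:
    """Wrap a tool so exceptions become a structured error payload."""

    @functools.wraps(tool)
    async def wrapper(*args: Any, **kwargs: Any) -> dict[str, Any]:
        try:
            return await tool(*args, **kwargs)
        except GopherProtocolError as exc:
            # Expected failure: the message is safe and useful to show the model.
            return {"kind": "error", "error": {"code": "protocol_error", "message": str(exc)}}
        except Exception as exc:
            # Unexpected failure: expose only the type, not internal details.
            return {"kind": "error", "error": {"code": "internal_error", "message": type(exc).__name__}}

    return wrapper
```

asyncio.CancelledError derives from BaseException, so cancellation still propagates through the wrapper instead of being reported as an error result.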

2. Configuration Management

Use structured configuration with validation. gopher-mcp builds its config from pydantic-settings models keyed by environment prefix:

from pydantic_settings import BaseSettings, SettingsConfigDict


class GopherConfig(BaseSettings):
    """Gopher client settings, read from GOPHER_* environment variables."""

    model_config = SettingsConfigDict(env_prefix="GOPHER_", env_file=".env")

    max_response_size: int = 1024 * 1024  # 1 MiB
    timeout_seconds: float = 30.0
    cache_enabled: bool = True
    cache_ttl_seconds: int = 300
    max_cache_entries: int = 1000
    allowed_hosts: list[str] | None = None  # None = allow all
    allow_local_hosts: bool = False  # SSRF protection

Each surface gets its own settings class (GOPHER_*, GEMINI_*, and GOPHER_MCP_* for the server itself), with field validators that parse comma-separated environment values into typed lists.

3. Testing Strategy

Cover the real paths, integration tests included. With pytest and pytest-asyncio, the shape looks like this:

import pytest


def test_gopher_url_parsing() -> None:
    parsed = parse_gopher_url("gopher://example.com/0/about.txt")
    assert parsed.host == "example.com"
    assert parsed.port == 70
    assert parsed.gopher_type == "0"


@pytest.mark.asyncio
async def test_fetch_returns_menu_result() -> None:
    client = GopherClient(cache_enabled=False)
    result = await client.fetch("gopher://gopher.example/")
    assert result.kind == "menu"
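An integration test that reaches a real host is slow and flaky, so one option is to point the client at a throwaway server on localhost. The sketch below is self-contained and uses a stand-in fetch function rather than GopherClient; fetch_line_protocol and run_against_fake_server are hypothetical names.

```python
import asyncio


async def fetch_line_protocol(host: str, port: int, selector: str) -> bytes:
    """Minimal stand-in for a Gopher fetch: send the selector, read until EOF."""
    reader, writer = await asyncio.open_connection(host, port)
    try:
        writer.write(selector.encode("utf-8") + b"\r\n")
        await writer.drain()
        return await reader.read()  # no size argument: read until the server closes
    finally:
        writer.close()
        await writer.wait_closed()


async def run_against_fake_server(selector: str) -> tuple[bytes, bytes]:
    """Start a one-shot fake server, fetch from it, return (request seen, response body)."""
    received: list[bytes] = []

    async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        received.append(await reader.readline())
        # One informational menu line, then the "." terminator line.
        writer.write(b"iHello from the fake server\tfake\t(NULL)\t0\r\n.\r\n")
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    # Port 0 asks the OS for any free port, so parallel test runs do not collide.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        body = await fetch_line_protocol("127.0.0.1", port, selector)
    return received[0], body
```

The same fixture shape lets a test assert on both sides of the exchange: what the client put on the wire and what it made of the response.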

Performance Considerations

Memory Management

Stream large resources in bounded chunks instead of buffering whole responses
Bound every cache by entry count and TTL so memory use has a ceiling
Monitor memory usage in production, cache size included
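As a sketch of the streaming point, an async generator can hand a consumer one bounded chunk at a time, so memory use stays flat however large the resource is. iter_chunks and sha256_of_stream are hypothetical helpers, not part of either server.

```python
import asyncio
import hashlib
from collections.abc import AsyncIterator


async def iter_chunks(
    reader: asyncio.StreamReader, chunk_size: int = 64 * 1024
) -> AsyncIterator[bytes]:
    """Yield the stream in pieces of at most chunk_size bytes until EOF."""
    while chunk := await reader.read(chunk_size):
        yield chunk


async def sha256_of_stream(reader: asyncio.StreamReader, chunk_size: int = 64 * 1024) -> str:
    """Example consumer: hash a stream while holding only one chunk in memory."""
    digest = hashlib.sha256()
    async for chunk in iter_chunks(reader, chunk_size):
        digest.update(chunk)
    return digest.hexdigest()
```

Any consumer that can work incrementally (hashing, writing to disk, forwarding) fits this shape; one that needs the whole body still has to enforce a size limit.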

Concurrency

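Design for high concurrency from the start: keep every I/O path async
Add synchronization primitives only where a critical section spans an await
Apply backpressure by capping in-flight requests rather than queueing without limit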
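One simple form of backpressure is a semaphore that caps how many requests are in flight at once. A minimal sketch, with ConcurrencyLimiter as a hypothetical name; the counters exist only to make the behavior observable.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class ConcurrencyLimiter:
    """Runs jobs with at most max_in_flight executing at the same time."""

    def __init__(self, max_in_flight: int) -> None:
        self._semaphore = asyncio.Semaphore(max_in_flight)
        self.in_flight = 0
        self.peak = 0

    async def run(self, job: Callable[[], Awaitable[T]]) -> T:
        # Callers beyond the limit wait here instead of piling onto the backend.
        async with self._semaphore:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await job()
            finally:
                self.in_flight -= 1
```

This bounds work that is running, not callers that are waiting; a server that must also shed load would pair it with a timeout or a bounded queue.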

Network Efficiency

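Pool connections where the protocol allows reuse (Gopher closes the connection after each response, so it cannot)
Use compression when the payload and the client both support it
Put one deadline over connect, send, and read so a slow peer cannot hold a request open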

Deployment and Monitoring

Docker Deployment

# Build stage: produce a wheel from the source tree
FROM python:3.14-slim AS build
RUN pip install --no-cache-dir uv
WORKDIR /src
COPY . .
RUN uv build --wheel --out-dir /dist

# Runtime stage: install just the wheel, run as a non-root user
FROM python:3.14-slim
RUN useradd --create-home --uid 10001 app
COPY --from=build /dist/*.whl /tmp/
RUN pip install --no-cache-dir /tmp/*.whl && rm -f /tmp/*.whl
USER app

EXPOSE 8000
ENTRYPOINT ["gopher-mcp"]
CMD ["--transport", "streamable-http", "--host", "0.0.0.0", "--port", "8000"]

Health Checks

Health checks look different for MCP servers than for web services. Over the stdio transport there’s no HTTP surface to probe, so it pays to expose health as a first-class tool the client can call like any other. openzim-mcp ships one as zim_health. The pattern looks like this:

@mcp.tool()
async def health_check() -> dict[str, str]:
    """Lightweight liveness probe callable by any MCP client."""
    return {"status": "ok"}

Future Directions in MCP Architecture

MCP is still young, and there’s a lot of room to build on top of it. A few directions I’m watching:

Streaming: backpressure-aware streaming so a large dataset doesn’t have to fit in memory to be processed.
Authentication: capability-based security that holds up across multiple MCP servers, not just one.
Federation: service-mesh patterns for running and load-balancing a fleet of MCP servers.
Observability: distributed tracing and metrics for interactions that span several tools.

Strategic Implications and Future Outlook

Two servers in, what strikes me is how ordinary the good decisions are. Start with the smallest version that works. Add observability early, so you can see what production is actually doing. Then iterate on what it shows you.

MCP is on track to become plumbing that a lot of AI systems depend on. That’s reason enough to build these servers with the same care you’d give any other piece of infrastructure you expect to keep running.

Dive Deeper

Two more focused guides on building specific kinds of MCP server:

Gopher MCP Server: Bringing 1991’s Internet to Modern AI - protocol handlers, Gopher’s history, and where alternative internet protocols still make sense
OpenZIM MCP Server: Offline Knowledge for AI Assistants - building offline knowledge systems, ZIM file handling, and AI assistants that work with no network

The complete implementations are on GitHub: gopher-mcp and openzim-mcp.
