MCP vs REST: Protocol Design for AI Agents
A common first reaction to MCP from REST-shaped engineers is: “why not just expose these tools as HTTP endpoints?” It’s a fair question. REST is the lingua franca of internal APIs; every team can call it, every load balancer can route it, every observability stack can trace it. Trading that ecosystem for WebSocket JSON-RPC 2.0 is a choice that needs to justify itself.
This article is the justification, drawn from running an MCP infrastructure in production rather than from spec advocacy.
What MCP actually adds on top of JSON-RPC
Strip MCP of its specifics and you have JSON-RPC 2.0 over a persistent connection. JSON-RPC 2.0 has been around since 2010 and is boring in the best sense: well-specified request/response with optional notifications, structured errors, batching, no surprises. What MCP adds on top is a vocabulary: standard methods for listing tools, calling them, listing resources, reading them, subscribing to changes, exchanging server-initiated prompts. On top of that, a connection lifecycle (initialise → ready → close).
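The layering is easy to see in the messages themselves. Here is a minimal sketch in Python using MCP’s published method names (`tools/list`, `tools/call`, `notifications/initialized`); the `search` tool and its arguments are hypothetical:

```python
import json

def rpc_request(id_, method, params=None):
    """Build a JSON-RPC 2.0 request: carries an id, so it expects a response."""
    msg = {"jsonrpc": "2.0", "id": id_, "method": method}
    if params is not None:
        msg["params"] = params
    return msg

def rpc_notification(method, params=None):
    """Build a JSON-RPC 2.0 notification: no id, fire-and-forget."""
    msg = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        msg["params"] = params
    return msg

# MCP's "vocabulary" is agreed-upon method names on top of this framing.
# The "search" tool name and its arguments are illustrative, not from the spec.
list_tools = rpc_request(1, "tools/list")
call_tool = rpc_request(2, "tools/call",
                        {"name": "search", "arguments": {"query": "backlog"}})
initialized = rpc_notification("notifications/initialized")

# Everything on the wire is one of these two shapes, serialised as JSON.
wire_frame = json.dumps(call_tool)
```

The protocol surface is small enough that the interesting design decisions all live one level down, in the transport.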
None of that requires WebSocket as the transport. The spec allows stdio (for local Claude-Desktop-style adapters) and SSE (for stateless HTTP). We chose WebSocket because, for an enterprise infrastructure with persistent multi-client agents, it’s the only transport that gives us the four properties below cleanly.
The four properties that decide the choice
1. Stateful sessions
An agent loop is an inherently stateful conversation: it initialises capabilities, authenticates once, may subscribe to resources, and accumulates context across tool calls. REST insists on the opposite: each call carries its full context, including auth.
You can paper over statelessness with cookies, session tokens, or server-side session storage, but each of those introduces failure modes (stale sessions, lost cookies, cross-request races) that don’t exist when the connection itself is the session. A WebSocket connection has a clear start, a clear end, and an unambiguous identity for everything that happens in between.
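As a minimal sketch of what “the connection itself is the session” means in code: the `Session` class, the token check, and the URI below are illustrative assumptions, not MCP spec constructs.

```python
import uuid

class Session:
    """Per-connection state: created when the WebSocket is accepted,
    discarded when it closes. Illustrative sketch, not part of the MCP spec."""
    def __init__(self):
        self.id = str(uuid.uuid4())
        self.authenticated = False
        self.capabilities = {}       # negotiated during initialise
        self.subscriptions = set()   # resource URIs this client watches

    def authenticate(self, token):
        # Runs once per connection; every later call inherits the result.
        self.authenticated = (token == "valid-token")  # placeholder check
        return self.authenticated

# The connection is the session: no cookie, no external session store
# that can go stale or race between requests.
session = Session()
session.authenticate("valid-token")
session.subscriptions.add("file:///var/data/report.csv")
```

When the socket closes, the whole object is garbage; there is nothing to expire, invalidate, or replicate.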
2. Server-initiated notifications
Several MCP scenarios require the server to push:
- A long-running tool completes
- A subscribed resource changes (a database row, a file, an inbox)
- A peer agent emits a proposal that another client needs to render
- A plugin is hot-reloaded and tool schemas change
Each of these is awkward over plain HTTP. You can poll, you can long-poll, you can stream with SSE. None of those are bidirectional, and several of them have ugly failure modes under load balancers and corporate proxies. A WebSocket carries both directions on the same connection, with the same authentication and the same backpressure semantics.
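The push itself is just a JSON-RPC notification written to the socket whenever the event fires. A sketch, using MCP’s `notifications/resources/updated` method name; the URI is illustrative:

```python
import json

def resource_updated(uri):
    """Server-to-client push when a subscribed resource changes.
    A JSON-RPC notification carries no id, so the client never replies."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "notifications/resources/updated",
        "params": {"uri": uri},
    })

# Over REST the client would have to poll for this; over a WebSocket the
# server writes the frame the moment the change happens.
frame = resource_updated("file:///var/data/inbox")
```

The same mechanism, same socket, same auth carries both directions; there is no second channel to secure or keep alive.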
3. Latency, especially under load
REST’s per-call cost (TCP setup, TLS handshakes when connections aren’t reused, HTTP/2 stream allocation, application middleware) shows up on every call. With keep-alive and HTTP/2 it’s a few milliseconds; without, it’s tens. A WebSocket pays this cost once at the start of the session. After that, every JSON-RPC request/response pair is a few hundred microseconds of framing overhead on top of the underlying handler.
For an agent that issues 30 tool calls during a reasoning step, the difference between 2 ms and 20 ms per call adds up to the difference between a snappy loop and an annoying one.
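The back-of-envelope arithmetic, using the numbers above:

```python
# Per-call overhead compounds across an agent's tool calls within one step.
calls_per_step = 30
rest_overhead_ms = 20.0  # per-call HTTP cost without connection reuse
ws_overhead_ms = 2.0     # amortised cost once the WebSocket is up

rest_total = calls_per_step * rest_overhead_ms  # 600 ms of pure overhead
ws_total = calls_per_step * ws_overhead_ms      # 60 ms of pure overhead
saved = rest_total - ws_total                   # 540 ms per reasoning step
```

Half a second of avoidable latency per reasoning step, multiplied by however many steps the agent takes, is the difference the user actually feels.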
4. Backpressure and concurrency
A WebSocket connection has a flow-control story that maps naturally to coroutines: send rate is bounded by the receiver’s drain rate, and the server can reject or queue cleanly. With REST under load you either drop requests or pile them up upstream, neither of which a corporate API gateway handles gracefully when the burst comes from a single client running an aggressive agent.
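The semantics can be sketched with a bounded asyncio queue standing in for the socket’s flow-control window; the buffer size and burst size below are illustrative, not from our deployment:

```python
import asyncio

async def demo():
    # A bounded queue models the connection's send window.
    queue = asyncio.Queue(maxsize=4)
    handled = []

    async def agent():
        # An aggressive agent bursts 10 requests; put() suspends it whenever
        # the buffer is full, instead of dropping frames or piling them up.
        for i in range(10):
            await queue.put(i)
        await queue.put(None)  # sentinel: burst finished

    async def server():
        while (item := await queue.get()) is not None:
            await asyncio.sleep(0.001)  # stand-in for real handler work
            handled.append(item)

    await asyncio.gather(agent(), server())
    return handled

result = asyncio.run(demo())  # every request handled, in order, none dropped
```

The producer slows to the consumer’s pace automatically; no gateway-level rate limiting or retry storm is involved.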
Summary
For an MCP infrastructure that hosts multiple agents, multiple clients, long-running jobs, subscribable resources, and a hot-reloadable plugin system, the four properties above are requirements, not preferences. WebSocket JSON-RPC 2.0 is the only transport that delivers them cleanly, and that is what we run.
The transport choice is one of several architecture decisions that compound. We cover the others in our pillar article on self-hosted MCP infrastructure.
