Why We Built Our MCP Server in C++26
Most MCP servers in the wild are written in TypeScript or Python. There’s a reasonable case for that: those languages get you from idea to a tool Claude can call in an afternoon, and for a small wrapper around a single SaaS API the convenience is real.
Our context is different. We were building an MCP infrastructure intended to host a plugin ecosystem, run local AI inference, serve native desktop and web clients from the same backend, and handle thousands of tool calls per minute under sustained load. C++ was always going to be the language we wrote that in. It is the language we reach for by default for anything that has to be fast, predictable, and long-lived. The interesting decision was not C++ versus the alternatives; it was C++23 versus going all the way to C++26 with GCC 16. This article explains why we made the second choice.
The non-functional requirements that justify the toolchain
Four properties of the system shaped the toolchain.
Per-call latency budget under 5 ms. An agent loop that issues dozens of tool calls per reasoning step is bottlenecked by the slowest tool, but the median matters too; it determines whether the loop feels reactive or sluggish. A Python MCP server with even modest tool work routinely spends 10–30 ms per call. C++ tools backed by in-process data stores (DuckDB, RocksDB-style structures, mmap’d files) return in single-digit milliseconds.
Plugin hot-reload without restarting the host. With more than forty plugins and multiple teams iterating on them, server restarts are not free. Anything connected to the server (a native client running an agent loop, a WASM browser session, a long-running RAG ingestion) would be interrupted. The only runtime model that delivers true hot-reload of native code is dlopen/dlclose over shared objects, and that path leads to C or C++.
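The loading side of that model is small enough to sketch. This is a minimal illustration, not our host’s actual loader: `PluginHandle` is a hypothetical name, and a real loader also has to coordinate symbol visibility and in-flight calls before `dlclose` is safe.

```cpp
#include <dlfcn.h>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper around a dlopen'd plugin; the destructor
// unloads the shared object when the handle goes out of scope.
class PluginHandle {
public:
    explicit PluginHandle(const std::string& path)
        : handle_(dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL)) {
        if (!handle_) throw std::runtime_error(dlerror());
    }
    ~PluginHandle() { if (handle_) dlclose(handle_); }

    PluginHandle(const PluginHandle&) = delete;
    PluginHandle& operator=(const PluginHandle&) = delete;

    // Resolve a symbol; the caller supplies the function type,
    // e.g. symbol<double(double)>("cos").
    template <typename Fn>
    Fn* symbol(const char* name) const {
        return reinterpret_cast<Fn*>(dlsym(handle_, name));
    }

private:
    void* handle_;
};
```

Hot-reload then reduces to: drain in-flight calls on the old handle, construct a new `PluginHandle` from the rebuilt shared object, swap them, and let the old handle’s destructor dlclose the stale code.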
One backend, four clients (native, web, WASM, CLI). We needed a native desktop client (ImGui), a browser client (Angular), a WASM port of the native client (Emscripten), and a command-line bridge for scripting. They all share substantial business logic. That shared core lives most comfortably in C++, where it can be linked statically into the native binary, compiled to WebAssembly for the browser, or exposed as a JSON-RPC server to anything else.
Memory and GPU control. The same process hosts inference, vector search, training jobs, plugin code, and the WebSocket fleet. Tracking who holds a GPU buffer, when it can be evicted, and when a memory pool is safe to release requires a level of control that is awkward in a garbage-collected language. RAII is not a slogan; it’s how the entire VRAM subsystem stays correct under contention.
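As a minimal illustration of that RAII claim: the names below (`VramPool`, `VramBuffer`) are hypothetical stand-ins for a real CUDA or Vulkan allocation backend, faked here with a byte counter so the ownership pattern is visible on its own.

```cpp
#include <cstddef>
#include <utility>

// Toy stand-in for a device allocator: only tracks bytes in use.
class VramPool {
public:
    void* allocate(std::size_t bytes) { in_use_ += bytes; return &in_use_; }
    void release(void*, std::size_t bytes) { in_use_ -= bytes; }
    std::size_t in_use() const { return in_use_; }
private:
    std::size_t in_use_ = 0;
};

// RAII owner of one allocation: movable, never copyable, and the
// destructor returns the memory to the pool exactly once.
class VramBuffer {
public:
    VramBuffer(VramPool& pool, std::size_t bytes)
        : pool_(&pool), ptr_(pool.allocate(bytes)), bytes_(bytes) {}
    ~VramBuffer() { if (ptr_) pool_->release(ptr_, bytes_); }

    VramBuffer(VramBuffer&& other) noexcept
        : pool_(other.pool_),
          ptr_(std::exchange(other.ptr_, nullptr)),
          bytes_(other.bytes_) {}
    VramBuffer(const VramBuffer&) = delete;
    VramBuffer& operator=(const VramBuffer&) = delete;

    void* data() const { return ptr_; }
    std::size_t size() const { return bytes_; }

private:
    VramPool* pool_;
    void* ptr_;
    std::size_t bytes_;
};
```

Whether a buffer escapes its scope, is moved into a cache, or unwinds during an exception, the pool’s accounting stays correct without any manual release call.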
Why C++26 specifically
Having chosen C++, the obvious default would have been C++23. We went to C++26 (preview-mode features behind GCC 16) for one reason that dominated all others: static reflection, formalised in proposal P2996.
The single most repetitive piece of code in any MCP server is the glue between a tool’s C++ signature and the JSON schema advertised to the LLM. Every argument needs a name, a type, a description, and a validator; every response needs a serialiser. In C++23, that ends up either as hand-written JSON-Schema strings (which drift from the C++ signature within weeks) or as a code generator running pre-build (which introduces a separate toolchain). Neither option is acceptable in a codebase with several dozen tools and a small team.
With P2996, we declare the input as a normal struct and let the framework derive the schema at compile time:

```cpp
struct GetDocumentArgs {
    std::string document_id;        // The unique identifier of the document
    bool include_metadata = false;
};

REGISTER_TOOL(get_document, GetDocumentArgs);
```

The reflection layer walks the struct’s members at consteval time, extracts the names and types, produces a JSON schema, and emits the deserialiser. The tool author writes one struct; the server gets a schema, a parser, and a typed handler entry point. There is no drift because there is no second source of truth. We unpack the implementation in C++26 Reflection in Production.
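The walk itself is compact. What follows is a hedged sketch rather than our production code: it assumes the P2996 API surface as implemented on GCC 16’s preview branch (the `^^` reflection operator, `std::meta::nonstatic_data_members_of`, `std::meta::identifier_of`, and expansion statements from P1306), all of which may still shift before standardisation, and `json_type_of` is a hypothetical helper mapping reflected member types to JSON type names.

```cpp
#include <meta>     // P2996 preview header; exact name varies by toolchain
#include <string>
#include <string_view>

// Hypothetical helper: maps a reflected member type (e.g. ^^bool)
// to a JSON type string ("boolean", "string", "number", ...).
consteval std::string_view json_type_of(std::meta::info type);

template <typename Args>
consteval std::string derive_schema() {
    std::string schema = R"({"type":"object","properties":{)";
    bool first = true;
    // Expansion statement (P1306): the body is stamped out once per
    // reflected non-static data member, entirely at compile time.
    template for (constexpr auto member :
                  std::meta::nonstatic_data_members_of(^^Args)) {
        if (!first) schema += ',';
        first = false;
        schema += '"';
        schema += std::meta::identifier_of(member);
        schema += R"(":{"type":")";
        schema += json_type_of(std::meta::type_of(member));
        schema += R"("})";
    }
    schema += "}}";
    return schema;
}
```

Applied to `GetDocumentArgs`, a function of this shape yields an object schema with `document_id` and `include_metadata` properties, with no hand-written string anywhere to drift.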
The other C++26 (and C++23) features that earned their keep:
- Coroutines for I/O: every WebSocket handler is a coroutine, and every tool that does async work returns one. Stack usage stays low even with hundreds of concurrent connections.
- Ranges for the data plane: query results from DuckDB flow through composable pipelines without intermediate vectors, which keeps per-call allocations near zero in the hot path.
- std::expected for error handling: tool implementations return expected<Result, Error> and the framework converts that to a JSON-RPC response without ever throwing across the FFI boundary into the plugin host.
The things people will tell you are downsides
Two objections come up every time we describe the stack. Each is real and each is overstated.
Compiler availability. P2996 is on GCC 16’s preview branch. We install it from /opt/gcc-16 and pin all our CI runners to it. This is a one-line CMake setup. The objection that “enterprise policy doesn’t allow a custom compiler” tends to dissolve the moment someone shows the security team a signed Debian package and a reproducible build pipeline.
Iteration speed. A clean build of the full server plus all plugins is around 90 seconds; an incremental rebuild of a single plugin is 2–5 seconds. The numbers people remember from C++ are 2014 numbers. Combined with plugin hot-reload, the feedback loop is closer to a scripting language’s than the raw compile times suggest; you don’t restart the server to test a change.
So when would we not use C++?
Honestly, rarely, and not for the reasons people assume. If you’re writing a glue script around a single SaaS API, the official TypeScript SDK ships in twenty minutes and we won’t pretend otherwise. But everything we build at the platform level (the server, every plugin, the native client, the bridges) is C++. Not because we have to. Because it’s the language we’d choose if we were starting from scratch tomorrow.
The point isn’t “pick C++26 to be edgy”. The point is to be honest about the non-functional requirements, and to recognise that for a platform of this shape and longevity, C++ is the obvious tool and C++26 is the version that pays for itself fastest through reflection alone.
The follow-up articles in this series cover the protocol choice (MCP vs REST), the plugin architecture (Hexagonal Architecture for Plugin Systems), and the operational realities of hot-reloading native plugins in production (Three dlopen Constraints).
