Plugin Hot-Reload in C++ : Three dlopen Constraints
Hot-reloading C++ plugins in production requires getting three things right that a naive dlopen / dlclose loop ignores. This article documents those three constraints, the design that handles each, and the state machine our plugin host uses to swap plugins safely while the server keeps serving.
The plugin model
Each plugin is a .so shared library that exports a C factory symbol. The host (McpPluginHost) discovers plugins in a configured directory, opens each one with dlopen, looks up the factory, receives an IMcpPlugin*, and registers it against the routing table. Hot-reload performs the inverse and then repeats the load: drain the plugin, dlclose the old handle, dlopen the new one, re-initialise, and re-publish the tool list.
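As a rough sketch of the load path described above: the factory symbol name (mcp_create_plugin), the factory signature, and the one-method IMcpPlugin interface are assumptions made for illustration, since the article does not show the real declarations. Only dlopen, dlsym, dlerror and dlclose are taken as given.

```cpp
#include <dlfcn.h>
#include <cassert>
#include <string>

// Hypothetical minimal interface; the real IMcpPlugin is not shown in the article.
struct IMcpPlugin {
    virtual ~IMcpPlugin() = default;
    virtual bool initialize() = 0;
};

// Assumed factory signature and symbol name (illustrative only).
using PluginFactory = IMcpPlugin* (*)();
constexpr const char* kFactorySymbol = "mcp_create_plugin";

struct LoadedPlugin {
    void* handle = nullptr;        // dlopen handle, owned by the host
    IMcpPlugin* plugin = nullptr;  // instance produced by the factory
    std::string error;             // non-empty when loading failed
    bool ok() const { return error.empty(); }
};

// Opens the library, resolves the factory, and asks it for an instance.
// On any failure the handle is released and `error` explains why.
LoadedPlugin loadPlugin(const std::string& path) {
    assert(!path.empty());
    LoadedPlugin out;
    out.handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (!out.handle) {
        const char* msg = dlerror();
        out.error = msg ? msg : "dlopen failed";
        return out;
    }
    dlerror();  // clear any stale error before calling dlsym
    void* sym = dlsym(out.handle, kFactorySymbol);
    if (!sym) {
        const char* msg = dlerror();
        out.error = msg ? msg : "factory symbol not found";
        dlclose(out.handle);
        out.handle = nullptr;
        return out;
    }
    out.plugin = reinterpret_cast<PluginFactory>(sym)();
    if (!out.plugin) {
        out.error = "factory returned null";
        dlclose(out.handle);
        out.handle = nullptr;
    }
    return out;
}
```

Note that the sketch deliberately stops before initialize(): where and under which lock that call runs is the subject of the first constraint.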
The skeleton is a couple of hundred lines. The three constraints below are what separates a working skeleton from a production hot-reload path.
Constraint 1 : Never hold the host lock across user code
Plugins do useful work in initialize(): opening database connections, probing remote services, warming models. Several of those init paths legitimately need to call back into the host to register dependencies or fetch configuration.
If the host’s registry lock is held during initialize(), every such callback re-enters under the same lock. With multiple plugins probing in parallel, that serialises their startup and creates trivial deadlock paths when init routines reference each other.
The host therefore takes the registry lock only long enough to allocate the slot and store the handle. initialize() runs on a detached thread. Tool registration is a separate, lock-protected step that runs after init returns. Plugins that need slow probing perform it asynchronously and signal readiness once the probe completes.
Rule : a plugin host holds its locks for mechanical bookkeeping only, never across plugin-provided code.
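A minimal sketch of that lock scoping, under stated assumptions: the Registry class, its config() callback, and the Init callback type are hypothetical names, not the host’s real API. For brevity the sketch runs the init callback on the calling thread, whereas the article’s host runs it on a detached thread; the point illustrated is only that no host lock is held while plugin code executes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host registry; names are illustrative only.
class Registry {
public:
    // Plugin-provided init code: may call back into the host and
    // returns the tool names to publish.
    using Init = std::function<std::vector<std::string>(Registry&)>;

    // Callback available to plugins. It takes the lock briefly by itself.
    std::string config(const std::string& key) {
        std::lock_guard<std::mutex> g(mu_);
        auto it = config_.find(key);
        return it == config_.end() ? std::string() : it->second;
    }

    void setConfig(const std::string& key, const std::string& value) {
        std::lock_guard<std::mutex> g(mu_);
        config_[key] = value;
    }

    void load(const std::string& name, const Init& init) {
        assert(init);
        {   // Step 1: mechanical bookkeeping only, under the lock.
            std::lock_guard<std::mutex> g(mu_);
            slots_[name].clear();
        }
        // Step 2: plugin code runs with no host lock held, so callbacks
        // such as config() cannot deadlock on mu_.
        std::vector<std::string> tools = init(*this);
        {   // Step 3: publish the tool list, again under the lock.
            std::lock_guard<std::mutex> g(mu_);
            slots_[name] = std::move(tools);
        }
    }

    std::vector<std::string> tools(const std::string& name) {
        std::lock_guard<std::mutex> g(mu_);
        auto it = slots_.find(name);
        return it == slots_.end() ? std::vector<std::string>() : it->second;
    }

private:
    std::mutex mu_;  // non-recursive: holding it across init would deadlock config()
    std::map<std::string, std::string> config_;
    std::map<std::string, std::vector<std::string>> slots_;
};
```

If load() held mu_ across the init call, the first plugin that called config() would block forever on the non-recursive mutex, which is the failure mode this constraint rules out.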
Constraint 2 : Broadcasts must be asynchronous, end to end
The WebSocket layer exposes SocketServer::queueBroadcast. The name implies asynchronous delivery, but in an earlier revision the implementation called syncAwait on the underlying socket write. That was tolerable as long as no caller invoked it from inside a host lock.
Plugin notifications change that. A tool that emits a broadcast on completion runs under the host’s plugin mutex, because tool dispatch is registry-protected. A synchronous broadcast then blocks the registry on a slow WebSocket consumer: one stalled client back-pressures the entire host.
Two design points guarantee this can’t happen :
- queueBroadcast is actually a queue, with a bounded buffer per subscriber and an explicit overflow policy (drop, slow-down, disconnect).
- Tool handlers never call broadcast directly. They emit events onto an in-memory event bus, and a dedicated worker drains the bus into the WebSocket layer.
Rule : verify the asynchrony of every function whose name implies it. The implementation is the contract, not the name.
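To make the bounded per-subscriber buffer concrete, here is a small sketch. The class and enum names are hypothetical, and only two of the three overflow policies mentioned above (drop and disconnect) are modelled; slow-down is omitted. The event bus and the worker thread that would call pop() are likewise not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

enum class Overflow { DropOldest, Disconnect };
enum class PushResult { Queued, DroppedOldest, Disconnected };

// Per-subscriber outbound buffer (illustrative). push() never blocks and
// never touches the socket, so it is safe to call from any context.
class SubscriberQueue {
public:
    SubscriberQueue(std::size_t capacity, Overflow policy)
        : capacity_(capacity), policy_(policy) {
        assert(capacity_ > 0);
    }

    PushResult push(std::string msg) {
        std::lock_guard<std::mutex> g(mu_);
        if (disconnected_) return PushResult::Disconnected;
        if (q_.size() < capacity_) {
            q_.push_back(std::move(msg));
            return PushResult::Queued;
        }
        if (policy_ == Overflow::Disconnect) {
            // A subscriber that cannot keep up is cut off rather than
            // allowed to hold memory or stall the producer.
            disconnected_ = true;
            q_.clear();
            return PushResult::Disconnected;
        }
        q_.pop_front();  // drop the oldest message to make room
        q_.push_back(std::move(msg));
        return PushResult::DroppedOldest;
    }

    // Called by the writer worker; returns false when the queue is empty.
    bool pop(std::string& out) {
        std::lock_guard<std::mutex> g(mu_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop_front();
        return true;
    }

private:
    std::mutex mu_;
    std::deque<std::string> q_;
    std::size_t capacity_;
    Overflow policy_;
    bool disconnected_ = false;
};
```

The design choice worth noting is that the producer side only ever pays for a short critical section around a deque operation; the slow part, the socket write, belongs entirely to whichever worker drains the queue.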
Constraint 3 : dlclose alone is not proof of reload
dlclose is a request, not a command: it decrements the handle’s reference count, and the library is unloaded only when that count drops to zero. Anything that still holds a reference to the old library, such as a coroutine in flight, a captured std::function, or a tool result queued for delivery, is enough to keep it loaded. A subsequent dlopen of the same path then returns the existing handle and silently serves the old code.
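One way to observe this from the host, sketched here as an assumption rather than as part of the article’s design, is to probe with the RTLD_NOLOAD flag. It is available on glibc and several other platforms but is not part of POSIX. With that flag dlopen loads nothing and returns a handle only if the library is already resident. The helper name isStillLoaded is hypothetical.

```cpp
#include <dlfcn.h>
#include <cassert>
#include <string>

// Returns true if the dynamic loader still has the library at `path`
// resident, for example after a dlclose that did not drop the last reference.
bool isStillLoaded(const std::string& path) {
    assert(!path.empty());
    void* probe = dlopen(path.c_str(), RTLD_LAZY | RTLD_NOLOAD);
    if (!probe) return false;  // not resident: the unload really happened
    dlclose(probe);            // the probe took a reference; give it back
    return true;
}
```

A host could call this right after dlclose and treat a true result as a failed unload instead of proceeding to dlopen.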
Two mechanisms make a reload functionally verifiable :
- Build identifier. Each plugin embeds a commit SHA plus build timestamp accessible through a known symbol. After reload, the host reads the identifier and compares it to the on-disk artefact. A mismatch is surfaced as an error.
- Explicit drain step. Before dlclose, the host asks the plugin to quiesce. The plugin waits for outstanding coroutines, cancels timers, flushes queues, and returns when its refcount is provably reduced to the host’s own reference. A drain timeout surfaces stuck reloads explicitly.
Rule : file mtime is not proof of reload. Verify functionally that the new code is the one running.
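The two mechanisms can be sketched as follows. The InFlight counter and the reloadVerified helper are hypothetical names; how the expected identifier is obtained from the on-disk artefact (a sidecar manifest, for instance) is left open here because the article does not specify it.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>

// Tracks work that still references plugin code (coroutines, timers,
// queued results). Illustrative only.
class InFlight {
public:
    void begin() {
        std::lock_guard<std::mutex> g(mu_);
        ++count_;
    }

    void end() {
        std::lock_guard<std::mutex> g(mu_);
        assert(count_ > 0);
        if (--count_ == 0) cv_.notify_all();
    }

    // Blocks until nothing is in flight or the timeout expires.
    // Returns false on timeout so a stuck reload is reported, not hidden.
    bool drain(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(mu_);
        return cv_.wait_for(lk, timeout, [this] { return count_ == 0; });
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Compares the identifier reported by the freshly loaded code with the one
// expected for the on-disk artefact. An empty identifier never verifies.
bool reloadVerified(const std::string& loadedId, const std::string& expectedId) {
    return !loadedId.empty() && loadedId == expectedId;
}
```

In this sketch a host would call drain() before dlclose and reloadVerified() after dlopen, and abort the reload with an explicit error if either returns false.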
The reload state machine
Putting the three constraints together :
- Detect new artefact (mtime change or explicit reload command).
- Ask the old plugin to quiesce, with a drain timeout.
- Take the registry write lock; mark the slot “reloading”.
- dlclose the old handle.
- dlopen the new handle; verify the build identifier.
- Release the lock; schedule initialize() on a detached thread.
- When init returns, briefly retake the lock to publish tools.
- Emit a reload notification via the event bus.
Every step exists because one of the three constraints requires it. Removing any step regresses behaviour under one of the failure modes above. The state machine itself is unremarkable; what matters is that each transition respects the locking discipline, the asynchrony contract, and the reload verification.
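For illustration, the reload sequence can be encoded as a small transition function. The state and event names below are assumptions chosen to mirror the steps listed above, not the host’s actual enum, and the treatment of Failed as a terminal state is a simplification.

```cpp
#include <cassert>

enum class ReloadState { Serving, Draining, Swapping, Initializing, Failed };

enum class ReloadEvent {
    ArtefactChanged,  // mtime change or explicit reload command
    Drained,          // old plugin quiesced in time
    DrainTimedOut,    // drain timeout expired
    BuildIdVerified,  // new handle reports the expected build identifier
    BuildIdMismatch,  // loader still serves the old code
    InitDone,         // initialize() returned and tools were published
    InitFailed
};

// Returns the next state; events that are not valid in the current
// state leave it unchanged.
ReloadState next(ReloadState s, ReloadEvent e) {
    switch (s) {
    case ReloadState::Serving:
        if (e == ReloadEvent::ArtefactChanged) return ReloadState::Draining;
        break;
    case ReloadState::Draining:
        if (e == ReloadEvent::Drained) return ReloadState::Swapping;
        if (e == ReloadEvent::DrainTimedOut) return ReloadState::Failed;
        break;
    case ReloadState::Swapping:  // registry lock held: dlclose, dlopen, verify
        if (e == ReloadEvent::BuildIdVerified) return ReloadState::Initializing;
        if (e == ReloadEvent::BuildIdMismatch) return ReloadState::Failed;
        break;
    case ReloadState::Initializing:  // lock released: init runs off-lock
        if (e == ReloadEvent::InitDone) return ReloadState::Serving;
        if (e == ReloadEvent::InitFailed) return ReloadState::Failed;
        break;
    case ReloadState::Failed:
        break;  // terminal in this sketch; recovery policy is out of scope
    }
    return s;
}
```

Keeping the transitions in one pure function makes it easy to check that, for example, a swap can never begin before a successful drain.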
For the broader plugin architecture this hot-reload supports, see Hexagonal Architecture for Plugin Systems. For where plugins fit into the overall infrastructure, see the pillar article on self-hosted MCP infrastructure.
