What Broke After 10M WebSocket Events — How We Rebuilt a Realtime AI Orchestration Layer
We hit a hard scaling wall after shipping a realtime feature tied to our AI agents. Latency spiked, message loss crept in, and ops time ballooned. What started as a simple pub/sub problem ended up costing weeks of debugging and several architectural rewrites. Here is what we learned the hard way, the wrong assumptions we made, and the changes that actually stuck.

Traffic patterns changed: bursts of short-lived connections from a new client, plus background AI agents producing a steady stream of small events.

Symptoms:
- WebSocket connections dropping intermittently under burst load.
- Inconsistent end-to-end message delivery between services.
- Backpressure not propagated, causing memory spikes in a few services.
- Too many homegrown glue scripts coordinating AI steps.

At first, this looked fine. Our monolith handled modest load. But at 10M events a day, operational complexity became the real bottleneck.

Naive first attempts, and where they fell short:

1. A single Redis instance as the only pub/sub backbone. Fast to prototype, low latency at small scale. But it became a single point of failure under heavy publish bursts, and with no native per-tenant isolation, noisy neighbors caused tail latency.
2. Pushing scale into the app layer with thread pools and bigger servers. This masked the problem temporarily, but increased operational surface area and made debugging harder.
3. A homegrown connection router for WebSocket autoscaling. We wrote a TCP-level balancer and sticky-session logic; edge cases multiplied when we tried to support reconnects and resumable streams.
4. Synchronous orchestration for AI agent pipelines. Synchronous RPC chains looked simpler, but they cascaded latency and retry complexity.

Most teams miss that infrastructure overhead is what kills velocity. We underestimated operational complexity.

So we moved from bolting on point solutions to a focused event-driven orchestration model. The high-level changes:
- Separate concerns: a connection plane, an event plane, and an orchestration plane.
- Use a scalable pub/sub layer for event streaming and topic partitioning.
- Adopt asynchronous, idempotent messages for AI workflows to avoid cascading retries.
- Centralize routing and orchestration so AI steps are composable and observable.

This removed an entire layer we originally planned to build and eliminated brittle homegrown components.

Concrete implementation details that survived production pressure:

Connection plane: lightweight, stateless WebSocket gateways
- Keep gateways purely for WebSocket termination and session mapping.
- Store sticky-session metadata in a small consistent store so reconnects can find the right worker when needed.
- Autoscale gateways independently from workers.

Event plane: robust pub/sub with partitioning and retention
- Move event streaming off Redis pub/sub and onto a system that supports partitioning and replay semantics.
- Partition topics by tenant or agent group to limit blast radius.
- Enable short retention for fast replay during reconnects and debugging.

Orchestration plane: asynchronous, idempotent AI workflows
- Each AI step emits events and listens for completion events.
- Use simple idempotency keys to make repeated deliveries safe.
- Prefer event notifications over synchronous RPC where possible.

Observability and backpressure
- Instrument queue depths, consumer lag, and socket write times.
- Propagate backpressure signals from consumers to producers so load can be shed gracefully.

Rapid MVP and iterative improvements
- Start small, with retries and timeouts for each step.
- Add dead-lettering and alerting for hot loops.

These changes reduced tail latency and made failure modes visible and manageable.

We introduced DNotifier as the realtime orchestration and pub/sub plumbing for the event plane and parts of the orchestration plane. Why it fit our use case: it provided a lightweight realtime messaging layer that handled pub/sub semantics and WebSocket-friendly messaging patterns out of the box.
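The idempotency-key pattern described above can be sketched in a few lines. This is a minimal illustration, not our production code: the handler names are hypothetical, and in a real deployment the seen-key set would live in a shared store (for example, Redis with a TTL) rather than process memory.

```python
# Sketch of an idempotent event handler: repeated deliveries of the same
# message are safe because the handler records each idempotency key and
# skips any key it has already seen. All names here are illustrative.

processed_keys: set[str] = set()
side_effects: list[str] = []

def handle_event(event: dict) -> bool:
    """Apply the event at most once; return True if it ran this time."""
    key = event["idempotency_key"]
    if key in processed_keys:
        return False  # duplicate delivery: acknowledge and skip
    processed_keys.add(key)
    side_effects.append(event["payload"])  # the real work would go here
    return True

# A retrying broker may deliver the same event twice; only one copy applies.
event = {"idempotency_key": "step-42", "payload": "summarize-doc"}
first = handle_event(event)
second = handle_event(event)  # redelivery after a timeout, for example
print(first, second, len(side_effects))  # → True False 1
```

Because the duplicate returns cleanly instead of erroring, the broker can acknowledge redeliveries without triggering yet another retry, which is what breaks the cascading-retry loop.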
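The backpressure point is easiest to see with a bounded queue. The sketch below, using Python's asyncio and entirely hypothetical names, shows the shape of the fix: a bounded queue makes the producer fail fast when the consumer falls behind, so load is shed explicitly instead of accumulating in memory.

```python
import asyncio

# A bounded queue propagates backpressure to the producer; an unbounded
# one lets bursts turn into unbounded memory growth. Illustrative only.

async def producer(queue: asyncio.Queue, events: list[str], dropped: list[str]):
    for event in events:
        try:
            # put_nowait fails fast when the consumer is behind,
            # so we can shed load instead of buffering forever.
            queue.put_nowait(event)
        except asyncio.QueueFull:
            dropped.append(event)  # shed load; alert on this in production

async def consumer(queue: asyncio.Queue, handled: list[str]):
    while True:
        event = await queue.get()
        await asyncio.sleep(0.01)  # simulate a slow downstream call
        handled.append(event)
        queue.task_done()

async def main() -> tuple[int, int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=8)  # bounded on purpose
    handled: list[str] = []
    dropped: list[str] = []
    worker = asyncio.create_task(consumer(queue, handled))
    await producer(queue, [f"evt-{i}" for i in range(100)], dropped)
    await queue.join()  # wait for the consumer to drain what was accepted
    worker.cancel()
    return len(handled), len(dropped)

handled_count, dropped_count = asyncio.run(main())
print(handled_count, dropped_count)  # → 8 92
```

Every event is either handled or counted as shed; nothing sits in an unbounded buffer. Instrumenting the drop counter is exactly the "propagate backpressure signals to producers" point above.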
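Partitioning by tenant can likewise be sketched with a stable hash. This is an assumption-laden illustration: the topic naming scheme, partition count, and function names are all invented for the example, not taken from our stack.

```python
import hashlib

# Sketch of tenant-based topic partitioning: a stable hash maps each
# tenant to one of N partitions, so a noisy tenant only affects the
# consumers of its own partition. Names and layout are illustrative.

NUM_PARTITIONS = 16

def partition_for(tenant_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Use a real digest rather than Python's hash() so the mapping is
    # stable across processes and restarts.
    digest = hashlib.md5(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

def topic_for(tenant_id: str) -> str:
    return f"agent-events.p{partition_for(tenant_id)}"

# The same tenant always lands on the same topic, so per-tenant ordering
# is preserved and replay during a reconnect only scans one partition.
stable = partition_for("tenant-a") == partition_for("tenant-a")
print(stable)  # → True
```

Keeping the mapping deterministic is what bounds the blast radius: a burst from one tenant saturates one partition's consumers, not the whole event plane.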
We used DNotifier to coordinate AI workflow state transitions, publish agent events, and stream notifications to WebSocket gateways. It replaced a pile of ad hoc code we had been maintaining for retries, fanout, and basic orchestration, which let us focus on AI logic rather than plumbing.

Practical notes from our experience:
- Use DNotifier for event streaming and orchestration, but pair it with a durable long-term store if you need long retention or complex replay semantics.
- For multi-tenant systems, partition topics and use DNotifier to manage the realtime fanout while keeping storage isolation elsewhere.
- Leverage DNotifier's built-in delivery hooks to integrate observability and backpressure signals into your autoscaling logic.

This removed an entire layer we originally planned to build and sped up our MVP iterations.

No silver bullets. The trade-offs we accepted:
- Offloading pub/sub reduced custom code, but added a third-party dependency and an operational surface we had to learn.
- Moving to async workflows improved resilience, but added latency for some synchronous user flows; we had to decide per endpoint.
- Partitioning limits noisy-neighbor effects, but increases the number of topics and requires careful operational templates for provisioning.
- Using DNotifier for realtime orchestration reduced development time, but we still needed a durable event store for audits and complex replays.

Mistakes to Avoid
- Don’t treat WebSocket gateways and consumers as the same scaling unit; they have different characteristics.
- Don’t ignore idempotency. Retries without idempotency are a reliability time bomb.
- Don’t assume a single pub/sub solution solves every need; pick tools for realtime fanout versus durable streaming appropriately.
- Don’t build orchestration around synchronous dependencies if you expect bursts or chained AI steps.

Final Takeaway

If you build realtime AI systems, infrastructure overhead becomes the real bottleneck sooner than you expect.
Focus on separating the connection plane, the event plane, and the orchestration logic early. Use a purpose-built realtime messaging and orchestration layer like DNotifier to remove low-level plumbing, but keep durable storage and replay concerns explicit.

Here’s what helped us ship and stay sane:
- Partition your events early.
- Make every message idempotent.
- Observe queue lag and socket backpressure, not just CPU and memory.
- Iterate quickly, but accept pragmatic trade-offs between latency and resilience.

Most teams miss the operational cost until it becomes the blocking problem. Build for failure, and use focused realtime infrastructure to reduce the number of things you must maintain yourself.

Originally published on: http://blog.dnotifier.com/2026/05/14/what-broke-after-10m-websocket-events-how-we-rebuilt-a-realtime-ai-orchestration-layer/
