A customer asks for a refund. They type one sentence, hit send, and ninety seconds later a paragraph comes back: approved, here is your confirmation number, the credit lands in three to five business days. One question in, one answer out. As far as the customer can tell, they talked to a single thing.
Underneath, that request made five stops.
A triage agent read the message and decided it was a billing problem, not a login problem. It handed the conversation to a policy agent, which checked whether this account, on this plan, past this date, even qualifies for a refund. The policy agent called a billing tool to pull the actual charge. A refund agent issued the credit. A notification writer turned all of that into the friendly paragraph the customer read. Five components, one visible reply, and a lot of traffic in between that nobody was watching.
Now play it forward to the bad day. The refund goes out twice. Or it stalls at the billing tool and the customer gets silence. Or the policy agent approved something it should have declined. Where do you look? You have one log line that says the conversation happened and one paragraph that says it went fine. The five stops in the middle left no map.
Most teams instrument the wrong layer for this. They tune the model, sharpen the instruction, add another tool, and treat reliability as a smarts problem. But the refund did not fail because an agent was not clever enough. It failed somewhere in the wiring between agents, and the wiring is the part nobody drew.
So here is the unfashionable claim: your agent stack does not need another brain. It needs a switchboard.
The traffic you are not watching
There is a useful distinction borrowed from networking. North-south traffic is the request coming in and the answer going out, the part your users see. East-west traffic is everything moving sideways between your own components to produce that answer.
A single agent has almost no east-west traffic. A multi-agent workflow is mostly east-west traffic. The refund example had one north-south exchange and at least four sideways hops, plus a couple of tool calls, plus a streamed token feed back to the UI. The interesting failures all live in that sideways layer, and it is exactly the layer most teams have no instrument for.
This is the seam worth noticing right now, because teams are crossing it without realizing it. You add a second agent to "handle the billing cases," and you have quietly gone from a program with one actor to a small distributed system with several. The hard questions stop being about model quality and start being about plumbing: which agent called which, what context crossed the boundary, did the stream finish or hang, what happens on a timeout, and who can prove what the system actually did.
What OpenAgentIO actually is (and what it is not)
A project that surfaced this month points straight at that layer. OpenAgentIO describes itself as a "Conversation-Aware Runtime Bus for Distributed AI Agents," an Apache-licensed, lightweight runtime bus and bridge layer for agent systems built out of mismatched parts.
The honest caveat first, because it changes how you should read everything below. This is early. The repository is tagged v0.3-alpha, the bridge layer and multi-language SDKs are explicitly developer preview, and the star count is small. This is not a thing to bet a production support queue on next week. It is a thing to read as a clear articulation of a problem you already have, whether or not you ever install it.
What it does is take the messy ways agents already talk to each other and map them into a few named flows. MCP servers, Matrix rooms, and HTTP and server-sent-event streaming endpoints get expressed as structured Invoke, StreamInvoke, and Publish/Subscribe operations. It carries session identity and OpenTelemetry tracing context across process and language boundaries, so a request that starts in a Python triage agent and ends in a Go billing service can still be stitched into one trace.
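To make "carrying session identity and tracing context across boundaries" concrete, here is a minimal sketch of a message envelope. It is not OpenAgentIO's wire format or API; `Envelope`, `new_request`, and `child` are hypothetical names invented for illustration. The only real-world detail it borrows is the W3C Trace Context `traceparent` layout (version, 32-hex trace ID, 16-hex parent ID, flags), which OpenTelemetry uses by default.

```python
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Envelope:
    """Hypothetical message wrapper; not OpenAgentIO's actual format."""
    session_id: str
    trace_id: str   # 32 hex chars, shared by every hop of one request
    span_id: str    # 16 hex chars, unique per hop
    pattern: str    # "Invoke", "StreamInvoke", or "Publish"
    target: str
    payload: dict = field(default_factory=dict)

    def traceparent(self) -> str:
        # W3C Trace Context header layout: version-traceid-parentid-flags
        return f"00-{self.trace_id}-{self.span_id}-01"

    def child(self, pattern: str, target: str, payload: dict) -> "Envelope":
        # Same session and trace, fresh span: this is what lets the hops
        # be stitched back into one request later.
        return Envelope(self.session_id, self.trace_id,
                        secrets.token_hex(8), pattern, target, payload)


def new_request(session_id: str, target: str, payload: dict) -> Envelope:
    # The inbound (north-south) request mints the trace ID once.
    return Envelope(session_id, secrets.token_hex(16),
                    secrets.token_hex(8), "Invoke", target, payload)


root = new_request("sess-4f1a", "triage-agent", {"message": "I want a refund"})
hop = root.child("Invoke", "policy-agent", {"account_id": "acct-123"})
print(root.traceparent())
print(hop.traceparent())
```

The two printed headers share the middle trace ID and differ only in the span ID. That shared value is the whole trick: whatever language the next service is written in, it only has to copy one string forward.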
The line in the README that matters most is the disclaimer. OpenAgentIO says it is not another agent framework. It is the communication and observability substrate underneath frameworks like LangGraph or CrewAI. It is also not trying to be A2A, the emerging cross-vendor interoperability protocol; it is aimed lower, at the runtime that actually carries the traffic inside one system. The examples repository makes the focus concrete without a single clever model instruction in sight: request and reply, pub/sub, streaming, parallel execution, agent handoff, async tasks, distributed tracing. The demos run in-memory for a single process and switch to NATS when the agents live in separate processes or on separate machines. The whole thing is about communication patterns, not about making any one agent smarter.
You do not have to adopt it to take the point. The point is that this layer is a thing now, with a name and a shape, and most teams are running it by accident.
Shelf, switchboard, receipts
Three layers, three jobs. They get conflated constantly, and conflating them is why debugging multi-agent work means reading five logs that do not know they are about the same request.
The capability shelf is what your agents can reach: the tools, MCP servers, and skills available to be called. We have written about keeping that shelf honest in agentic resource discovery and the capability shelf. A clean shelf tells you what is possible. It tells you nothing about what happened.
The runtime switchboard is the layer this post is about: the live routing of one request across the shelf. It is the operator board that knows the triage agent called the policy agent, which called the billing tool, on this session, with this trace, under this timeout. The shelf is the directory. The switchboard is the call in progress.
The receipt is what the switchboard leaves behind: a durable, human-readable record that a given step ran, with what inputs, to what effect. We made the case for those in agent receipts that log customer work. Receipts are how a human, hours later, reconstructs the call without replaying the whole system.
Frameworks sell you brains and sit on the shelf. The switchboard and the receipts are usually yours to build, and the first things to go missing.

The artifact: an agent dispatch ledger
You do not need OpenAgentIO to start watching east-west traffic. You need to write one request path down before you add the next agent. The artifact for that is a dispatch ledger: one page, one workflow, one row per hop, recording where a single request actually went.
Here is the refund example, traced by hand:
| User turn | Session ID | Calling agent | Target agent/tool | Pattern | Transport | Context carried | Trace ID | Timeout / retry | Human-visible receipt |
|---|---|---|---|---|---|---|---|---|---|
| "I want a refund for last month" | sess-4f1a | (inbound) | triage-agent | Invoke | HTTP | raw message, account ID | trace-9c2 | 10s, no retry | "Routing your request" |
| same | sess-4f1a | triage-agent | policy-agent | Invoke (handoff) | NATS | account ID, plan, charge date | trace-9c2 | 8s, 1 retry | none (internal) |
| same | sess-4f1a | policy-agent | billing-tool | Invoke | NATS | charge ID | trace-9c2 | 5s, 2 retries, dead-letter | none (internal) |
| same | sess-4f1a | policy-agent | refund-agent | Invoke | NATS | approval, amount, charge ID | trace-9c2 | 8s, no retry (not idempotent) | "Refund approved: $42.00" |
| same | sess-4f1a | refund-agent | notify-writer | StreamInvoke | HTTP/SSE | outcome, confirmation no. | trace-9c2 | 15s stream, close-on-done | the paragraph the customer reads |
Every column earns its place by answering a question you will eventually be asked at the worst possible moment.
Session ID is what lets you pull all five rows back together as one customer interaction instead of five unrelated events. Without it, your logs know that a refund happened and that a policy check happened, but not that they were the same conversation. This is the same problem the session manifest solves at the handoff boundary: the ledger watches the request, the manifest defines what is allowed to travel with it.
Trace ID is the thread that survives crossing a process or a language. When the refund goes out twice, the trace is what shows you the refund-agent got invoked twice on one trace, which is a very different bug from two customers hitting the same path. This is the entire reason trace context exists, and the reason a runtime bus bothers to propagate it for you.
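The duplicate-refund check is a few lines once every hop records its trace ID. This is a sketch over hand-written log entries; the tuple layout and the second customer (`sess-77b0`, `trace-d41`) are made up for illustration.

```python
from collections import Counter

# Hypothetical hop log entries: (session_id, trace_id, target)
hops = [
    ("sess-4f1a", "trace-9c2", "triage-agent"),
    ("sess-4f1a", "trace-9c2", "policy-agent"),
    ("sess-4f1a", "trace-9c2", "billing-tool"),
    ("sess-4f1a", "trace-9c2", "refund-agent"),
    ("sess-4f1a", "trace-9c2", "refund-agent"),  # the bug: invoked again on the same trace
    ("sess-77b0", "trace-d41", "refund-agent"),  # a different customer, not a duplicate
]


def repeated_invocations(hops, target):
    """Return the trace IDs on which `target` was invoked more than once."""
    counts = Counter(trace_id for _, trace_id, t in hops if t == target)
    return sorted(trace for trace, n in counts.items() if n > 1)


duplicates = repeated_invocations(hops, "refund-agent")
print(duplicates)  # ['trace-9c2']
```

Without the trace ID, the three refund-agent entries are indistinguishable. With it, one trace stands out and the other customer is cleared.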
Context carried is the column people skip and regret. It records what actually crossed each boundary. If the policy agent approved something it should not have, this column tells you whether it even received the charge date it was supposed to check against. Half of "the agent made a bad decision" turns out to be "the agent never got the input."
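One way to stop guessing about that column is to check it at the boundary. The sketch below assumes a per-target list of required context keys; `REQUIRED_CONTEXT` and `missing_context` are hypothetical names, and the key lists are read off the ledger rows rather than taken from any real system.

```python
# Hypothetical contract: the context keys each target must receive before it runs.
REQUIRED_CONTEXT = {
    "policy-agent": {"account_id", "plan", "charge_date"},
    "billing-tool": {"charge_id"},
    "refund-agent": {"approval", "amount", "charge_id"},
}


def missing_context(target: str, context: dict) -> set:
    """Return required keys that are absent or empty in the context crossing the boundary."""
    required = REQUIRED_CONTEXT.get(target, set())
    return {key for key in required if context.get(key) in (None, "")}


# The triage agent forgot to pass the charge date along.
gaps = missing_context("policy-agent", {"account_id": "acct-123", "plan": "pro"})
print(gaps)  # {'charge_date'}
```

A check like this turns "the agent made a bad decision" into a logged, specific complaint before the agent ever reasons over the blank.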
Pattern and transport tell you the shape of each hop. An Invoke that blocks waiting for a reply fails differently than a StreamInvoke that has to be closed, which fails differently again than a Publish that fires and forgets. The streamed final step is the one that hangs your UI if nobody owns closing the stream. Writing the pattern down forces you to ask who ends it.
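"Who ends it" can be answered in code by making the consumer own the close. This is a toy sketch: `stream_reply` stands in for a StreamInvoke source, and a chunk limit stands in for the real deadline you would enforce with a timer. Both names are hypothetical.

```python
from contextlib import closing

closed = []  # records each time a stream is actually released


def stream_reply(chunks):
    """Hypothetical StreamInvoke source: yields chunks, notes when it is closed."""
    try:
        for chunk in chunks:
            yield chunk
    finally:
        # Runs on normal completion, on consumer error, and on explicit close().
        closed.append(True)


def read_stream(chunks, max_chunks):
    """The consumer owns the close: it releases the stream even if it gives up early."""
    received = []
    with closing(stream_reply(chunks)) as stream:
        for chunk in stream:
            received.append(chunk)
            if len(received) >= max_chunks:
                break  # leaving early still closes the stream via closing()
    return "".join(received)


text = read_stream(["Refund ", "approved", ": $42.00"], max_chunks=2)
print(text)    # Refund approved
print(closed)  # [True]
```

The design choice is that no code path leaves the `with` block without closing the stream. A real SSE client needs the same guarantee around its connection object.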
Timeout and retry is the column that quietly decides whether a bad day is a blip or a duplicate-refund incident. Notice the refund-agent row: no retry, because issuing a refund is not idempotent and retrying it is how you pay a customer twice. The billing-tool row, a read, gets two retries and a dead-letter path because a dropped read is safe to repeat and worth catching when it keeps failing. If you have not decided this per hop, the framework decided it for you, and it does not know which of your calls are safe to repeat.
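Deciding this per hop can be as plain as a lookup table that the calling code is forced to consult. The sketch below copies the timeout and retry values from the ledger rows; `HopPolicy`, `POLICIES`, and `call_with_policy` are hypothetical names, and `call` is assumed to be any function that takes a timeout in seconds and raises `TimeoutError` when it gets no reply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HopPolicy:
    timeout_s: float
    max_retries: int
    idempotent: bool


# Values taken from the ledger rows; the structure itself is an assumption.
POLICIES = {
    "billing-tool": HopPolicy(timeout_s=5, max_retries=2, idempotent=True),
    "refund-agent": HopPolicy(timeout_s=8, max_retries=0, idempotent=False),
}


def call_with_policy(target, call):
    """Run `call` under the target's policy; never retry a non-idempotent hop."""
    policy = POLICIES[target]
    # Even if someone sets max_retries on a non-idempotent hop, it gets one attempt.
    attempts = 1 + (policy.max_retries if policy.idempotent else 0)
    last_error = None
    for _ in range(attempts):
        try:
            return call(policy.timeout_s)
        except TimeoutError as err:
            last_error = err
    # Out of attempts: the caller decides whether to dead-letter or escalate.
    raise last_error
```

The guard on `idempotent` is deliberate redundancy. A retry count is easy to change in a config file by accident; the idempotency flag makes the dangerous combination impossible rather than merely discouraged.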
Human-visible receipt is the last column because it is the test. For every internal hop, ask: if this step did the wrong thing, what could a human see afterward to know? Two of these rows say "none (internal)." That is not automatically wrong, but each blank is a place where the system can act and leave no evidence, and you should choose those blanks on purpose rather than discover them during an incident.
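Once the ledger exists as data, the receipt test becomes a query. The sketch below encodes the refund ledger with only the columns it needs. `Hop` is a hypothetical type, and the `side_effect` flag is an added assumption (it is not a column in the table) marking hops that change something outside the system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hop:
    caller: str
    target: str
    pattern: str
    side_effect: bool        # assumption: does this hop change anything outside the system?
    receipt: Optional[str]   # None means "none (internal)"


LEDGER = [
    Hop("(inbound)", "triage-agent", "Invoke", False, "Routing your request"),
    Hop("triage-agent", "policy-agent", "Invoke", False, None),
    Hop("policy-agent", "billing-tool", "Invoke", False, None),
    Hop("policy-agent", "refund-agent", "Invoke", True, "Refund approved: $42.00"),
    Hop("refund-agent", "notify-writer", "StreamInvoke", True, "customer-facing paragraph"),
]


def blank_receipts(ledger):
    """Hops that leave nothing a human can see afterward."""
    return [hop.target for hop in ledger if hop.receipt is None]


def risky_blanks(ledger):
    """Blank receipts on hops that also act on the outside world."""
    return [hop.target for hop in ledger if hop.receipt is None and hop.side_effect]


print(blank_receipts(LEDGER))  # ['policy-agent', 'billing-tool']
print(risky_blanks(LEDGER))    # []
```

An empty `risky_blanks` list is the outcome you want to have chosen. If a future hop lands in it, that is the blank to fill before it ships.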
Where this goes wrong without a ledger
The failure modes are not exotic. They are the ordinary consequences of running a distributed system you never admitted was one.
A stream that never closes leaves a spinner running and a connection held open, and because the north-south answer never arrived, your logs may not even record a failure. A retry on a non-idempotent call pays the refund twice and looks, in the logs, like the customer asked twice. A dropped session ID scatters one interaction across five unlinkable events, so the post-incident question "what did we do for this customer?" has no single answer. A context value that silently failed to cross a boundary turns into "the agent hallucinated," when really it was reasoning over a blank.
None of these are model problems. You cannot wordsmith your way out of any of them. They are switchboard problems, and a switchboard problem is invisible until you have drawn the board.
Start with one request path
The move is not to adopt a runtime bus this quarter. OpenAgentIO is worth watching precisely because it names the layer cleanly, but at v0.3-alpha the smart use of it is as a vocabulary, not a dependency.
The move is smaller and you can do it today. Take one workflow that already fans out, the one with a triage step and a tool call and a generated reply. Trace a single real request through it by hand and fill one row of the ledger per hop. You will almost certainly hit a stall at one of these cells: a retry rule you never set, a context value you assumed crossed and cannot prove, or an internal step with no receipt and real-world consequences. That stall is the finding. It is the thing that was going to fail at 3am, surfaced at a desk instead.
Do that before you add the next agent, not after. The cost of mapping one path is an afternoon. The cost of not mapping it is a duplicate refund and a log that cannot tell you why.
If you want help with the first trace, bring one workflow that already has a couple of hops and we will map an agent dispatch ledger with you, or scope the request path as part of process automation. The companion decision, what context and retries a fanned-out request is even allowed to carry, is the work in AI workflow controls.
One drawn request path beats a green dashboard that never watched the traffic.
