The procurement team notices the first strange thing in a vendor review, not in an AI dashboard.
A sourcing agent has been asked to compare three logistics vendors before a contract renewal. The request sounds harmless. Pull delivery history, check insurance certificates, compare open incidents, ask finance whether the spend fits the quarter, then prepare a renewal recommendation. The sourcing agent calls the vendor-risk agent. The risk agent asks a compliance agent for sanctions context. The finance agent streams a margin note back into the same workspace. By lunch, four agents have touched the decision, and one of them has read a document it was only supposed to know existed.
The postmortem question is painfully simple: which of those calls were supposed to happen?
Nobody reaches for a prompt transcript first. They reach for the diagram, and the diagram does not exist. The agents were wired one at a time, each for a reasonable task, each with a credential that worked. No one had written down which agent may discover which other agent, which one may call it, which operations are allowed, and who can close the path when the graph surprises everyone.
That is where multi-agent work stops being a model problem and becomes an access-graph problem.
The math nobody budgets for
AWS put a number on that seam in a July 1 blog post on building a serverless A2A gateway: "without a central orchestrator, a deployment of 20 agents requires up to 190 point-to-point connections." The post describes the underlying operational burden as "point-to-point connections, separate credentials, and custom routing logic," with access control fragmented across those separate paths.
Twenty agents, up to 190 pairings. That is the full-mesh ceiling, not a roadmap anyone chooses on purpose. The graph accretes by exception: one agent needs a vendor score today, another needs a payment status tomorrow, a third needs an incident summary next week. Each connection feels local. The risk is cumulative.
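The 190 is simply the number of unordered pairs among 20 agents, n(n-1)/2. A minimal Python sketch makes the growth visible; the function name is ours for illustration and does not come from the AWS post.

```python
def full_mesh_connections(agent_count: int) -> int:
    """Count the distinct agent pairs when every agent may connect to every other."""
    if agent_count < 0:
        raise ValueError("agent_count must be non-negative")
    # Each of n agents can pair with n - 1 others; dividing by 2 avoids
    # counting the pair (a, b) and the pair (b, a) separately.
    return agent_count * (agent_count - 1) // 2


for n in (4, 10, 20, 40):
    print(n, full_mesh_connections(n))
```

Doubling the agent count roughly quadruples the ceiling, which is why the paths need a budget before the agents multiply.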
The procurement incident above does not need 190 connections to go wrong. It needs four, and one of the four was never supposed to exist.
What AWS actually shipped
The pattern AWS published, alongside a reference implementation on GitHub, treats that graph as the thing to manage. It hosts multiple agents behind one domain using path-based routing (/agents/{agentId}), and AWS says standard Agent2Agent (A2A) clients work without modification. In other words, the gateway sits in front of the protocol, not instead of it.
Three layers do the work. A management layer holds a centralized agent registry with cached agent cards and semantic search, so a caller can discover an agent by capability rather than exact name. A control layer checks JSON Web Token scopes against a permissions table through a Lambda authorizer. If the caller lacks permission, the denial happens at API Gateway before the backend agent ever sees the request. An execution layer routes the call, handles backend OAuth authentication, and supports Server-Sent Events streaming for agents that respond incrementally.
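The control-layer decision can be sketched in a few lines. This is not AWS's authorizer code. It is a simplified illustration that assumes the token has already been verified and its scopes extracted, and the table contents, scope names, and function name are hypothetical.

```python
import re

# Hypothetical permissions table: JWT scope -> agent IDs that scope may reach.
PERMISSIONS = {
    "procurement:read": {"sourcing", "vendor-risk"},
    "compliance:read": {"compliance"},
}

# Matches the path-based routing shape /agents/{agentId} plus any sub-path.
AGENT_PATH = re.compile(r"^/agents/(?P<agent_id>[A-Za-z0-9_-]+)(?:/.*)?$")


def authorize(path: str, token_scopes: list[str]) -> bool:
    """Return True only if some scope on the token allows the agent named in the path."""
    match = AGENT_PATH.match(path)
    if match is None:
        return False  # Not an agent route: deny by default.
    agent_id = match.group("agent_id")
    return any(agent_id in PERMISSIONS.get(scope, set()) for scope in token_scopes)
```

The point of the shape is the default: a path that is not in the table is denied, so a new connection has to be written down before it works.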
The concrete pieces matter because they name the operating questions underneath any multi-agent stack. AWS describes an Agent Registry that maps agent IDs to backend URLs, authentication configuration, and cached agent cards. A Permissions table maps JWT scopes to allowed agents. A RateLimitCounters table tracks requests per minute. Native A2A discovery still exists through endpoints such as GET /agents/{agentId}/.well-known/agent-card.json; gateway-level management adds GET /agents for the agents a caller may list and POST /search for semantic discovery.
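To make the requests-per-minute idea concrete, here is a minimal in-memory sketch. It assumes a fixed one-minute window keyed by caller and agent. The source does not say which algorithm the RateLimitCounters table uses, and the class and method names are hypothetical.

```python
import time
from collections import defaultdict
from typing import Optional


class PerMinuteLimiter:
    """Fixed-window counter: at most `limit` requests per caller, per agent, per minute."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        # Key: (caller, agent_id, minute bucket). Old buckets are never evicted
        # here; a real table would expire them (assumption, not shown).
        self._counts = defaultdict(int)

    def allow(self, caller: str, agent_id: str, now: Optional[float] = None) -> bool:
        timestamp = time.time() if now is None else now
        window = int(timestamp // 60)  # Minute bucket since the epoch.
        key = (caller, agent_id, window)
        if self._counts[key] >= self.limit:
            return False  # Over the ceiling for this minute: reject.
        self._counts[key] += 1
        return True
```

Keying on the caller and the agent together means one noisy path cannot exhaust the budget of another.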
The sample README is candid about why this exists at all. AgentCore Runtime, AWS notes, "can host A2A servers with OAuth 2.0 authentication," but on its own "it does not provide the management, control, and data layers needed to operate multiple A2A agents behind a single domain." That is the useful distinction. Hosting an agent is not the same as governing the paths between agents.
The door that adds doors is guarded too: registration and management require a gateway:admin scope.
The Agent Access Matrix
You do not need AWS's stack to borrow the discipline it forces. Before a gateway can enforce anything, somebody has to decide what the graph should look like. That decision belongs in a small artifact: an agent access matrix.

Here is the shape we would put in front of a team before their third agent starts calling the first.
Before the next agent joins the call graph
Fill this out before one agent is allowed to discover, call, stream to, or authenticate with another agent.
- 01 Agent ID
  Pins down: The stable name used in routing, logs, policies, and incident review.
  Why it matters: A vague nickname cannot anchor access control.
- 02 Business owner
  Pins down: The person or team accountable for the agent's behavior.
  Why it matters: Every callable agent needs an owner before it becomes a dependency.
- 03 Runtime and endpoint
  Pins down: Where the agent runs and the path clients call.
  Why it matters: Routing should be visible before it becomes infrastructure.
- 04 Capabilities
  Pins down: What the agent card says the agent can do.
  Why it matters: Discovery metadata should not silently become permission.
- 05 Allowed callers
  Pins down: Which agents, clients, or workflows may reach this agent.
  Why it matters: This is where a wild-card access graph turns into a governed one.
- 06 Required scopes
  Pins down: The JWT scopes or policy claims needed for access.
  Why it matters: Authentication is not enough; the caller needs the right authorization.
- 07 Allowed operations
  Pins down: Read, write, search, escalate, stream, administer, or a narrower verb list.
  Why it matters: Calling an agent and asking it to change something are separate risks.
- 08 Streaming allowed
  Pins down: Whether partial responses may flow back before completion, and to whom.
  Why it matters: A live stream can leak reasoning or sensitive context before review.
- 09 Rate limit
  Pins down: Per-caller or per-user ceilings for this agent path.
  Why it matters: A retry storm should not become a production incident.
- 10 Credential source
  Pins down: Where backend authentication secrets or tokens are stored and rotated.
  Why it matters: Point-to-point credentials are where invisible coupling starts.
- 11 Log destination
  Pins down: Where calls, denials, streams, and admin changes are recorded.
  Why it matters: The matrix should be auditable after the fact.
- 12 Emergency disable owner
  Pins down: Who can shut down the path without a meeting.
  Why it matters: Incident response starts with knowing who can close the gate.
- 13 Review date
  Pins down: When the row must be re-approved or retired.
  Why it matters: Agent permissions age like any other production access.
If the caller list says 'any authenticated agent,' the matrix is not done.
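One way to keep the matrix honest is to store each row as data and check it mechanically. The sketch below mirrors the thirteen fields as a Python dataclass. The field names, the wildcard list, the example values, and the checks are our assumptions for illustration, not part of any AWS artifact.

```python
from dataclasses import dataclass
from datetime import date

# Caller entries that mean "everyone" (hypothetical spellings).
WILDCARD_CALLERS = {"*", "any", "any authenticated agent"}


@dataclass(frozen=True)
class AccessMatrixRow:
    agent_id: str
    business_owner: str
    endpoint: str
    capabilities: tuple
    allowed_callers: tuple
    required_scopes: tuple
    allowed_operations: tuple
    streaming_allowed: bool
    rate_limit_per_minute: int
    credential_source: str
    log_destination: str
    emergency_disable_owner: str
    review_date: date


def unfinished_fields(row: AccessMatrixRow, today: date) -> list:
    """Return the reasons a row is not ready; an empty list means it passes."""
    problems = []
    for name in ("agent_id", "business_owner", "endpoint", "credential_source",
                 "log_destination", "emergency_disable_owner"):
        if not getattr(row, name).strip():
            problems.append(f"{name} is empty")
    for name in ("allowed_callers", "required_scopes", "allowed_operations"):
        if not getattr(row, name):
            problems.append(f"{name} is empty")
    if any(c.strip().lower() in WILDCARD_CALLERS for c in row.allowed_callers):
        problems.append("allowed_callers contains a wildcard")
    if row.rate_limit_per_minute <= 0:
        problems.append("rate_limit_per_minute must be positive")
    if row.review_date <= today:
        problems.append("review_date is not in the future")
    return problems


# Hypothetical draft row for the compliance agent in the opening story.
draft = AccessMatrixRow(
    agent_id="compliance",
    business_owner="Compliance team",
    endpoint="/agents/compliance",
    capabilities=("sanctions-context",),
    allowed_callers=("any authenticated agent",),
    required_scopes=("compliance:read",),
    allowed_operations=("read",),
    streaming_allowed=False,
    rate_limit_per_minute=30,
    credential_source="secrets manager entry (placeholder)",
    log_destination="central audit log (placeholder)",
    emergency_disable_owner="",
    review_date=date(2030, 1, 1),
)

for problem in unfinished_fields(draft, today=date(2025, 1, 1)):
    print(problem)
```

A check like this can run in review or CI, so a row with a wildcard caller or no disable owner fails before it reaches a gateway.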
The field that usually creates the argument is Allowed callers. Good. That is the argument to have before production, not after an incident. "Any authenticated agent" sounds efficient until a compliance agent becomes callable by a workflow that only needed a vendor score.
Streaming deserves its own decision. Server-Sent Events make agent responses feel fast and alive, but a stream is still a live pipe. If a caller should not see a compliance agent's reasoning, it should not receive that reasoning token by token while everyone waits for the final answer.
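As a rough sketch of how that decision might be enforced, the function below passes chunks through only when streaming is allowed for the caller. Otherwise it buffers the whole response so a review step can run before anything is delivered. The function name and the review hook are hypothetical, and a real gateway would apply this to Server-Sent Events rather than plain strings.

```python
from typing import Callable, Iterable, Iterator


def deliver(
    chunks: Iterable[str],
    streaming_allowed: bool,
    review: Callable[[str], str] = lambda text: text,
) -> Iterator[str]:
    """Yield chunks live if streaming is allowed; otherwise yield one reviewed response."""
    if streaming_allowed:
        # The caller sees partial output as it arrives.
        yield from chunks
        return
    # Hold everything back until the response is complete, then let the
    # review hook redact or replace it before the caller sees anything.
    yield review("".join(chunks))
```

For example, `list(deliver(["margin ", "note"], streaming_allowed=False, review=str.upper))` returns a single reviewed string instead of two live chunks.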
Emergency disable owner is the field everyone wants to defer because it sounds pessimistic. It is not pessimism. It is incident response. When a path starts doing something it should not, the right person should know they can close it without convening a design review.
Discovery is not permission
The access matrix also keeps three neighboring ideas from blurring together.
A capability shelf helps an agent find the right resource at runtime instead of loading every possible tool or agent into context. An agent dispatch ledger records where one request actually traveled. Tool-call governance decides whether an agent may touch a system once it reaches for a tool. The access matrix sits before all three: it says which agents are findable, which are callable, and under which scopes a path should exist at all.
That distinction matters because discovery is not permission. Returning an agent card should not automatically authorize a call. Seeing that a finance agent exists should not mean every other agent can ask it for margin context. A gateway can help separate those decisions, but the separation has to be designed before it can be enforced.
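A small sketch shows what keeping the two decisions apart can look like: one set answers who may see that an agent exists, a separate set answers who may call it. The agent names and both tables are hypothetical.

```python
# Hypothetical policy: target agent -> callers allowed to see its agent card.
DISCOVERABLE_BY = {
    "finance": {"sourcing", "vendor-risk"},
    "compliance": {"vendor-risk"},
}

# Hypothetical policy: target agent -> callers allowed to send it requests.
CALLABLE_BY = {
    "finance": {"sourcing"},
    "compliance": {"vendor-risk"},
}


def can_discover(caller: str, target: str) -> bool:
    """True if the caller may list or fetch the target's agent card."""
    return caller in DISCOVERABLE_BY.get(target, set())


def can_call(caller: str, target: str) -> bool:
    """True if the caller may invoke the target; discovery is deliberately not consulted."""
    return caller in CALLABLE_BY.get(target, set())


def visible_agents(caller: str) -> list:
    """Agents the caller may list, sorted for stable output."""
    return sorted(t for t, callers in DISCOVERABLE_BY.items() if caller in callers)
```

Here the vendor-risk agent can see that the finance agent exists, but `can_call("vendor-risk", "finance")` is false until someone adds that path on purpose.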
What to do before the third agent
Do not start by standing up the gateway. Start by drawing the graph you would be willing to defend in a postmortem.
Pick one workflow where two agents already hand off work, or where a second agent is about to call a first. Write one matrix row per agent. Name the owner. Name the caller list. Name the scopes. Decide whether streaming is allowed. Set a rate limit. Put the credential source and log destination in the same row as the business decision. Add the disable owner and the review date while the system is still small enough to change.
Then decide whether AWS's sample, another A2A gateway, or your own control layer should enforce it.
The sequence matters. Infrastructure makes the graph real. The matrix decides whether the graph is worth making real.
Bring us one multi-agent workflow, even an early one. BaristaLabs can help map which agents may discover, call, stream to, and authenticate with which others, as part of process automation grounded in AI workflow controls. The matrix is worth finishing before the gateway is worth building.
