OpenAI announced on March 16, 2026 that subagents are now available in Codex for all developers, across both the app and the CLI. The headline feature is obvious: you can spin out side agents, keep the main thread cleaner, and run pieces of a task in parallel. The less obvious part is the one that matters: the economic unit of a coding task just changed.
Before subagents, one request mostly meant one overloaded context window. You either kept piling more requirements into the same thread, or you opened a second session and managed the handoff yourself. With subagents, one task can fan out into several bounded work streams: one agent audits the auth flow, one maps the failing test surface, one traces the database call chain, and the main agent stays focused on deciding what to do with those findings.
That is not "AI got faster." It is closer to "AI work got delegatable."
The bottleneck moved from tokens to coordination
OpenAI's own docs make the tradeoff pretty plain. Subagents are custom helpers you can define for specific jobs, invoke directly, and run in parallel. The docs also say they use more tokens and are best reserved for tasks that are mostly read-only, like codebase Q&A, documentation lookup, or background research. That is the tell.
When a vendor tells you the new feature is amazing but should be used selectively, they are telling you where the cost moved.
The old constraint in agentic coding was context pressure. Too much repo state, too many side quests, too many half-finished investigations, and the main thread turns into sludge. Subagents relieve that pressure by moving the side quests somewhere else. The new constraint is whether you can decompose work cleanly enough that the side quests come back useful instead of noisy.
For a two- or three-person dev shop, that is a real upgrade. Not because it creates infinite leverage, but because it lets one person act more like a lead engineer with delegated investigators instead of a typist babysitting one giant prompt.
What the workflow looks like when it is actually useful
The good version is boring in the best way.
You have a bug that spans three systems: frontend state, an API route, and a flaky integration test. The main Codex session defines the target and then fans out:
- One subagent traces the UI event path and reports where state becomes inconsistent.
- One subagent inspects server logs, route handlers, and validation rules.
- One subagent reads the failing tests and summarizes whether the breakage is deterministic or environmental.
Now the parent agent is not trying to hold three investigations in one conversation. It is reviewing three memos.
That changes the cadence of work. Instead of waiting for one agent to serially inspect the UI, then the backend, then the tests, you are paying for parallel discovery and then spending human attention on synthesis. For small teams, that is the right trade. Discovery is cheap. Mis-synthesis is expensive.
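The fan-out-then-synthesize cadence is, structurally, just plain concurrency. The sketch below uses a hypothetical `run_subagent` stand-in (it makes no claim about the real Codex API; it returns a canned memo so the control flow is runnable). The shape is the point: parallel read-only discovery, then one synthesis pass in the parent.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for dispatching a read-only subagent.
# In practice this would be a Codex app/CLI invocation; here it just
# returns a canned "memo" so the fan-out/join shape is runnable.
def run_subagent(name: str, brief: str) -> dict:
    return {"agent": name, "memo": f"findings for: {brief}"}

briefs = {
    "ui-tracer": "trace the UI event path; report where state becomes inconsistent",
    "api-auditor": "inspect route handlers and validation rules for the failing endpoint",
    "test-reader": "read the failing tests; is the breakage deterministic or environmental?",
}

# Fan out: discovery runs in parallel, each lane bounded by its brief.
with ThreadPoolExecutor(max_workers=len(briefs)) as pool:
    futures = [pool.submit(run_subagent, name, brief) for name, brief in briefs.items()]
    memos = [f.result() for f in futures]

# Join: the parent reviews three memos instead of holding three investigations.
for memo in memos:
    print(memo["agent"], "->", memo["memo"])
```

Everything before the join is cheap and parallel; everything after it is where human attention belongs.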
This is also why OpenAI's line about "steering individual agents" matters more than the parallelism headline. Steering means you do not have to accept the first decomposition. If one subagent is pulling on the wrong thread, you redirect that branch without polluting the rest of the work. The main session becomes the control plane. The subagents become disposable specialists.
Ten subagents is where the manager tax shows up
This feature will be oversold within 48 hours as "run ten engineers at once." That is nonsense.
At ten concurrent subagents, three things break fast.
First, overlap. Two agents inspect the same files, produce slightly different summaries, and now you have to reconcile them.
Second, stale assumptions. Agent four started from a premise about the code that agent two has already disproven. The branch kept working anyway because no one corrected the premise early enough.
Third, merge pressure. The more write-capable agents you run, the more your bottleneck shifts to arbitration: which edits are compatible, which ones are redundant, and which ones quietly undermine each other.
This is why OpenAI's own examples lean toward specialized, often read-heavy helpers. Parallel reads scale much more cleanly than parallel writes. Ten subagents gathering evidence can be useful. Ten subagents all editing the repo is how you manufacture cleanup work and call it productivity.
The practical ceiling for most small shops is probably three active branches of work, maybe five if the task is naturally separable and the repo is disciplined. Beyond that, the human stops being an engineer and starts being air traffic control.
The job of the senior engineer just changed shape
Subagents reward a skill that many teams do not explicitly train: breaking work into independently answerable questions.
That sounds managerial. It is actually deeply technical.
A weak operator tells the parent agent, "figure out the payments bug." A strong operator says:
- check whether the duplicate charge starts client-side or server-side
- isolate the exact write path that can execute twice
- inspect recent commits touching idempotency keys
- report back without editing anything yet
That is what steering looks like in practice. Not chatting more. Defining better lanes.
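One way to make "better lanes" concrete is to treat each lane as data before it becomes a prompt. The `Lane` structure below is purely illustrative, not a Codex schema; it just encodes the habits above: a narrow question, a bounded scope, and read-only as the default for discovery.

```python
from dataclasses import dataclass

# Illustrative only: a lane is a narrow, independently answerable question
# with an explicit write policy. Field names are not a Codex schema.
@dataclass(frozen=True)
class Lane:
    question: str
    scope: tuple[str, ...]   # files or areas the agent may read
    may_write: bool = False  # discovery lanes stay read-only

lanes = [
    Lane("does the duplicate charge start client-side or server-side?",
         ("web/checkout/",)),
    Lane("which exact write path can execute twice?",
         ("api/payments/",)),
    Lane("which recent commits touched idempotency keys?",
         ("api/payments/", "git history")),
]

# A cheap guardrail before fanning out: no discovery lane may write.
assert all(not lane.may_write for lane in lanes), "discovery lanes must be read-only"
```

Writing lanes down this way also makes overlap visible before it costs tokens: two lanes with the same scope and similar questions should probably be one lane.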
This is also where a small firm can gain real advantage over a larger one. A lean team with good architecture and crisp habits can route subagents through known seams in the codebase: ownership boundaries, service edges, test harnesses, and worktrees. A messy team just creates parallel confusion faster.
So the value is not "we now have more agents." The value is "we can turn one fuzzy request into a supervised tree of narrower investigations." If your team already writes decent specs, keeps modules reasonably isolated, and treats tests as handoff contracts, subagents will feel like a force multiplier. If your repo is a shared junk drawer, subagents will expose that faster than they improve it.
Where this lands for a two-person shop
For a tiny software shop, the best near-term use case is not autonomous implementation. It is pre-implementation compression.
Use subagents to answer three questions before anybody writes code:
- What parts of the repo does this task really touch?
- What constraints are likely to break the first attempt?
- What test surface tells us whether the change worked?
That alone trims a lot of wasted motion. The main session stays readable. The side investigations stop crowding the core decision. The human gets a better brief before committing to an approach.
Then, if the task is cleanly separable, hand one narrow write step to a single subagent working in isolation. Not five. One.
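That "one, not five" rule is simple enough to write down as policy. A minimal sketch, with hypothetical names:

```python
# Minimal policy sketch: after discovery, allow at most one write-capable
# step, and only if the task was judged cleanly separable. Names are
# illustrative, not part of any real tooling.
def plan_write_phase(separable: bool, proposed_write_steps: list[str]) -> list[str]:
    if not separable:
        return []                      # keep the write in the main session
    return proposed_write_steps[:1]    # one isolated write agent, not five

print(plan_write_phase(True, ["fix idempotency key reuse", "refactor logger"]))
# → ['fix idempotency key reuse']
```

The interesting part is the empty-list branch: when the task is not separable, the right number of write subagents is zero.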
That is the sober reading of this launch. Codex subagents do not eliminate engineering management. They compress it into the prompt layer. Teams that know how to coordinate will ship faster because they can delegate discovery in parallel without drowning the main thread. Teams that do not will just hit the same wall with more simultaneous output.
The verdict is simple: subagents are a real workflow upgrade, but only for teams willing to treat coordination as the scarce resource. The shops that win with this feature will not be the ones running the most agents. They will be the ones asking the cleanest questions and merging the fewest surprises.
