Here's the example AWS uses to make its case: a prior authorization request, four steps into a pipeline that has six.
A clinical extraction agent has already pulled the diagnosis codes, the medication list, and the prior therapy history from the chart. A policy criteria agent has matched those facts against the payer's rules. A medical necessity agent has reasoned through eligibility, and a justification generation agent has just handed off a draft justification letter. What happens next needs a person: a physician has to read that letter and approve or reject it before anything goes to the payer.
Nobody controls how long that takes. She might read it in ten minutes. She might read it after a weekend.
The dangerous question is what happens to the completed steps, the extracted chart data, and the token spend already sunk into them while the case sits in her queue. If the system doesn't have an answer, engineers end up supplying one by accident: the process times out, restarts from the top, re-extracts the chart, re-runs the policy match, re-runs the necessity reasoning, and in the worst version resubmits a request to the payer that already went out once.
That scenario is the working example in AWS's Compute Blog post from June 29, "Building fault-tolerant multi-agent AI workflows with AWS Lambda durable functions," by Satish Kamat, Ben Freiberg, and Reetesh Surjani. The authors name the problem plainly: "Agentic AI workflows coordinate multiple agents that reason, plan, and act across multi-step processes. Each step is expensive, non-deterministic, and unpredictable in latency. Human review gates can pause execution for days. Transient failures are expected, and restarting a half-finished workflow wastes time and money. Duplicate actions, like charging a payment twice or sending the same request again, create financial and compliance risk."
The specific feature AWS is shipping is a durable execution primitive for Lambda. The more useful lesson, for anyone building AI workflows on AWS or anywhere else, is what that feature demonstrates: an AI workflow that can pause needs the pause designed in from the start, not bolted on after the first stuck case.
The example: four agents, one reviewer, one payer
AWS frames its prior authorization example this way: "The prior authorization workflow orchestrator coordinates four AI agents, a human review gate, and a payer submission." Clinical extraction, policy matching, medical necessity reasoning, and justification generation run as agent steps. A physician reviews the output. A payer receives the final submission and, eventually, returns a decision.
Three separate things go wrong in a workflow shaped like that if pausing isn't a designed behavior.
First, a transient failure (a model timeout, a rate limit, a dropped connection) shouldn't cost you the work that already succeeded. AWS describes the intended fix directly: "If the medical necessity agent fails after the clinical extraction agent has completed, Lambda durable function replays the handler, skips the extraction step which was already checkpointed, and retries only the failed step. This helps avoid re-incurring the time, cost, and token spend of completed steps." Each agent step is expensive in a way a database write is not: real tokens, real model calls, real minutes. Retrying the whole chain on every hiccup is not a rounding error at scale.
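A minimal sketch of that replay behavior, assuming an in-memory dictionary stands in for durable storage. `CheckpointStore`, `run_step`, and the step functions are hypothetical names for illustration; this is not the Lambda durable functions API.

```python
from typing import Any, Callable, Dict


class CheckpointStore:
    """In-memory stand-in for durable checkpoint storage (hypothetical)."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def run_step(self, name: str, fn: Callable[[], Any]) -> Any:
        # A step that finished in an earlier attempt returns its saved result.
        if name in self._results:
            return self._results[name]
        result = fn()  # may raise; nothing is saved when the step fails
        self._results[name] = result
        return result


calls = {"extract": 0, "necessity": 0}


def extract_chart() -> dict:
    calls["extract"] += 1
    return {"diagnosis_codes": ["PLACEHOLDER-CODE"]}  # illustrative data only


def necessity_reasoning() -> str:
    calls["necessity"] += 1
    if calls["necessity"] == 1:
        raise TimeoutError("simulated model timeout")
    return "meets criteria"


def handler(store: CheckpointStore) -> str:
    store.run_step("clinical_extraction", extract_chart)
    return store.run_step("medical_necessity", necessity_reasoning)


store = CheckpointStore()
try:
    handler(store)  # first attempt fails at the necessity step
except TimeoutError:
    pass
outcome = handler(store)  # replay: extraction is skipped, necessity is retried
```

After the replay, the extraction function has run once and the necessity function twice, which is the cost profile the AWS quote describes.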
Second, waiting for a human shouldn't mean holding a compute process open, and it shouldn't mean losing track of the case. AWS's description of the review handoff is worth reading closely: "After the justification generation agent produces a letter, the orchestrator emits a callback ID to the clinical review system and suspends. The physician receives the draft in their review queue, reads it, and either approves or rejects it through the review UI. The UI calls the Lambda callback API with the result, and the orchestrator resumes with the approval decision." The callback ID is the whole trick. It's the handle that lets an external system, a review UI in this case, reach back into a paused workflow and hand it exactly one piece of information: what the human decided.
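The handoff can be sketched in a few lines, assuming the paused state lives in a dictionary keyed by callback ID. `Orchestrator`, `suspend_for_review`, and `resume` are hypothetical names, not the Lambda callback API; a real runtime would persist the paused run rather than hold it in memory.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass
class PausedRun:
    case_id: str
    draft_letter: str


class Orchestrator:
    """Toy orchestrator: suspends on a callback ID, resumes on a decision."""

    def __init__(self) -> None:
        self._paused: Dict[str, PausedRun] = {}

    def suspend_for_review(self, case_id: str, draft_letter: str) -> str:
        # The callback ID is the only handle the review UI needs.
        callback_id = str(uuid.uuid4())
        self._paused[callback_id] = PausedRun(case_id, draft_letter)
        return callback_id

    def resume(self, callback_id: str, decision: str) -> str:
        # Validate the payload before touching the paused run.
        if decision not in ("approved", "rejected"):
            raise ValueError(f"unexpected decision: {decision!r}")
        run = self._paused.pop(callback_id)  # KeyError if unknown or already used
        if decision == "approved":
            return f"submit {run.case_id} to payer"
        return f"return {run.case_id} to medical necessity agent"


orch = Orchestrator()
callback_id = orch.suspend_for_review("case-1", "draft justification letter")
next_action = orch.resume(callback_id, "approved")
```

Nothing runs between `suspend_for_review` and `resume`; the only thing that survives the wait is the saved state and the ID that points at it.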
Third, waiting on an external party (a payer, in this example) shouldn't burn compute or duplicate the ask. AWS again: "The function incurs no compute charges during each wait interval. Each poll result is automatically checkpointed, so on replay the orchestrator skips previously completed checks." A naive version of this polls a payer's API every few minutes in a loop that stays "on" the whole time, paying for compute it isn't using and re-asking questions it already asked. A durable version waits without paying rent, and it remembers what it's already learned.
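A rough sketch of checkpointed polling, assuming each poll attempt is saved under its attempt number. `PollLog`, `wait_for_decision`, and `ask_payer` are hypothetical, and the sleep or suspend between polls is left out because that part belongs to the runtime.

```python
from typing import Callable, Dict, Optional


class PollLog:
    """Saves each poll result by attempt number (in-memory stand-in)."""

    def __init__(self) -> None:
        self._results: Dict[int, str] = {}

    def check(self, attempt: int, ask: Callable[[], str]) -> str:
        # On replay, an attempt that already ran returns its saved answer.
        if attempt in self._results:
            return self._results[attempt]
        result = ask()
        self._results[attempt] = result
        return result


def wait_for_decision(
    log: PollLog, ask: Callable[[], str], max_attempts: int
) -> Optional[str]:
    for attempt in range(max_attempts):
        status = log.check(attempt, ask)
        if status != "pending":
            return status
        # A durable runtime would suspend here instead of holding compute.
    return None


responses = iter(["pending", "pending", "approved"])  # simulated payer answers
stats = {"api_calls": 0}


def ask_payer() -> str:
    stats["api_calls"] += 1
    return next(responses)


log = PollLog()
first = wait_for_decision(log, ask_payer, max_attempts=2)   # interrupted early
second = wait_for_decision(log, ask_payer, max_attempts=5)  # replay
```

The replay reaches the payer only once more: the first two answers come from the log, so the payer API is called three times in total rather than five.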
AWS's supporting documentation makes the general claim behind all three: Lambda durable functions "enable you to build resilient multi-step applications and AI workflows that can execute for up to one year while maintaining reliable progress despite interruptions," and waits "suspend execution without incurring compute charges, making them ideal for long-running processes like human-in-the-loop workflows or polling external dependencies." A year is a deliberately large number. It's AWS saying: design for the pause to be normal, not exceptional.
Why this is bigger than one AWS feature
Swap the healthcare details for almost any operations workflow and the shape holds.
A support agent drafts a refund and waits for a manager's approval. A vendor-onboarding workflow submits a form and waits for a compliance callback. A collections workflow emails a customer and waits for a reply that might come in six hours or six days. A shipping-exception workflow calls a carrier's API and polls for a tracking update. A contract-review workflow routes a clause to legal and waits for a redline.
Every one of those has the same three failure modes AWS describes: a retry that shouldn't re-do finished work, a human step that shouldn't hold a process hostage, and an external wait that shouldn't burn compute or double-submit. The same week, Microsoft's security team framed the broader enterprise shift as AI tools moving from reading to acting. That is the moment when workflow design has to account not just for whether an agent is allowed to act, but for what keeps that action safe while it waits.
Most teams discover this the hard way, because most demos never pause. A demo agent runs start to finish in thirty seconds with no human in the loop and no external dependency slower than an API call. Real workflows pause constantly. That's what makes them worth automating in the first place. The moment a workflow can take longer than one execution, "what happens during the wait" stops being an edge case and becomes the design.
Name the pause before you build around it
The fix isn't necessarily AWS's specific product. Plenty of teams will solve this with a state machine, a queue and a checkpoint table, or a different vendor's durable execution primitive. What matters is that someone writes down the pause as its own artifact before the workflow ships: a durable pause contract that a reviewer, not just an engineer, can read.
A durable pause contract names:
- Pause owner: who is allowed to resume the workflow or reject it outright.
- Callback ID: the handle that connects the paused workflow to whatever external system will eventually respond, whether that's a review UI, a payer, a vendor API, or a customer inbox.
- Checkpointed work: which agent outputs are already saved and must not be recomputed unless something invalidates them.
- Resume condition: the exact event, approval, or poll result that restarts the workflow. Not "when it's ready," but "when the callback API receives this specific payload."
- Duplicate-action guard: the idempotency key or business rule that stops a retried or re-triggered workflow from charging twice, submitting twice, or emailing the same person twice.
- Expiry and escalation: how long a paused item can sit before someone has to act on it, and who that someone is.
- Compensation path: what happens if a later step fails or a human rejects the draft. Does the workflow retry the failed step alone, route back to an earlier agent with notes, or unwind what it already did?
- Evidence trail: where the record of the whole run lives, so a reviewer or an incident responder can reconstruct what happened without stitching together five systems' logs.
That last point is where AWS's architecture argument is strongest, independent of the feature itself: "Instead of stitching together logs from a state machine, a queue, a checkpoint table, and a poller, the entire workflow is one function with one execution history." Whatever tooling a team uses, a pause that produces one coherent trace beats a pause that produces four systems a debugging engineer has to correlate by timestamp.

Written out for the prior authorization example, the contract looks like this:
Workflow: prior authorization justification and payer submission
Pause owner: reviewing physician; unclaimed reviews escalate to on-call physician
Callback ID: tied to the case's clinical review system entry
Checkpointed work: clinical extraction, policy match, medical necessity reasoning, draft justification
Resume condition: physician approval or rejection via the review UI, or a payer response on poll
Duplicate-action guard: one payer submission per case, keyed to the encounter ID
Expiry and escalation: unclaimed review after 48 hours routes to on-call physician
Compensation path: rejection returns to the medical necessity agent with reviewer notes, not a full restart
Evidence trail: single execution history covering every agent step, the checkpoint, the callback, and the payer poll
That contract is boring on purpose. A compliance reviewer can check it without reading a line of code. An engineer can build against it without guessing what "handle it gracefully" means. A physician manager can see exactly how long a case can sit before someone else has to look at it.
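For teams that want the same contract in a form code can check, here is one possible shape: a frozen dataclass carrying the fields listed above, plus a small helper for the expiry rule. The class name, the field names, and the example timestamps are assumptions for illustration, not part of AWS's post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass(frozen=True)
class PauseContract:
    """One workflow's pause, written down as data (hypothetical schema)."""

    workflow: str
    pause_owner: str
    escalation_owner: str
    callback_id: str
    checkpointed_work: Tuple[str, ...]
    resume_condition: str
    duplicate_action_guard: str
    expiry: timedelta
    compensation_path: str
    evidence_trail: str

    def current_owner(self, paused_at: datetime, now: datetime) -> str:
        # Past the expiry window, responsibility moves to the escalation owner.
        if now - paused_at > self.expiry:
            return self.escalation_owner
        return self.pause_owner


contract = PauseContract(
    workflow="prior authorization justification and payer submission",
    pause_owner="reviewing physician",
    escalation_owner="on-call physician",
    callback_id="tied to the case's clinical review system entry",
    checkpointed_work=(
        "clinical extraction",
        "policy match",
        "medical necessity reasoning",
        "draft justification",
    ),
    resume_condition="physician approval or rejection via the review UI",
    duplicate_action_guard="one payer submission per case, keyed to encounter ID",
    expiry=timedelta(hours=48),
    compensation_path="rejection returns to the medical necessity agent with notes",
    evidence_trail="single execution history",
)

paused_at = datetime(2025, 1, 1, 9, 0)  # arbitrary example timestamp
owner_early = contract.current_owner(paused_at, paused_at + timedelta(hours=47))
owner_late = contract.current_owner(paused_at, paused_at + timedelta(hours=49))
```

Keeping the expiry as a `timedelta` rather than prose means the 48-hour rule is something a scheduler or a test can evaluate, not just something a reviewer reads.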
Where this fits next to governance and rollback
Two other pieces published here sit next to this one, not underneath it. AgentCore Gateway is about the moment before an agent acts: should this tool call run at all? The rollback path is about repairing a workflow after it did the wrong thing. A pause contract assumes the action was already cleared to start, and asks what protects it while it waits on something slower than the workflow itself: a person, a vendor, a payer, a clock. A serious workflow can need all three at once: gated on the way in, protected while it waits, repairable if it comes out wrong.
That also changes what "testing an agent" means. "Agent evals should test workflow receipts, not just model answers" argues that evals should inspect the trail of what an agent actually did, not just the quality of its final text. A workflow with a pause contract gives evals something concrete to check beyond the happy path: does a retried run skip the steps it already finished? Does a rejected review route back correctly instead of restarting from zero? Does a duplicate callback get ignored instead of double-executed? Those are exactly the failure modes a pause contract is supposed to prevent, and exactly the ones a demo-only eval suite will never exercise.
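Two of those checks can be written as plain assertions against a toy workflow that records a receipt for every action. This is a sketch under stated assumptions: `Workflow`, its methods, and the IDs `cb-1`, `cb-2`, `enc-42`, and `enc-43` are all hypothetical, and the guard is a simple in-memory set keyed to the encounter ID.

```python
from typing import List, Set


class Workflow:
    """Toy workflow that appends a receipt for each action it takes."""

    def __init__(self) -> None:
        self.receipts: List[str] = []
        self._handled_callbacks: Set[str] = set()
        self._submitted_encounters: Set[str] = set()

    def on_review_callback(
        self, callback_id: str, encounter_id: str, decision: str
    ) -> None:
        # A callback ID is honored once; repeats are recorded and ignored.
        if callback_id in self._handled_callbacks:
            self.receipts.append(f"ignored duplicate callback {callback_id}")
            return
        self._handled_callbacks.add(callback_id)
        if decision == "approved":
            self.submit_to_payer(encounter_id)
        else:
            self.receipts.append(f"routed {encounter_id} back with notes")

    def submit_to_payer(self, encounter_id: str) -> None:
        # Duplicate-action guard: one submission per encounter ID.
        if encounter_id in self._submitted_encounters:
            self.receipts.append(f"skipped duplicate submission {encounter_id}")
            return
        self._submitted_encounters.add(encounter_id)
        self.receipts.append(f"submitted {encounter_id}")


wf = Workflow()
wf.on_review_callback("cb-1", "enc-42", "approved")
wf.on_review_callback("cb-1", "enc-42", "approved")  # review UI retried the call
wf.submit_to_payer("enc-42")                         # workflow was re-triggered
wf.on_review_callback("cb-2", "enc-43", "rejected")  # a different case, rejected

submissions = [r for r in wf.receipts if r.startswith("submitted")]
assert submissions == ["submitted enc-42"]
assert "routed enc-43 back with notes" in wf.receipts
```

The assertions look at receipts rather than model output, which is the point: the eval passes or fails on what the workflow did, including what it declined to do twice.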
The physician's review queue in AWS's example is also a version of an approval queue, a place where a person looks at proposed AI work before it becomes real. The pause contract is what makes that queue trustworthy operationally: the reviewer isn't just approving a letter, she's the named owner of a specific, checkpointed, resumable state.
Where to start
Don't try to retrofit every workflow with a full durable execution rebuild. Pick the one workflow in your operation that already pauses (the one waiting on a manager, a customer reply, a vendor callback, or a slow external system) and write its pause contract first. Name the owner. Name the callback. Name what's already done and shouldn't be redone. Name the duplicate-action guard before a second submission becomes a real compliance problem, not a hypothetical one.
That's the workflow BaristaLabs wants to look at with you: one process that already has to wait on someone or something outside your control, so we can turn "what happens while it's paused" from an assumption into a written, implementation-ready contract. If your team runs process automation across support, sales, or back-office work, that's exactly where this shows up first, and it's worth mapping before the workflow gets permission to run unattended for hours or days. Get in touch and bring the workflow that keeps someone waiting.
