Warnings are not confetti: build a warning budget before agents merge

Eight reviewer agents approved the merge and left a page full of yellow triangles. The button is live. The warnings are still alive. Here is the artifact for that gap.

Sean McLellan

Lead Architect & Founder

June 24, 2026 · 8 min read

The merge button is live. Above it, the report reads GATE PASSED WITH WARNINGS — Ready for merge. Below it, the page is a column of yellow triangles.

Here is the summary the eight reviewer agents produced, paraphrased from the public PR:

Code Review      WARN   0 critical   2 warnings
Security Audit   WARN   0 critical   2 warnings
Bug Analysis     WARN   0 critical   2 warnings
Test Coverage    PASS   0 critical   1 warning
Code Quality     PASS   0 critical   2 warnings
Documentation    WARN   0 critical   3 warnings
Silent Failures  WARN   0 critical   4 warnings
Test Quality     PASS   0 critical   1 warning

Zero critical issues. Zero hard failures. Seventeen warnings. The verdict line is honest about exactly that: passed, with warnings, ready to merge.
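The arithmetic behind that verdict can be checked mechanically. The sketch below parses the summary as quoted above and totals it; the column layout is assumed from that excerpt, and `parse_summary` and `tally` are illustrative names, not part of any real gate.

```python
import re

# Summary rows as quoted in the article. The column layout is taken from that
# excerpt; a real gate's output format may differ.
SUMMARY = """\
Code Review      WARN   0 critical   2 warnings
Security Audit   WARN   0 critical   2 warnings
Bug Analysis     WARN   0 critical   2 warnings
Test Coverage    PASS   0 critical   1 warning
Code Quality     PASS   0 critical   2 warnings
Documentation    WARN   0 critical   3 warnings
Silent Failures  WARN   0 critical   4 warnings
Test Quality     PASS   0 critical   1 warning
"""

# Agent name, two or more spaces, status, critical count, warning count.
ROW = re.compile(
    r"^(?P<agent>.+?)\s{2,}(?P<status>PASS|WARN|FAIL)\s+"
    r"(?P<critical>\d+) critical\s+(?P<warnings>\d+) warnings?$"
)


def parse_summary(text):
    """Turn each summary line into a dict; fail loudly on unknown lines."""
    rows = []
    for line in text.strip().splitlines():
        match = ROW.match(line.strip())
        if match is None:
            raise ValueError(f"unrecognized summary line: {line!r}")
        rows.append({
            "agent": match["agent"],
            "status": match["status"],
            "critical": int(match["critical"]),
            "warnings": int(match["warnings"]),
        })
    return rows


def tally(rows):
    """Total the counts and list agents that say PASS but still carry warnings."""
    return {
        "agents": len(rows),
        "critical": sum(r["critical"] for r in rows),
        "warnings": sum(r["warnings"] for r in rows),
        "pass_with_warnings": [
            r["agent"] for r in rows
            if r["status"] == "PASS" and r["warnings"] > 0
        ],
    }


totals = tally(parse_summary(SUMMARY))
print(totals)
```

On this excerpt the tally comes to eight agents, zero critical findings, and seventeen warnings. Four of those seventeen sit under rows whose own status reads PASS, so reading the status column alone undercounts what is left standing.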

What happens to seventeen warnings the moment you click merge?

The two wrong answers

There are two reflexes here, and both are wrong.

Read the warnings, not the verdict

Look at what those eight agents actually flagged in PR #61, because the specifics are the point.

None of those is a crash. None blocks the merge. Each one is a small, specific bet: this is fine for now. And here is the thing about "for now" — nobody wrote down when "now" ends.

Why this change in particular raises the cost of a vague warning

The warning budget ledger

So write the thing the gate doesn't. Not a longer review — a place for the warnings the review leaves standing.

Nine columns. Copy them into a sheet and fill one row per warning the gate left standing.

Warning — the finding in one line, with the agent that raised it. "Dormant hook runs curl HEAD, could leak real IP if activated (Security Audit)."
Current severity — your call today, not the agent's label: tolerate, watch, or fix-now. The gate said WARN; you decide what that means here.
Why allowed today — the actual reason this is acceptable right now. "Hook is unregistered and unreachable in current config." If you cannot write this sentence, the warning is not tolerated, it is ignored.
Activation trigger — the concrete condition that turns this from inert to live. "Any PR that registers the hook into a lifecycle event."
Promotion rule — what flips this row from warning to blocker. "If the trigger fires, this becomes a FAIL and blocks merge until reviewed."
Owner — a named person, not a team. The one who answers for this row.
Expiry date — the date this tolerance ends and someone must re-decide. A warning with no expiry is a permanent one you never chose to keep.
Evidence link — where the finding and its context live: the PR comment, the line, the gate output.
Next test — the check that catches this automatically next time, so it stops depending on a human reading a yellow triangle.
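If a sheet is not where your team keeps things, the same nine columns can live in code. The following is a minimal sketch, not a prescribed format: the `LedgerRow` class, its field names, and the `problems` check are assumptions made for illustration, and the example values are taken from the column descriptions and the worked table in this article.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional

# The three severity choices named in the "Current severity" column.
SEVERITIES = ("tolerate", "watch", "fix-now")


@dataclass
class LedgerRow:
    """One warning the gate left standing; one field per ledger column."""
    warning: str
    current_severity: str
    why_allowed_today: str
    activation_trigger: str
    promotion_rule: str
    owner: str
    expiry_date: Optional[date]
    evidence_link: str
    next_test: str

    def problems(self) -> List[str]:
        """Return what is missing from this row; an empty list means complete."""
        found = []
        if self.current_severity not in SEVERITIES:
            found.append(f"current_severity must be one of {SEVERITIES}")
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, str) and not value.strip():
                found.append(f"{f.name} is empty")
        if self.expiry_date is None:
            found.append("expiry_date is missing")
        return found


row = LedgerRow(
    warning="Dormant hook runs curl HEAD, could leak real IP if activated (Security Audit)",
    current_severity="watch",
    why_allowed_today="Hook is unregistered and unreachable in current config.",
    activation_trigger="Any PR that registers the hook into a lifecycle event.",
    promotion_rule="If the trigger fires, this becomes a FAIL and blocks merge until reviewed.",
    owner="Security lead",  # the article asks for a named person here
    expiry_date=date(2026, 7, 20),
    evidence_link="PR #61, Security Audit comment",
    next_test="CI check: fail if any hook is both registered and does outbound curl",
)
print(row.problems())
```

A row whose `why_allowed_today` is blank comes back with a problem listed, which is the code version of the rule above: if you cannot write the sentence, the warning is ignored rather than tolerated.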

Three warnings from PR #61, in the ledger

Take three of the real findings and run them through the columns. The values are illustrative; your project's specifics will differ.

Column	Dormant hook network call	Missing `allowed-tools` frontmatter	Suppressed lock-removal error
Warning	Unregistered hook runs `curl HEAD`; could leak real IP if activated (Security Audit)	One skill ships without `allowed-tools` declared (Documentation)	Errors removing a lock file are silently suppressed (Silent Failures)
Current severity	Watch	Tolerate	Watch
Why allowed today	Hook is dormant and unreachable in current config	Skill currently invokes no privileged tool	Lock contention not observed in single-process use
Activation trigger	Any change that registers the hook into a lifecycle event	Skill starts calling a tool it never declared	A second concurrent process touches the same lock
Promotion rule	Trigger fires → FAIL, block until security re-review	Undeclared tool call → block until frontmatter added	Concurrency introduced → block until error is handled, not swallowed
Owner	Security lead	Skill author	Infra owner
Expiry date	2026-07-20	2026-07-20	2026-08-20
Evidence link	PR #61, Security Audit comment	PR #61, Documentation comment	PR #61, Silent Failures comment
Next test	CI check: fail if any hook is both registered and does outbound `curl`	Lint: every skill must declare `allowed-tools`	Test: lock-removal failure must log, not suppress
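The expiry dates and promotion rules in this table only matter if something reads them. Here is one hedged sketch of that reader, using the three rows above. The status names `WARN`, `RE-DECIDE`, and `FAIL`, the `fired_triggers` argument, and the choice to treat the expiry date itself as already expired are all assumptions for illustration.

```python
from datetime import date

# The three example rows, reduced to the fields this check needs.
LEDGER = [
    {"warning": "Dormant hook network call",
     "owner": "Security lead", "expiry": date(2026, 7, 20)},
    {"warning": "Missing allowed-tools frontmatter",
     "owner": "Skill author", "expiry": date(2026, 7, 20)},
    {"warning": "Suppressed lock-removal error",
     "owner": "Infra owner", "expiry": date(2026, 8, 20)},
]


def row_status(row, today, fired_triggers=frozenset()):
    """Promotion beats expiry: a fired trigger blocks, an expired row needs a decision."""
    if row["warning"] in fired_triggers:
        return "FAIL"
    if today >= row["expiry"]:  # assumption: tolerance ends on the date itself
        return "RE-DECIDE"
    return "WARN"


def gate(ledger, today, fired_triggers=frozenset()):
    """Map each warning to its status as of the given date."""
    return {
        row["warning"]: row_status(row, today, fired_triggers)
        for row in ledger
    }


# On the publication date nothing has expired and no trigger has fired.
print(gate(LEDGER, date(2026, 6, 24)))
# A second concurrent process appears before the lock row expires.
print(gate(LEDGER, date(2026, 7, 20), {"Suppressed lock-removal error"}))
```

How a trigger gets detected is left out on purpose; in this sketch someone or something passes in the names of the rows whose activation trigger fired, and the ledger decides what that means for the merge.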

Your first pass, on one repo

Do not build a warning-governance program. Build one ledger, for one change.

Pick the next agent-reviewed or agent-generated pull request that merged with warnings. Not a clean one — one with yellow.
Copy every surviving warning into a row. If the gate produced twelve and you tolerated all twelve, you have twelve rows. Empty cells are the finding.
Write the "why allowed today" sentence for each. This is the filter. The rows where you cannot write that sentence are not tolerated warnings — they are the ones to fix or block now.
Put a name and a date on every row. No team names, no "TBD." An owner who answers for it and a date the tolerance expires.
Turn the highest-stakes row into a Next test, so the next merge re-checks it for you. The highest-stakes row is usually a capability that can act, like the browser-eval or outbound-network ones here.
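For the last step, the Next test on the dormant-hook row could look something like the sketch below. It assumes hooks can be listed as simple records with a `registered` flag and a shell `command`; that shape, the hook names, and the function names are hypothetical, since the article does not show the project's real hook configuration.

```python
import os
import shlex

# Hypothetical hook records; a real project would load these from its config.
HOOKS = [
    {"name": "ip-check", "registered": False,
     "command": "curl -I https://example.com"},
    {"name": "format", "registered": True,
     "command": "prettier --check ."},
]


def calls_curl(command):
    """True if any token in the shell command is the curl binary."""
    return any(
        os.path.basename(token) == "curl"
        for token in shlex.split(command)
    )


def registered_curl_hooks(hooks):
    """Names of hooks that are both registered and make an outbound curl call."""
    return [
        hook["name"] for hook in hooks
        if hook["registered"] and calls_curl(hook["command"])
    ]


violations = registered_curl_hooks(HOOKS)
print(violations)
```

While the hook stays unregistered the list is empty and the check passes. The moment a change flips `registered` to true, the hook's name shows up, and a CI job can exit non-zero on a non-empty list. The check only looks for `curl`; other ways of making a network call would need their own rule.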

Bring one PR; leave with a ledger

Warnings are not confetti. They are debt with a due date. Give every one a name and the date it comes due.

Agent merge-gate help

Build a warning budget for one agent-reviewed change

Bring one agent-generated or agent-reviewed pull request. BaristaLabs will help turn its WARN findings into a ledger with owners, activation triggers, promotion rules, and expiry dates.

Best fit for teams letting AI agents edit repos, sync skills, run browser automation, or review pull requests.
