The interesting AI failure mode this week was not a model hallucinating code.
It was a pull request thread turning into a small airport.
In one public GitHub pull request, a human maintainer was working through normal repository chores: dependency updates, Sentry wiring, privacy-safe verification, workflow changes, and test cleanup. Around that work, several automated reviewers and status tools showed up in the same conversation.
Some comments were useful. One automated reviewer caught a serious Sentry configuration issue: production code was treating SENTRY_AUTH_TOKEN like a DSN. That is exactly the kind of detail a tired human reviewer can miss.
Some comments were status noise. CodeAnt announced that it was reviewing the PR and included a sharing block. CodeRabbit paused because the branch was moving too quickly. Amazon Q Developer reported that it could not finalize a review because the pull request head or merge base had changed. CodeRabbit autofix reported that the branch changed mid-fix. In another run, it generated workflow-file fixes it could not commit because of permissions.
None of that makes the tools useless.
It shows what happens when every tool uses the same comment thread as its runway.

A pull request comment is now an operations surface
For years, a pull request comment was mostly a human note: please rename this, this edge case fails, can you add a test?
AI review tools are changing the shape of that thread.
Amazon Q Developer's code review docs describe reviews for security vulnerabilities and code quality issues. They can cover recent changes, a single file, an entire project, or a codebase. Amazon says the review uses both generative AI and rule-based automatic reasoning.
CodeRabbit's docs describe a much wider participant than a simple linter: review instructions, path-based rules, automatic controls, autofix, CI analysis, custom checks, unit-test generation, and review commands.
GitHub Copilot's review docs put the same pattern inside GitHub-native review. That connects to the broader shift we covered in "GitHub's Copilot app turns coding agents into delivery sessions," but this post is about a narrower object: the comment thread itself.
Once reviewers can summarize, scan, patch, rerun, and explain CI, the PR thread stops being just discussion.
It becomes an operations surface.
That surface needs traffic rules.
The PR showed four different comment types mixed together
The useful artifact in the public PR is not the repository itself. It is the mix of comment types.
First, there were review-start and review-state comments. A tool announces that it is reviewing. Another pauses because the branch is changing. Another says the head or merge base moved.
Second, there were findings. Kilo Code flagged the Sentry auth-token issue. That is a real review output.
Third, there were autofix status comments. Some fixes could not run because the branch changed. Some suggestions could not be committed because workflow files needed higher permissions. Later, other autofix commits landed.
Fourth, there were setup and promotional notes. Those may be harmless, but they compete for attention with actual findings.
A human maintainer has to turn that pile into a decision: what should be fixed, what already changed, what failed, what is stale, and what still needs a person?
That is not a code-generation problem. It is a review-operations problem.
The missing artifact is a lane map
Teams do not need to ban AI reviewers. They need to assign lanes.
A simple lane map says what each tool owns before it comments:
| Lane | What it owns | What it should not own |
|---|---|---|
| Summary | Files changed, intent, risky areas | Final approval |
| Security | Secrets, auth, permissions, data handling | Style comments |
| Quality | Maintainability, tests, obvious defects | Deployment policy |
| Autofix | Low-risk patches with narrow scope | CI, secrets, migrations, production config |
| Human owner | Accept, ignore, rerun, merge | Reading a dozen status comments to guess what happened |
The lane map should also name file boundaries. A bot that can fix markdown typos does not automatically get permission to edit workflow files. A tool that reviews dependency updates does not automatically get permission to change deployment scripts.
That is the same principle behind writing the AI approval policy before choosing the agent. Decide what the system may read, suggest, change, and escalate before the tool gets access.
Pull request review now belongs in that policy.
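One way to make a lane map concrete is to keep it as data and check it before a bot writes anything. The sketch below is a minimal illustration, not any vendor's configuration format: the tool names (`summary-bot`, `security-bot`, `autofix-bot`), the glob patterns, and the `may_commit` helper are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Tuple


@dataclass(frozen=True)
class Lane:
    name: str
    owns: Tuple[str, ...]                 # topics the tool may comment on
    writable_paths: Tuple[str, ...] = ()  # glob patterns the tool may commit to


# Hypothetical lane assignments; tool names and patterns are placeholders.
LANES = {
    "summary-bot": Lane("summary", ("files changed", "intent", "risky areas")),
    "security-bot": Lane("security", ("secrets", "auth", "permissions", "data handling")),
    "autofix-bot": Lane("autofix", ("low-risk patches",), ("docs/*.md", "README.md")),
}


def may_commit(tool: str, path: str) -> bool:
    """Allow a commit only if the tool has a lane and the path matches one of its globs."""
    lane = LANES.get(tool)
    if lane is None:
        return False  # unknown tools get no write access by default
    return any(fnmatchcase(path, pattern) for pattern in lane.writable_paths)
```

With this map, `may_commit("autofix-bot", "docs/setup.md")` is allowed while `may_commit("autofix-bot", ".github/workflows/ci.yml")` is not. Comment permission and commit permission stay separate: a lane with no writable paths can still review.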
Autofix needs a stricter rulebook than review
Review comments are advice. Commits are action.
A bot that says "this workflow condition looks wrong" is making a recommendation. A bot that rewrites .github/workflows/ci.yml is changing how software ships.
Those are different risk classes.
The public PR showed the distinction cleanly. CodeRabbit generated workflow-file changes but could not commit them because of permission restrictions. That is frustrating for the maintainer in the moment, but it is also the right instinct. Workflow files, deployment scripts, secret handling, migrations, and production telemetry deserve stricter gates than formatting changes.
A practical autofix rulebook might look like this:
- Low-risk automatic fixes: spelling, markdown formatting, generated snapshots, non-production style cleanup.
- Approval-required fixes: application code, test logic, dependencies, config changes.
- Never automatic fixes: CI workflows, deployment scripts, auth, secrets, billing, migrations, compliance controls, customer-data handling.
The exact split will vary by team. The important part is writing it down before a bot offers to fix something.
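Written down as code, a rulebook like this can be a short ordered list of path patterns. The following is a sketch under assumptions: the directory names (`deploy/`, `migrations/`) and file patterns describe an imagined repository, and anything not matched falls back to approval-required.

```python
from enum import Enum
from fnmatch import fnmatchcase


class Tier(Enum):
    AUTOMATIC = "automatic"
    APPROVAL_REQUIRED = "approval_required"
    NEVER_AUTOMATIC = "never_automatic"


# Checked top to bottom; the first match wins, so the strictest rules go first.
# All patterns are illustrative assumptions about a repository layout.
RULES = [
    (".github/workflows/*", Tier.NEVER_AUTOMATIC),
    ("deploy/*", Tier.NEVER_AUTOMATIC),
    ("migrations/*", Tier.NEVER_AUTOMATIC),
    ("*.md", Tier.AUTOMATIC),
    ("*.snap", Tier.AUTOMATIC),
]

# Ordering used to compare tiers, from least to most restrictive.
_STRICTNESS = [Tier.AUTOMATIC, Tier.APPROVAL_REQUIRED, Tier.NEVER_AUTOMATIC]


def classify(path: str) -> Tier:
    """Return the tier for one file; unmatched paths require approval."""
    for pattern, tier in RULES:
        if fnmatchcase(path, pattern):
            return tier
    return Tier.APPROVAL_REQUIRED


def patch_tier(paths) -> Tier:
    """A patch is only as safe as the riskiest file it touches."""
    return max(
        (classify(p) for p in paths),
        key=_STRICTNESS.index,
        default=Tier.APPROVAL_REQUIRED,
    )
```

The design choice that matters is the default. A file nobody thought about should land in the approval-required tier, not the automatic one.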
If a vendor says its reviewer can autofix issues, ask four questions:
- Which file paths can it change?
- What happens if the branch moves while it is working?
- Does it commit directly, open a separate commit, or only attach a patch?
- Which fixes require a human approval boundary?
Autofix without boundaries does not remove work. It moves uncertainty from the code into the review process.
Stale branches need a default behavior
The branch moved during review. The branch moved during autofix.
That will happen constantly.
Developers keep working while bots review. CI reruns. Another tool posts. Someone rebases. Someone pushes a small fix. A comment that was accurate against commit A may be stale by commit C.
A good AI review setup needs a stale-branch policy:
- If the branch changes during review, mark the result stale instead of posting findings as if they apply to the current head.
- If the branch changes during autofix, discard the patch or attach it as a stale suggestion instead of retrying blindly.
- If a review pauses because too many commits are landing, show one state comment, not a comment storm.
- If a tool lacks permission to commit a generated fix, label the result as human action required.
This sounds like boring process work because it is.
Most AI workflow failures do not look dramatic. They look like a queue nobody owns.
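The stale-branch policy above is small enough to express as a few pure functions. This is a sketch, not how any of the tools named here actually behave: the function names, the `Action` values, and the idea of comparing a reviewed commit SHA with the current head SHA are assumptions chosen for illustration.

```python
from enum import Enum


class Action(Enum):
    POST_FINDINGS = "post findings"
    MARK_STALE = "mark result stale"
    COMMIT_FIX = "commit fix"
    ATTACH_STALE_PATCH = "attach patch as stale suggestion"
    HUMAN_ACTION_REQUIRED = "label as human action required"


def review_action(reviewed_sha: str, head_sha: str) -> Action:
    """Findings only apply to the commit that was actually reviewed."""
    return Action.POST_FINDINGS if reviewed_sha == head_sha else Action.MARK_STALE


def autofix_action(base_sha: str, head_sha: str, can_commit: bool) -> Action:
    """Never retry blindly: a moved branch or missing permission ends the attempt."""
    if base_sha != head_sha:
        return Action.ATTACH_STALE_PATCH
    if not can_commit:
        return Action.HUMAN_ACTION_REQUIRED
    return Action.COMMIT_FIX


def upsert_state_comment(state_comments: dict, tool: str, state: str) -> None:
    """Keep one state comment per tool by overwriting it instead of appending."""
    state_comments[tool] = state
```

For example, `autofix_action("commit-a", "commit-c", can_commit=True)` returns `Action.ATTACH_STALE_PATCH`, because the patch was generated against a commit that is no longer the head.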
The final output should be a review receipt
A PR with several automated reviewers needs one final receipt.
Not another generic summary comment. A receipt.
It should tell the human decision owner:
- What each tool checked.
- Which commit or file range it reviewed.
- Which comments were accepted.
- Which comments were ignored.
- Which fixes were committed.
- Which fixes failed.
- Which findings still need a person.
- Which parts of the PR were not reviewed.
That connects directly to the pattern we covered in "Agent receipts: what to log before AI touches customer work." A receipt is not just a log. It is the evidence trail that lets a team explain what happened after the work is done.
For a pull request, the receipt might be short:
Example: Security review flagged a Sentry token/DSN issue. Maintainer fixed it. Autofix committed two low-risk changes. Workflow-file suggestions were not committed because bot permissions blocked them. Amazon Q review went stale after the merge base changed and was not rerun. Final merge approved by the maintainer.
That note is more useful than a vague statement that the PR was reviewed by AI.
The better question is not "did a bot look at it?"
The better question is "what did each reviewer check, what changed, what failed, and what still depended on a human?"
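A receipt with those fields can be a plain data structure that renders to one comment. The sketch below is one possible shape, not a standard format: the class names, field names, tool labels, and commit labels such as `commit-a` are hypothetical, and the sample values paraphrase the example receipt above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolRecord:
    tool: str
    checked: str          # what the tool looked at
    reviewed_sha: str     # commit the review actually ran against
    stale: bool = False   # True if the branch moved and the review was not rerun


@dataclass
class Receipt:
    tools: List[ToolRecord] = field(default_factory=list)
    accepted: List[str] = field(default_factory=list)
    ignored: List[str] = field(default_factory=list)
    fixes_committed: List[str] = field(default_factory=list)
    fixes_failed: List[str] = field(default_factory=list)
    needs_human: List[str] = field(default_factory=list)
    not_reviewed: List[str] = field(default_factory=list)
    approved_by: str = ""

    def render(self) -> str:
        """Render the receipt as plain text suitable for a single PR comment."""
        lines = ["Review receipt"]
        for t in self.tools:
            status = "STALE, not rerun" if t.stale else "current"
            lines.append(f"- {t.tool}: checked {t.checked} at {t.reviewed_sha} ({status})")
        sections = [
            ("Accepted", self.accepted),
            ("Ignored", self.ignored),
            ("Fixes committed", self.fixes_committed),
            ("Fixes failed", self.fixes_failed),
            ("Needs a person", self.needs_human),
            ("Not reviewed", self.not_reviewed),
        ]
        for title, items in sections:
            lines.append(f"{title}: {'; '.join(items) if items else 'none'}")
        lines.append(f"Approved by: {self.approved_by or 'nobody yet'}")
        return "\n".join(lines)


# Placeholder values; the commit labels are not real SHAs.
receipt = Receipt(
    tools=[
        ToolRecord("security-reviewer", "secrets and telemetry config", "commit-c"),
        ToolRecord("second-reviewer", "code quality", "commit-a", stale=True),
    ],
    accepted=["Sentry token/DSN issue"],
    fixes_committed=["two low-risk changes"],
    fixes_failed=["workflow-file suggestions (bot lacked permission)"],
    approved_by="maintainer",
)
print(receipt.render())
```

Empty sections render as "none" on purpose. A receipt that says nothing was ignored is a different claim from a receipt that leaves the line out.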
A short checklist before adding another AI reviewer
Before installing another automated reviewer, answer these in writing:
- What lane does this tool own? Do not install three general reviewers and hope they sort themselves out. Assign summary, security, quality, and autofix lanes deliberately.
- What is it allowed to change? Separate comment permission from commit permission. Treat workflows, secrets, deployments, customer data, and production config as approval-required by default.
- What happens when the branch moves? Decide whether the bot pauses, restarts, marks stale, or attaches a patch. Avoid retries that add comments without moving the PR forward.
- Who owns the final decision? A bot can recommend. A person still needs to decide what gets merged, especially when reviewers disagree or a fix touches sensitive behavior.
- Where is the receipt? If the PR has several automated reviewers, require one final review receipt. The receipt should show what was checked, what changed, what failed, and what still needed human judgment.
For teams that want a starting point, BaristaLabs keeps a practical AI agent receipt template that can be adapted for review workflows.
A better story than "we added AI review"
AI code review bots will keep improving. They will catch real defects. They will write better patches. They will get better at reading CI output and project conventions.
That does not remove the need for lanes.
Small teams and agencies should be especially careful here. A busy maintainer does not need five tools competing to prove they are useful. A client does not need a screenshot of a crowded pull request thread. A software lead does not need another place where accountability disappears behind automation.
They need a workflow they can explain after the merge.
This reviewer checked security. This reviewer checked maintainability. This bot was allowed to patch docs but not workflows. This finding was fixed. This one was ignored. This failed because the branch moved. This person approved the final change.
That is a better story than "we added AI code review."
For teams building this into real software work, the pattern is the same one we use in process automation: define the lane, define the permission, define the receipt. The tool matters, but the handoff matters more.
If your pull request has six reviewers and nobody can say which one owned the decision, you do not have an AI review stack.
You have a comment pile.
