The agent has already done the hard part.

It pulled the bank statement and the ledger, matched the day's transactions line by line, and found one that doesn't reconcile — a $4,200 gap between what the drawer says and what cleared. It's confident. It drafts the adjusting entry, attaches a tidy explanation, and moves to post it.

That last move is the whole problem.

Because in this version of the workflow, the same agent that found the discrepancy is the one that approves the fix. It prepares the work and it signs off on the work. No second party. No independent check. Just one actor, plausible and fast, doing both halves of a job that every finance team on earth deliberately splits between two people.

This is the gap MakerChecker is built to close. It's a new open-source, self-hosted project with a blunt description on the tin: "Roles, segregation of duties, and tamper-evident audit for the AI agents you already run." Its Show HN post is even blunter — "Stop your AI agents from approving their own work." The author says it came out of years of struggling to get AI adopted in regulated industries, where the goal isn't a flashier demo. It's getting agents that move money, triage healthcare, or ship to production out of the pilot sandbox and into real work.

That's worth a teardown. Not the marketing — the control model.

The line worth tearing down

Strip the README to one clause and this is it: an agent "provably cannot approve its own work."

Most agent demos treat approval as a checkbox you bolt on at the end. A human-in-the-loop button. A Slack message that says "approve?" The trouble is that a button is a request for review, not a guarantee of it. If the same identity can submit the request and click the button — or if a tired operator rubber-stamps whatever the agent proposes — you don't have separation. You have theater with an audit trail of one name appearing twice.

Separation of duties is older and stricter than that. It's the principle that the person who prepares a transaction cannot be the person who approves it. Auditors care about it because most fraud and most expensive mistakes live in the seam where one actor controls both ends. The agent version is identical: the model that proposes the payment shouldn't be the model that releases it.

The author points to the failures this is meant to prevent, and ships them as runnable examples — Knight Capital, where a botched release set off uncontrolled trades, and the Air Canada chatbot that invented a refund policy the airline was then held to. Different domains, same shape. An automated actor took an action that no separate party was structurally required to check first.

So the interesting question isn't "did a human approve it?" It's "could the same actor have prepared and approved it?" If the answer is yes, the control doesn't exist, no matter how many buttons are in the UI.

What the two-key rule actually requires

Think of a missile silo, a bank vault, a safe-deposit box: two keys, two people, one action that can't happen until both turn. The point isn't ceremony. It's that no single hand can complete the move alone.

MakerChecker builds that for agents out of a few primitives, documented in its concepts doc. The mechanics are worth knowing because they show where the enforcement actually lives — and it's earlier than you'd guess.

An agent acts only through a role. Permissions never attach to the agent directly; they attach to its one role. Reassign the role and you change what it can do, on the record.
A role runs only the skills it was granted, each pinned to an exact version. Deny by default. No grant for `post-payment@1`, no execution. Change the skill's behavior and that's a new version, which the old grant doesn't cover.
Limits fail closed. A role can carry a per-invocation amount ceiling and a per-run cap on the number of invocations. If the input's amount field is missing or unreadable, the call is denied, not waved through. "I couldn't tell how much this was" resolves to no, not yes.
High-risk skills can't run without a gate in front of them. A high-risk capability is refused unless the flow places an approval gate earlier in the sequence. There's no path where the dangerous action just… runs.
Segregation of duties is a structural constraint between roles. Bind two roles as a conflicting pair, and if one of them already acted in a run, the other is blocked from acting in the same run. The maker role and the checker role can't be the same hand.
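A minimal sketch of how those rules could compose into one deny-by-default check. This is not MakerChecker's API: `Role`, `Run`, `authorize`, and `record` are hypothetical names, and the order in which the rules are evaluated is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Role:
    name: str
    grants: set = field(default_factory=set)   # exact "skill@version" strings
    max_amount: Optional[float] = None         # per-invocation ceiling
    max_calls: Optional[int] = None            # per-run invocation cap


@dataclass
class Run:
    acted: set = field(default_factory=set)    # roles that already acted in this run
    calls: dict = field(default_factory=dict)  # role name -> invocation count


def authorize(role, skill, payload, run, conflicts,
              high_risk=frozenset(), gate_passed=False):
    """Return (allowed, reason). Anything not explicitly permitted is denied."""
    if skill not in role.grants:
        return False, "no grant for this exact skill version"
    if skill in high_risk and not gate_passed:
        return False, "high-risk skill has no approval gate ahead of it"
    for a, b in conflicts:
        # Find the other half of a conflicting pair this role belongs to.
        other = b if role.name == a else a if role.name == b else None
        if other is not None and other in run.acted:
            return False, f"segregation of duties: {other} already acted in this run"
    if role.max_calls is not None and run.calls.get(role.name, 0) >= role.max_calls:
        return False, "per-run invocation limit reached"
    if role.max_amount is not None:
        amount = payload.get("amount")
        # Fail closed: missing, non-numeric, or NaN amounts are denied.
        if (isinstance(amount, bool) or not isinstance(amount, (int, float))
                or amount != amount):
            return False, "amount missing or unreadable"
        if amount > role.max_amount:
            return False, "amount exceeds per-invocation ceiling"
    return True, "ok"


def record(role, run):
    """Note that a role acted, so later checks in the same run can see it."""
    run.acted.add(role.name)
    run.calls[role.name] = run.calls.get(role.name, 0) + 1
```

Every branch that cannot positively establish permission returns a denial, which is what "fail closed" looks like in code: the string `"4200"` is refused just as firmly as a missing field.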

The piece that makes this more than configuration: enforcement runs before the tool body executes, and it runs twice — once when the step is scheduled, once immediately before invocation. A grant revoked in between is still caught the second time. The check isn't a polite suggestion the agent can reason its way around. It's a wall the call hits first.
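To make the double check concrete, here is a small sketch under stated assumptions: `schedule`, `invoke`, `check`, and the `grants` dictionary are hypothetical stand-ins, not the project's real interfaces. It shows a grant being revoked between scheduling and invocation, and the tool body never running.

```python
class Denied(Exception):
    """Raised when a step fails authorization."""


def check(grants, role, skill):
    # Deny by default: only an exact (role, skill@version) grant passes.
    if skill not in grants.get(role, set()):
        raise Denied(f"{role} has no grant for {skill}")


def schedule(grants, role, skill):
    check(grants, role, skill)                  # first check: at scheduling time
    return {"role": role, "skill": skill}


def invoke(grants, step, tool_body, payload):
    check(grants, step["role"], step["skill"])  # second check: right before the call
    return tool_body(payload)                   # reached only if the re-check passed


grants = {"maker": {"post-payment@1"}}
executed = []                                   # stands in for the tool's side effects

step = schedule(grants, "maker", "post-payment@1")
grants["maker"].discard("post-payment@1")       # grant revoked between the two checks

try:
    invoke(grants, step, executed.append, {"amount": 4200})
except Denied as err:
    print("blocked:", err)   # blocked: maker has no grant for post-payment@1
```

The design point is that `tool_body` is only ever called from inside `invoke`, after the re-check, so there is no code path where a scheduled step outlives its grant.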

When a human signature is required, the gate has teeth. Define named approvers and the gate switches to an identity mode that fails closed: decisions must come from authenticated, named people; the run's triggering user can't decide; the same user can't decide twice. The requester cannot approve their own work, and that rule is on by default, not opt-in.
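The three identity-mode rules are easy to state and easy to get subtly wrong, so a sketch may help. `ApprovalGate`, its `required` threshold, and the `authenticated` flag are assumptions made for illustration; the source describes the rules, not this interface.

```python
class ApprovalGate:
    """Hypothetical identity-mode gate: named approvers only, fails closed."""

    def __init__(self, approvers, requester, required=1):
        self.approvers = set(approvers)   # named people allowed to decide
        self.requester = requester        # the user who triggered the run
        self.required = required          # approvals needed (assumed parameter)
        self.decisions = {}               # user -> "approve" or "reject"

    def decide(self, user, decision, authenticated):
        if decision not in ("approve", "reject"):
            raise ValueError("decision must be 'approve' or 'reject'")
        if not authenticated:
            raise PermissionError("decision must come from an authenticated user")
        if user not in self.approvers:
            raise PermissionError(f"{user} is not a named approver")
        if user == self.requester:
            # Applies even if the requester is on the approver list.
            raise PermissionError("the requester cannot approve their own work")
        if user in self.decisions:
            raise PermissionError(f"{user} has already decided")
        self.decisions[user] = decision

    def is_open(self):
        # Any rejection keeps the gate shut; otherwise count approvals.
        if "reject" in self.decisions.values():
            return False
        approvals = sum(1 for d in self.decisions.values() if d == "approve")
        return approvals >= self.required
```

Note that the gate starts closed and every refusal raises rather than silently ignoring the attempt, so a denied decision is visible to whatever is recording the run.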

The artifact: a two-key run record

Here's the thing to take away from the teardown. For a single agent action, you can write down the complete control on one card. Call it the two-key run record: every field that decides whether the action was allowed, and every field that proves what happened.

[Figure: The two-key run record separates the actor who prepares an agent action from the actor who approves it.]

Field	What it answers
Maker role	Which role prepared the action — the first key.
Checker role	Which different role (or named human) approved it — the second key. They cannot be the same actor in the run.
Granted skill version	The exact `name@version` the maker's role was allowed to run. Not "payments," but `post-payment@1`.
Limit	The ceiling that applies — amount per invocation, count per run — and the fact that an unreadable amount denies.
Approval gate	Whether a named human had to sign, who's on the approver list, and the rule that the requester can't be one of them.
Audit event	The chained, signed entry written in the same transaction as the state change.
Verification result	Whether the chain still verifies — `{ ok: true }`, or a `failedSeq` pointing at the row that broke.
Rollback owner	The named person responsible if the action has to be undone. (MakerChecker records the action; you still owe a named human here.)
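If it helps to see the card as data, the eight fields can be sketched as a record with a function that lists what is still missing. The class name, field names, and the `name@<integer>` pattern (inferred from examples like `post-payment@1`) are assumptions for illustration, not a schema the project defines.

```python
import re
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TwoKeyRunRecord:
    maker_role: Optional[str] = None
    checker_role: Optional[str] = None
    granted_skill_version: Optional[str] = None
    limit: Optional[str] = None
    approval_gate: Optional[str] = None
    audit_event: Optional[str] = None
    verification_result: Optional[str] = None
    rollback_owner: Optional[str] = None


def gaps(record):
    """Return a list of findings; an empty list means all eight fields hold up."""
    findings = [f"missing: {f.name}" for f in fields(record)
                if not getattr(record, f.name)]
    if record.maker_role and record.maker_role == record.checker_role:
        findings.append("maker and checker are the same actor")
    skill = record.granted_skill_version
    # Assumed convention: a pinned skill looks like name@<integer version>.
    if skill and not re.fullmatch(r"[\w-]+@\d+", skill):
        findings.append("skill is not pinned as name@version")
    return findings
```

Calling `gaps(TwoKeyRunRecord(maker_role="ap-agent", checker_role="ap-agent"))` returns the six missing fields plus the same-actor finding.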

The record-keeping half is the part teams underrate. Every state change and tool call commits to a hash-chained, Ed25519-signed audit log. Each event's hash is computed over its canonical contents and chained to the one before it. Change a single row and recomputation breaks at that row: `GET /api/audit/verify` hands back the sequence number where it failed. You can export a signed bundle and verify it offline, in any language, with no access to the running system. An examiner doesn't have to trust your database. They re-run the math.
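The chaining idea is small enough to sketch. This version makes several assumptions: SHA-256 as the hash, sorted-key JSON as the "canonical contents", an all-zero genesis value, and a failure shape of `{"ok": False, "failedSeq": n}` modeled on the fields mentioned above. It leaves out the Ed25519 signatures entirely, since the Python standard library has no Ed25519 support, so it demonstrates tamper detection only, not signing.

```python
import hashlib
import json

GENESIS = "0" * 64   # assumed placeholder for "no previous event"


def event_hash(seq, payload, prev_hash):
    # Canonical form (assumption): compact JSON with sorted keys.
    canonical = json.dumps(
        {"seq": seq, "payload": payload, "prev": prev_hash},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append(chain, payload):
    prev = chain[-1]["hash"] if chain else GENESIS
    seq = len(chain) + 1
    chain.append({"seq": seq, "payload": payload, "prev": prev,
                  "hash": event_hash(seq, payload, prev)})


def verify(chain):
    """Recompute every hash in order; report the first row that breaks."""
    prev = GENESIS
    for row in chain:
        expected = event_hash(row["seq"], row["payload"], prev)
        if row["prev"] != prev or row["hash"] != expected:
            return {"ok": False, "failedSeq": row["seq"]}
        prev = row["hash"]
    return {"ok": True}
```

Because each hash covers the previous one, an attacker who edits a row and recomputes its hash only moves the break to the next row, unless they rewrite everything after it. That is the gap the signatures and the append-only database role are there to close.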

One honest caveat the project states plainly, and so will I: the quickstart runs as the Postgres owner, which disables the append-only triggers. For tamper-resistance against a compromised app credential you run the hardened setup with a non-owner database role. The guarantee is real, but it's a deployment choice, not a default. Read the security model before you tell an auditor the chain is immutable.

Where this changes the pilot-to-production conversation

Most agent pilots stall at the same sentence: "It works great, but I'm not letting it touch the real system yet." That hesitation is usually right, and usually misdiagnosed. The blocker isn't model quality. It's that there's no structural answer to who's allowed to prepare this, and who's allowed to release it.

We've made versions of this argument before from different angles. The AWS AgentCore Gateway piece was about putting a gate between model intent and tool execution — the check in front of the action. Agent receipts were about the record behind it. MakerChecker's contribution is to make role conflict the center of the model: not just "is there a gate," but "can the same actor stand on both sides of it." That's the part demos skip and auditors open with.

MakerChecker doesn't replace your agent framework, which is what makes it adoptable. Your agents keep running where they run. It connects either as a flow it executes and gates, or a proxy session where it authorizes and records the tool calls your framework executes — and both paths write the same audit chain. It's the checkpoint in front and the record behind. That maps cleanly onto the control layer we keep recommending in AI workflow controls: a gate, a queue, a receipt, a rollback path.

What a small team should write down first

You don't need to deploy anything to get value from this. You need to do the teardown on your own workflow, on paper, before you copy the pattern.

Pick one agent workflow that touches something you'd hate to get wrong — a payment, a refund, a customer record, a production release. Then write down its two-key run record by hand:

Name the maker and the checker. If they're the same actor, stop. That's the finding. Everything else is downstream of fixing it.
Pin the skill to a version. "Can issue refunds" is too broad to grant. `issue-refund@2` is a thing you can revoke.
Set the limit, and decide what an unreadable amount does. If the answer isn't "deny," you've left the door open for the one case you didn't anticipate.
Decide where a named human must sign — and confirm the requester can't be that human.
Name the rollback owner. A signed audit log tells you what happened. It doesn't undo it. A person does.

If you can fill in all eight fields of the run record for one workflow, you have a production-ready control on one action. If you can't, you've found exactly what's keeping that pilot in the sandbox, which is the more useful outcome.

The same exercise, done across a finance workflow, is where buried business rules tend to surface, because separating maker from checker forces you to name the rule the checker is actually checking against.

The question to leave with

For one agent workflow you run today: can the same agent prepare and approve its own action?

If yes, it needs a two-key run record before it goes near production — not a button, a structure. Two keys, two actors, one action that can't complete until both turn, and a record an examiner could verify without trusting you.

If you want help mapping that for a real workflow, bring us one and we'll separate the maker from the checker, pin the skills, set the limits, and name the rollback owner. You can also start from the patterns in our approval queue and AI workflow controls write-ups.

The agent matched the drawer. Good. Now decide who gets to post the fix — and make sure it isn't the same hand that found it.

AI agents need a two-key rule before they move real work
