The invoice agent runs its three tool calls and finishes. Out of habit, someone types rootsign verify <session>. The terminal answers VALID ✓ — 3 records, chain intact.
Good. Except a few lines up, the same terminal logged a warning while the agent was still working: Session is already flushing, with an ingest failure noted against the notify_customer step. Two things are now true at once. The audit log says the record is whole and unaltered. The run also coughed on its way there.
That gap is the whole reason an audit trail can be present and still not be something you trust yet. The log existed. Nobody had checked whether it survived the ugly path.
What RootSign is, and what the chain proves
The artifact in that scene is real and new. RootSign is an open-source Python library from Providex AI for what it calls tamper-evident provenance logging in production AI agents. The repo went public on May 20, 2026, ships under Apache-2.0, and is on version 0.1.2 on PyPI as of this writing. It is early (Phase 1, a working MVP, not a mature platform), and it is worth judging it as exactly that.
The problem it names is one most teams hit the first time an agent does something they have to explain after the fact. When an agent takes real actions in production — calling tools, hitting APIs, writing to databases — there is no built-in record proving what happened, in what order, on whose authorization, or whether someone changed the record afterward. The agent did a thing. The thing is done. Reconstructing it later is somewhere between hard and impossible.
RootSign's answer is a hash chain. Each agent action is captured as an Action record, and each record carries a SHA-256 hash of the action before it. Link by link, the actions form a chain where every entry commits to its predecessor. Change one record after the fact and its hash no longer matches what the next link expects, so rootsign verify reports the break instead of the calm green VALID. The README describes the rest of the Phase 1 surface in the same spirit: LangGraph and CrewAI integrations, a verify CLI, PII redaction, human-in-the-loop checkpoints, and opt-in capture of the decision behind an action, not just the action.
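To make the mechanism concrete, here is a minimal sketch of a hash chain in plain Python. It is not RootSign's code or API: the function names, the record layout, the all-zero genesis value, and the `lookup_invoice` and `charge_card` tool names are assumptions made up for illustration. It only shows why editing one record makes the next link fail.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first record


def record_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, action: dict) -> None:
    """Append an action that commits to the hash of the record before it."""
    prev = record_hash(chain[-1]) if chain else GENESIS
    chain.append({"action": action, "prev_hash": prev})


def verify(chain: list) -> Optional[int]:
    """Return the index of the first broken link, or None if the chain is intact."""
    expected = GENESIS
    for i, record in enumerate(chain):
        if record["prev_hash"] != expected:
            return i
        expected = record_hash(record)
    return None


chain: list = []
for tool in ("lookup_invoice", "charge_card", "notify_customer"):
    append(chain, {"tool": tool})

print(verify(chain))  # None: every link matches its predecessor

chain[1]["action"]["tool"] = "refund_card"  # tamper with the middle record
print(verify(chain))  # 2: the record after the edited one no longer matches
```

Note one limit of this toy version: editing the last record goes unnoticed, because nothing after it commits to its hash. A real verifier needs some anchor for the head of the chain; how RootSign handles that is not covered here.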
This is the next stress test after a decision you may have already made. We have written before about building receipts that log what an agent did after it acts. That piece is about what to keep. This one is about something harder: whether the thing you kept holds up when the run goes sideways. A hash chain is a strong claim. Strong claims are exactly the ones you should try to break before you lean on them.
The bug report is the better story
The launch is the headline. Issue #2, filed on June 18, 2026, is the part you can actually learn from.
Someone ran the project's own langgraph-invoice-agent example against a LangGraph create_react_agent using a ToolNode, and watched it throw SQLAlchemy warnings plus an ingest-failure log on the notify_customer step: Session is already flushing. The suspected cause, in the issue's own words, is plumbing more than cryptography. LocalIngestClient holds a single shared AsyncSession. A ToolNode fires tools in quick succession, so two await client.handle(envelope) calls can interleave on that one session, and SQLAlchemy's ORM bookkeeping complains about the overlap.
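The interleaving is easier to see in a toy model. The sketch below uses only `asyncio` and a hypothetical `FakeSession` class; it is not SQLAlchemy, not `LocalIngestClient`, and not the fix the maintainers chose. It counts how often a second caller enters the shared session while a flush is still in progress, then shows one possible mitigation, an `asyncio.Lock` around the shared session. Giving each call its own session is another common approach.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for a shared session that must not be re-entered."""

    def __init__(self) -> None:
        self.busy = False
        self.overlaps = 0
        self.writes: list = []

    async def flush(self, item: str) -> None:
        if self.busy:
            self.overlaps += 1  # a second caller arrived mid-flush
        self.busy = True
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        self.writes.append(item)
        self.busy = False


async def handle(session: FakeSession, item: str, lock=None) -> None:
    if lock is None:
        await session.flush(item)  # unguarded: calls can interleave
    else:
        async with lock:  # guarded: one flush at a time
            await session.flush(item)


async def run(use_lock: bool) -> FakeSession:
    session = FakeSession()
    lock = asyncio.Lock() if use_lock else None
    await asyncio.gather(*(handle(session, t, lock) for t in ("a", "b", "c")))
    return session


unlocked = asyncio.run(run(use_lock=False))
locked = asyncio.run(run(use_lock=True))

print("overlaps without lock:", unlocked.overlaps)
print("overlaps with lock:", locked.overlaps)
print("writes with lock:", locked.writes)
```

In both runs all three writes land, which mirrors the shape of the issue: the overlap is a problem in how the session is shared, and whether it also costs you data is a separate question you have to check.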
Here is the nuance that makes the issue genuinely useful rather than alarming. The action records still land. rootsign verify on that session returns VALID ✓ — 3 records, chain intact. PostgreSQL row-level locking and the chain-link insert serialize the writes underneath, so data integrity holds even while the ORM layer is grumbling. The maintainers also turned a related normalization fix around fast: v0.1.2 is billed as the release that makes LangGraph tool output JSON-safe before it hits the database. That is active early iteration, which is what you want to see on a young project. It is also not the same thing as enterprise-ready.
So the bug is, technically, log noise rather than data loss. But read it as an operator and it is something more valuable: a map of the question you should have been asking. The question is not "does the log exist?" The log existed and verified green. The real question is "did the chain survive concurrent tool calls?", and the only honest way to answer that is to run the concurrent tool calls yourself and look.

Do not admire the audit trail. Break it on purpose.
Teams adopt agent logging because they want certainty after something goes wrong. The instinct is to stand the system up, see a clean VALID, and feel covered. But the first real stress on an audit trail is almost never a cinematic breach. It is boring: two tool calls arrive a few milliseconds apart, a database session is shared, one write path logs a warning, and now somebody has to prove the record is still trustworthy under questioning.
You find out whether it is by rehearsing the failure while the stakes are zero. Not a load test, not a security review — a small, deliberate run where you push the chain through the conditions production will eventually hand it anyway, and you watch what the verifier says at each turn. Concurrent calls. A reviewer pause. A retry. A redacted field. A record you alter yourself, on purpose, to confirm the alarm actually fires.
What hash chaining buys you is detection: alter a committed record and verification catches it. What it does not automatically buy you is correct behavior on the messy paths — that the retry did not write a half-record, that redaction masked the field without breaking the hash, that the reviewer's approval is itself a link in the chain and not a sticky note next to it. Those are properties you confirm by running them, not by reading a README.
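Redaction is a good example of a property that depends on ordering. The sketch below is an assumption-laden illustration, not a description of how RootSign implements PII redaction: the `redact` and `digest` helpers, the mask string, and the sample fields are all hypothetical. It compares hashing the record after masking with hashing it before masking.

```python
import hashlib
import json

MASK = "[REDACTED]"  # assumed mask value for illustration


def redact(action: dict, pii_fields: set) -> dict:
    """Return a copy of the action with the named top-level fields masked."""
    return {k: (MASK if k in pii_fields else v) for k, v in action.items()}


def digest(action: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the action."""
    payload = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


raw = {"tool": "notify_customer", "email": "pat@example.com", "invoice": "INV-1"}

# Order A: redact first, then hash. The hash commits to what is actually stored.
stored_a = redact(raw, {"email"})
hash_a = digest(stored_a)

# Order B: hash the raw action, then redact what gets stored.
hash_b = digest(raw)
stored_b = redact(raw, {"email"})

print(digest(stored_a) == hash_a)  # True: stored record still matches its hash
print(digest(stored_b) == hash_b)  # False: stored record no longer matches
```

In order B, a verifier that recomputes hashes from storage would report a break that nobody caused. That is the failure the "redact one field" move is designed to surface.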
The audit-chain rehearsal card
Here is a small packet you can run against one workflow in an afternoon. One agent session, three quick tool calls, one reviewer checkpoint, one forced retry, one redacted field, one tamper test, one final verification. The table is not the deliverable. Running it is — because the run forces every question production was going to ask anyway, just earlier, on a session nobody is depending on.
| Move | What you run | What it stress-tests | What "pass" looks like |
|---|---|---|---|
| Name the session | Register one agent, open one session, note the session id | Baseline identity and ordering | verify returns the session with a clear id and zero records yet |
| Fire three tool calls fast | Invoke three tools back-to-back through the ToolNode path | Concurrency on a shared write session | Three Action records land, chain intact, even if a warning prints |
| Insert a reviewer checkpoint | Route one action through a human approval pause | Whether the pause is in the chain or beside it | The approval is its own link, with who approved and when |
| Force a retry | Make one tool fail once, then succeed on retry | Duplicate and partial writes | One logical action, no orphaned half-record, chain still a single unforked sequence |
| Redact one field | Mark a field as PII before capture | Redaction without breaking the hash | Field is masked in storage and verify still returns VALID |
| Tamper on purpose | Hand-edit one stored record after the fact | Detection, the whole premise | verify returns the break and points at the failed link |
| Verify out loud | Run rootsign verify <session> and read it aloud | What you can actually claim later | The expected record count, or a specific failure you can explain |
Three of these rows tend to surprise people the first time.
The forced retry is the one that quietly fails on a lot of homegrown logging. A tool errors, the framework retries, and now you have to ask whether your trail recorded one action or one-and-a-half. RootSign's serialized chain-link insert is exactly the kind of thing that makes this come out right, but you confirm it by causing the retry, not by assuming.
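One way a trail can make a retry come out as exactly one record is to key each logical action by an id that is assigned once, outside the retry loop. The sketch below is a hypothetical illustration of that idea, not RootSign's implementation: `Trail`, `run_with_retry`, and the flaky `notify_customer` stub are invented names, and only standard-library calls are used.

```python
import hashlib
import json
import uuid

GENESIS = "0" * 64  # assumed placeholder for the first record's previous hash


class Trail:
    """Hypothetical append-only trail that ignores repeat commits of one action id."""

    def __init__(self) -> None:
        self.records: list = []
        self.seen: set = set()

    def commit(self, action_id: str, action: dict) -> bool:
        if action_id in self.seen:
            return False  # retry or duplicate delivery: nothing new is written
        prev = self.records[-1]["hash"] if self.records else GENESIS
        body = {"id": action_id, "action": action, "prev_hash": prev}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        self.records.append({**body, "hash": hashlib.sha256(payload).hexdigest()})
        self.seen.add(action_id)
        return True


def run_with_retry(trail: Trail, tool, args: dict, attempts: int = 3):
    action_id = str(uuid.uuid4())  # one id per logical action, not per attempt
    for attempt in range(1, attempts + 1):
        try:
            result = tool(**args)
        except RuntimeError:
            if attempt == attempts:
                raise
            continue  # failed attempts write nothing, so no half-record exists
        trail.commit(action_id, {"tool": tool.__name__, "result": result})
        return action_id, result


calls = {"n": 0}


def notify_customer(invoice: str) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")  # fail once, then succeed
    return f"sent:{invoice}"


trail = Trail()
action_id, result = run_with_retry(trail, notify_customer, {"invoice": "INV-1"})
# Simulate the same action being delivered to the trail a second time.
duplicate_written = trail.commit(action_id, {"tool": "notify_customer", "result": result})

print(calls["n"], len(trail.records), duplicate_written)  # 2 1 False
```

Whether failed attempts deserve their own records is a design choice this sketch sidesteps. The point for the rehearsal is narrower: two attempts at the tool, one record in the trail, and a repeat commit that changes nothing.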
The reviewer checkpoint is where this connects to work you may already have in flight. If you run an approval queue so a human signs off before an agent acts, the approval needs to land inside the audit chain — not in a separate tool nobody cross-references at 11pm during an incident. The rehearsal tells you which it is.
The tamper test is the one people skip because it feels like sabotage. It is the opposite. Editing a record by hand and watching verify go red is the only proof you have that the green VALID means anything at all. An alarm you have never heard ring is not yet an alarm.
Test the evidence like part of the system
This is also the difference between testing an agent's answers and testing its evidence. We have argued that agent evaluations should inspect the workflow receipts, not just grade the final output. The rehearsal card is that idea pointed at the audit trail itself: the evidence is part of the system, so the evidence gets stress-tested like part of the system.
If you want to run this against something real, take the AI workflow security review worksheet and slot the seven moves in next to whatever logging you already have, RootSign or otherwise. The library is one good, early implementation of the idea; the rehearsal works against any audit trail you can verify and tamper with.
Then pick the one workflow where an agent already writes — updates a record, sends the invoice, closes the ticket — and run the card before the next one goes in. You will probably find one path where a retry double-writes, or a redaction that quietly changes the hash, or an approval that lives outside the chain. Every one of those is cheaper to find on a rehearsal than in the incident review where the log is the only account of what happened.
Do not admire the audit trail. Break it on purpose, while breaking it is free.
