Picture the log a deep research agent leaves behind after one run over your internal documents. Not the report. The search trail: the list of queries it fired at the open web while it worked.
"healthcare data center lease expirations 2024"
"HIPAA-eligible cloud regions added January 2025"
"hospital network cloud migration case studies"
"MediConn vendor partnerships announcement"
"largest US hospital networks completing cloud migration 2025"
Read those one at a time and nothing is wrong. Every line is a question anyone could type into a search box. No patient records. No contracts. No internal memo. The agent never pasted a private document into a query.
Now read them as a set. A specific company, a specific quarter, a migration milestone, a public industry pattern. The shards line up. Someone watching only this log, like a search provider, a proxy, a compromised retrieval tool, or a curious vendor, can assemble a fact your client never published: MediConn had moved roughly 70% of its infrastructure to the cloud by January 2025. The agent answered its task correctly. It also spelled out a confidential business fact in the open, before any human read a word of the report.
That is the failure mode at the center of MosaicLeaks, published June 18, 2026 by Alexander Gurung and Rafael Pardinas at ServiceNow, with the full method in the arXiv paper. The queries above are illustrative, not lifted from the paper, but the shape is exactly what the authors study: separate, public-looking searches that together reveal a private migration milestone. There is a blunt way to put the risk: a research agent can leak before it answers.
What the benchmark actually measured
MosaicLeaks studies one specific shape of agent: a deep research agent that reads private local documents and also reaches out to external tools like web retrieval. The threat model is narrow and worth stating precisely, because it is not the one most teams are defending against. The adversary does not see the private documents. It does not see the model's reasoning. It does not see the final answer. It sees only the cumulative log of external queries. From that log alone, it tries to reconstruct private context.
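To make that threat model concrete, here is a minimal sketch of what an observer limited to the query log could do. It reuses the illustrative queries from the opening of this article. The fragment list and the substring matching are our own simplifying assumptions, not the adversary model from the paper.

```python
# Illustrative query log: the same example queries used at the top of this article.
QUERY_LOG = [
    "healthcare data center lease expirations 2024",
    "HIPAA-eligible cloud regions added January 2025",
    "hospital network cloud migration case studies",
    "MediConn vendor partnerships announcement",
    "largest US hospital networks completing cloud migration 2025",
]

# Hypothetical fragments an observer would want to line up (assumption).
FRAGMENTS = ["mediconn", "cloud migration", "january 2025"]


def fragments_in(text: str) -> set:
    """Return the fragments that appear in the text, ignoring case."""
    lowered = text.lower()
    return {f for f in FRAGMENTS if f in lowered}


def max_per_query(log: list) -> int:
    """Largest number of fragments exposed by any single query."""
    return max((len(fragments_in(q)) for q in log), default=0)


def cumulative(log: list) -> set:
    """Fragments exposed by the log taken as a whole."""
    seen = set()
    for q in log:
        seen |= fragments_in(q)
    return seen


print("worst single query:", max_per_query(QUERY_LOG), "fragment(s)")
print("whole log:", sorted(cumulative(QUERY_LOG)))
```

No single query carries more than one fragment, yet the log as a whole carries all of them. A per-query filter would pass every line here, which is why the review unit has to be the cumulative trail.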
The benchmark sorts what gets reconstructed into three levels. Intent leakage means the queries reveal what the agent, and therefore its operator, is investigating. Answer leakage means the queries reveal the answer to a specific private question. Full-information leakage means the queries expose verifiable claims about the enterprise's own documents. The mosaic above is the second and third kind. The danger is not a single careless search; it is the accumulation.
To stress this, the authors built 1,001 multi-hop research chains over local enterprise documents and a controlled web corpus. The chains interleave private and public sub-questions by design. The answer to one local question becomes the bridge entity the agent must carry into the next web search. That handoff is exactly where private detail bleeds into a public query.
One caveat the authors make plainly, and so will I: MosaicLeaks is a controlled benchmark, not a measurement of any deployed system. The numbers below describe one model on one constructed test set. Treat them as a lens for thinking about your own query logs, not as a leak rate for the agent you happen to run. The useful claim is structural: if an agent can search the web while reading private documents, its search trail becomes a data surface you now own.
Why telling it not to leak does not hold
The obvious fix is a sentence in the prompt: do not reveal anything private in your searches. The authors tried it. It helped inconsistently and left a lot on the table. For Qwen3-4B, answer and full-information leakage dropped from 34.0% to 25.5%, but strict chain success also fell, from 48.7% to 44.5%. You bought a little discretion and paid for it in the agent's ability to finish the job.
The more revealing result is what happens when you optimize the agent purely for getting research tasks right. Strict chain success jumped from 48.7% to 59.3%. Good, except answer and full-information leakage climbed from 34.0% to 51.7%. Train an agent to be a better researcher and, left alone, it becomes a worse secret-keeper. The capable thing and the leaky thing are the same instinct: chase the most specific, most identifying query available, because that is what resolves the next hop fastest. Performance pressure pushes toward the spill.
Their alternative, Privacy-Aware Deep Research (PA-DR), pairs the task reward with a learned privacy reward. It reached 58.7% strict chain success, essentially matching the task-only model, while cutting answer/full-information leakage to 9.9%. The mechanism is the part operators should sit with: PA-DR did not search less. It issued more web queries than the base model, while dropping the revealing specifics, like the exact metric, the exact date, or the named entity, from those queries. It kept asking. It stopped naming.
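The trade-offs are easier to compare as percentage-point changes against the baseline. The sketch below only does arithmetic on the figures quoted in this section. It assumes all four configurations share the same base model, which the common 48.7% / 34.0% starting point suggests, and the configuration labels are ours.

```python
# (strict chain success %, answer/full-information leakage %) as quoted above.
# Assumption: all rows refer to the same base model; the labels are ours.
RESULTS = {
    "base": (48.7, 34.0),
    "prompt_only": (44.5, 25.5),
    "task_reward_only": (59.3, 51.7),
    "pa_dr": (58.7, 9.9),
}


def deltas_vs_base(name: str) -> tuple:
    """Percentage-point change in (success, leakage) relative to the base row."""
    base_success, base_leak = RESULTS["base"]
    success, leak = RESULTS[name]
    return round(success - base_success, 1), round(leak - base_leak, 1)


for name in ("prompt_only", "task_reward_only", "pa_dr"):
    d_success, d_leak = deltas_vs_base(name)
    print(f"{name}: success {d_success:+.1f} pts, leakage {d_leak:+.1f} pts")
```

Read that way, the prompt instruction gives up 4.2 points of success for 8.5 points less leakage, task-only training gains 10.6 points of success at the cost of 17.7 points more leakage, and PA-DR gains 10.0 points of success while cutting leakage by 24.1 points.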
That is the move you want to engineer into a workflow whether or not you can train a model: more questions, fewer fingerprints.
The query-spill ledger
You cannot retrain a vendor's agent. You can decide, before it ever searches with your context loaded, which private facts are allowed to leave as queries and in what form. That decision needs an artifact. We use a query-spill ledger: a row per private fact class the agent might touch, written before the run, not reconstructed after an incident.
This is the same posture as logging the right things before customer work, just aimed at the outbound search trail instead of the final actions. It complements, rather than repeats, the receipts you keep for customer work and the evals that test your workflow's receipts: receipts tell you what the agent did; the ledger governs what it is allowed to ask.

Ten fields. Copy them into a sheet and fill one row per private fact class your agent could reach.
- Private fact class: the category of internal fact in play, such as client name, migration metric, contract date, deal stage, patient cohort, or incident detail.
- External query that carried a fragment: the actual or expected search string that exported part of it.
- Necessary or unnecessary: did resolving the task genuinely require that fragment in the query, or was it convenience?
- Exposure level: intent, answer, or full information, using the MosaicLeaks ladder.
- Safer rewrite: the same query with specifics generalized, like category instead of company, range instead of exact metric, or quarter instead of date.
- Allowed source or tool: which external endpoint this query may go to, if any.
- Stop-and-ask trigger: the condition under which the agent must halt and route to a human instead of searching.
- Evidence required before allow: what must be true for the query to proceed, such as de-identified bridge entity, approved source, or no client name present.
- Post-run log review owner: the named person who reads the query trail after the run.
- Eval case created: the test you add to your harness so this exact spill is caught next time.
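If a sheet is not where your team works, the same ten fields fit in a small typed record. This is a minimal sketch: the class and attribute names are our own, and the `blank_fields` helper is an assumed convenience for spotting rows that were started but never finished.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Exposure(Enum):
    """The three-level ladder described earlier in this article."""
    INTENT = "intent"
    ANSWER = "answer"
    FULL_INFORMATION = "full information"


@dataclass
class LedgerRow:
    """One row per private fact class (names are illustrative, not a standard)."""
    private_fact_class: str
    external_query: str
    necessary: bool
    exposure: Exposure
    safer_rewrite: str
    allowed_source: str
    stop_and_ask_trigger: str
    evidence_required: str
    log_review_owner: str
    eval_case: str

    def blank_fields(self) -> list:
        """Names of text fields left empty, in declaration order."""
        blanks = []
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, str) and not value.strip():
                blanks.append(f.name)
        return blanks


# Hypothetical draft row: values are placeholders, not taken from any real run.
draft = LedgerRow(
    private_fact_class="contract date",
    external_query="",  # not yet observed in a dry run
    necessary=False,
    exposure=Exposure.ANSWER,
    safer_rewrite="quarter instead of exact date",
    allowed_source="approved endpoint only",
    stop_and_ask_trigger="query names a client and an exact date",
    evidence_required="no client name present",
    log_review_owner="",  # still unassigned
    eval_case="chain that tempts the agent to search client plus date",
)
print(draft.blank_fields())
```

A row with blanks is a row nobody owns yet, so it is worth refusing to run the workflow until `blank_fields()` comes back empty.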
Here is one row, filled, using the MediConn-style illustration:
| Field | Value |
|---|---|
| Fact class | Cloud-migration completion metric |
| Query that carried it | "MediConn 70% cloud migration January 2025" |
| Necessary? | No. The agent needed a public migration benchmark, not the client's name and figure. |
| Exposure | Full information |
| Safer rewrite | "typical cloud migration timelines for mid-size healthcare networks" |
| Allowed source | Approved industry-report endpoint only |
| Stop-and-ask trigger | Any query naming a client and a quantified internal metric |
| Evidence to allow | Bridge entity de-identified before it reaches the web tool |
| Log review owner | Data security lead |
| Eval case | Chain that tempts the agent to name client plus metric in one search |
Two rows like that change what the run sends. The agent still does its multi-hop research. It just asks the generalized version of the revealing question, the PA-DR move, enforced by your workflow instead of its weights.
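One way to enforce a row like this is a small gate in front of the web tool. The sketch below is deliberately crude and rests on assumptions: the client list is hypothetical, a percentage stands in for "quantified internal metric", and the middle `needs_rewrite` outcome is our reading of the "evidence to allow" field rather than something the ledger spells out.

```python
import re

# Hypothetical client list; in practice it would come from your own records.
CLIENT_NAMES = {"mediconn"}

# Rough proxy for a quantified metric: a percentage such as "70%" or "12.5 %".
PERCENT = re.compile(r"\d+(?:\.\d+)?\s?%")


def names_client(query: str) -> bool:
    """True if any known client name appears in the query, ignoring case."""
    lowered = query.lower()
    return any(name in lowered for name in CLIENT_NAMES)


def has_quantified_metric(query: str) -> bool:
    """True if the query contains a percentage figure."""
    return PERCENT.search(query) is not None


def decide(query: str) -> str:
    """Return 'stop_and_ask', 'needs_rewrite', or 'allow' for an outbound query."""
    if names_client(query) and has_quantified_metric(query):
        # The stop-and-ask trigger from the row: client plus quantified metric.
        return "stop_and_ask"
    if names_client(query):
        # Bridge entity not yet de-identified, so the evidence to allow is missing.
        return "needs_rewrite"
    return "allow"


for q in (
    "MediConn 70% cloud migration January 2025",
    "MediConn vendor partnerships announcement",
    "typical cloud migration timelines for mid-size healthcare networks",
):
    print(decide(q), "<-", q)
```

Pattern matching like this will miss paraphrases and metrics that are not percentages. It is a tripwire for the obvious cases, which is why the ledger still names a human reviewer for the log.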
Running it on one workflow
Do not ledger your whole agent program. Pick one workflow where private documents and web search already touch: a competitive brief built from internal sales notes, a vendor-risk review that cross-references public filings, or a market review seeded with customer research. Then:
- List the private fact classes that workflow handles: five to ten is plenty. These are your ledger rows.
- Dry-run the agent and capture the query log: before it touches anything sensitive, watch what it searches on a benign version of the task. The trail tells you which fact classes leak as fragments.
- Write the safer rewrite and the stop-and-ask trigger for each row: generalize the specifics, and name the conditions that should route to a human instead of the open web.
- Assign the post-run reviewer: someone reads the real query log after each run, not just the report. This is where "the report may be clean while the search trail is dirty" stops being a slogan and becomes a check.
- Turn each real spill into an eval case: so the next model, prompt, or vendor swap gets tested against the leak you already found.
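The dry-run and eval steps above can share one piece of plumbing: a stand-in search tool that records every query instead of sending it. The sketch below assumes your agent can be called as `agent(task, tool)` and reaches the web only through `tool.search`. That interface, the class name, and both stub agents are hypothetical.

```python
from typing import Callable, List


class RecordingSearchTool:
    """Stand-in web tool: records each query and returns no results."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def search(self, query: str) -> List[str]:
        self.log.append(query)
        return []


def offending_queries(
    agent: Callable[[str, RecordingSearchTool], None],
    task: str,
    client: str,
    metric: str,
) -> List[str]:
    """Run the agent against the recorder; return queries naming client and metric."""
    tool = RecordingSearchTool()
    agent(task, tool)
    return [
        q for q in tool.log
        if client.lower() in q.lower() and metric.lower() in q.lower()
    ]


# Two hypothetical stub agents, standing in for real ones.
def leaky_agent(task: str, tool: RecordingSearchTool) -> None:
    tool.search("MediConn 70% cloud migration January 2025")


def careful_agent(task: str, tool: RecordingSearchTool) -> None:
    tool.search("typical cloud migration timelines for mid-size healthcare networks")
    tool.search("healthcare cloud migration benchmarks")


TASK = "Compare the client's migration progress to industry norms"
print(offending_queries(leaky_agent, TASK, "MediConn", "70%"))
print(offending_queries(careful_agent, TASK, "MediConn", "70%"))
```

An eval case then reduces to asserting that the returned list is empty for the chain that produced the original spill. Because the recorder never touches the network, the same harness is safe to use for the benign dry run.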
If you want the broader frame around this, the ledger slots into a pre-access security review and the AI workflow security review worksheet; for the governance layer, see our AI workflow controls. The point of the ledger specifically is to make the outbound query a reviewed surface, not an afterthought.
Map one agent's query spill before it searches
If you are standing up a research agent over internal docs, such as sales notes, support tickets, compliance packets, customer interviews, or vendor reviews, the question to answer before launch is not only what it can read and write. It is what it will ask, and what those questions spell when lined up.
We will sit with one of your research workflows and build its query-spill ledger with you: the fact classes, the safer rewrites, the stop-and-ask triggers, the reviewer, and the first eval cases, wired into how your process automation already runs. Start a query-spill ledger for one workflow.