Quick path
In this article
Quick read: what changed, why it matters, and what to do next.
A banking customer asks a normal question: "Show me my recent transactions."
The app retrieves amounts, dates, merchants, and descriptions. One description was written by someone else. It arrived through a tiny transfer.
That is the uncomfortable lesson in Blue41's case study about bunq's financial AI assistant. Blue41 says its proof of concept used a EUR0.02 transfer. The attacker controlled the transaction description. When the banking assistant later pulled recent transactions into the model context, that description was no longer just a payment memo. It was text sitting beside the instructions that shaped the answer.
Blue41 says it helped bunq, described in the post as Europe's second-largest digital bank with more than 20 million customers, secure the assistant against spearphishing risk. The banking detail is the hook. The operating lesson is wider: any field your system retrieves for an AI answer can become a prompt-delivery channel.
For most companies, the equivalent field is not a transaction memo. It is a support ticket, a CRM note, a vendor email, an uploaded PDF, a web page, a survey response, or a chat transcript.
The question is not "can we write a better prompt?" The better question is: which fields are allowed to influence which outputs?
The attack traveled through a trusted-looking field
Prompt injection usually gets explained as a user typing hostile instructions into a chatbot. That mental model misses the Blue41 scenario.
In the bunq proof of concept, the attacker did not need to interact with the victim's assistant. The attacker only needed to put text somewhere the assistant would later read.
The path looked like this:
- Send a small transfer.
- Put hostile text in the transfer description.
- Wait for the victim to ask a routine question about recent transactions.
- Let the assistant retrieve the transaction data.
- Let the model confuse attacker-written data with instructions.
A Hacker News commenter compressed the issue well: "It was never about the prompt, it is about the prompt delivery."
That framing matters because it moves the security review from model personality to data plumbing. If a field can be written by an outsider, imported from a partner, copied from the web, or pasted by a customer, it should not quietly enter the same decision space as trusted application instructions.
OWASP's LLM01 Prompt Injection guidance defines prompt injection as prompts altering an LLM's behavior or output in unintended ways. It also describes indirect prompt injection, where content from external sources such as websites or files changes model behavior when the model accepts that content as input.
That is the key distinction. The dangerous text does not have to be typed by the person using the assistant. It only has to be retrieved.
A field inventory beats another policy paragraph
Teams often respond to prompt-injection risk by adding one more instruction: ignore malicious content, follow the system prompt, do not reveal secrets, do not click links.
Those lines can help, but they are too vague to be the control surface.
The useful artifact is a field inventory for one workflow. Not a grand AI governance framework. One page. One workflow. Every text field the assistant reads.
For a payment or account-support workflow, the inventory should answer seven questions:
- What is the field?
- Who can write it?
- Does the assistant need it for this task?
- What may the assistant do after reading it?
- What outputs or actions are blocked?
- What evidence gets logged?
- Who owns exceptions?

| Field | Writer | Needed for this task? | Allowed influence | Blocked influence | Evidence to keep | Owner |
|---|---|---|---|---|---|---|
| Transaction description | External sender, merchant, customer | Only for transaction summaries | Neutral summary of the record | Links, credential requests, security advice | Record ID, field used, blocked reason | Fraud or support lead |
| Support ticket body | Customer or impersonator | Yes | Draft response for review | Credential reset, account changes, refund promise | Ticket ID, retrieved sources, proposed action | Support manager |
| CRM note | Internal staff or imported system | Sometimes | Account-history summary | Treat note as policy or permission override | Note ID, field used, output category | Account owner |
| Uploaded PDF | Customer, vendor, applicant | Depends | Extract structured facts | Follow document instructions, open URLs, forward files | File ID, extracted fields, blocked instruction | Operations owner |
| Vendor email | External vendor or compromised mailbox | Sometimes | Classify request and urgency | Approve payment, change bank details, share sensitive data | Email ID, sender domain, requested action | Finance lead |
| Web page | Public internet | Rarely for internal actions | Summarize with citation | Execute page instructions or copy hidden text into actions | URL, fetch time, output category | Workflow owner |
The table is deliberately plain. It forces a sentence like this:
"A transaction description may help summarize spending, but it may not cause the assistant to generate a reauthentication link."
That sentence is much more enforceable than "be safe."
The same shape shows up outside banking
The Blue41 article is about financial services, but the pattern is already sitting inside ordinary business systems.
A customer support team may feed ticket bodies into a drafting assistant. A property manager may feed lease PDFs into a summarizer. A finance team may feed vendor emails into an AP workflow. A sales team may feed CRM notes into an account assistant. A marketing team may feed public webpages into a research assistant.
Each source has a different trust level.
A signed internal policy is not the same as a customer message. A merchant-supplied payment reference is not the same as a bank-generated transaction ID. A vendor email is not the same as an approved supplier record. A scraped webpage is not the same as your own product catalog.
When those fields reach an LLM, the model does not automatically preserve those distinctions. The application has to preserve them.
That is why the action after reading matters so much. A low-risk summary can tolerate more messy context. A system that can send emails, create tickets, request credentials, update records, approve refunds, or initiate payment changes needs tighter field rules.
This is the same boundary problem behind our support bot credential reset guidance. A bot can gather facts and prepare a handoff. That does not mean it should complete the reset.
Guardrails need a narrower job
Blue41's mitigation section is practical: minimize unnecessary context, treat retrieved data as untrusted, constrain sensitive outputs and actions, and monitor runtime behavior.
The OWASP LLM Prompt Injection Prevention Cheat Sheet points in the same direction with structured prompts, clear separation between instructions and data, output monitoring, human-in-the-loop controls, least privilege, and comprehensive monitoring.
The common thread is specificity.
A broad assistant forces the guardrail to decide too much at runtime. It sees messy text, retrieves many records, and can produce many kinds of output. The control has to infer intent from a huge surface area.
A narrow workflow gives the control something concrete to enforce:
- Transaction descriptions can be quoted or summarized, but cannot produce links.
- Support tickets can produce draft replies, but cannot reset credentials.
- Vendor emails can classify urgency, but cannot change bank details.
- Uploaded PDFs can populate structured fields, but cannot pass hidden instructions into tool calls.
- Public web pages can be cited, but cannot instruct the agent to execute a downstream action.
That is where an AI approval queue becomes useful. It is not just a person clicking yes or no. It is the review surface where the system shows which fields were read, what trust level they had, which action was proposed, and which rule triggered the stop.
The queue is stronger when the field rules are already clear.
RAG and fine tuning do not solve the memo problem
Retrieval augmented generation can make an assistant more useful. Fine tuning can shape style and domain behavior. Neither changes the fact that retrieved text can carry instructions.
OWASP's LLM01 guidance says RAG and fine tuning do not fully mitigate prompt injection vulnerabilities. The Blue41 case shows why. The vulnerable moment appears when external content enters the prompt path and the model has to decide how to treat it.
A stronger model may refuse more obvious attacks. A better classifier may catch suspicious wording. A better prompt may reduce the number of failures.
Those layers are worth having. They do not answer the design question: should this field be allowed to affect this action?
If the answer is no, the control should not depend on the model noticing danger. The application should remove the field, mark it as untrusted, transform it into a safer structured value, block sensitive output types, or route the request to review.
For teams evaluating vendors, ask these questions before you ask for benchmark slides:
- Which retrieved fields are treated as untrusted?
- Can customers, vendors, or public sources write any of those fields?
- Are data fields separated from instructions in the prompt structure?
- Which outputs are impossible from untrusted fields?
- Which tool calls require a human decision?
- Can logs show the source fields behind a blocked or escalated answer?
If the answer is only "our model is robust," keep going.
Runtime evidence is the recovery plan
Preventive controls will miss things. Attackers adapt wording. Business context changes. A field that looked harmless in a read-only pilot may become risky once the assistant gets a new action.
Runtime evidence is what lets a team reconstruct the event without guessing.
For a higher-risk assistant, the log should show at least:
- the user request category
- the retrieved source fields
- each field's trust level
- the response category
- blocked links, phrases, or actions
- proposed tool calls
- final action taken
- reviewer decision, if a person reviewed it
That is not surveillance theater. It is incident response for AI-mediated work.
It also connects directly to workflow receipts for agent evals. The final answer is not enough. The system should prove which data it read, which action it proposed, which boundary it respected, and what changed afterward.
For higher-risk workflows, define the agent operating envelope before launch. The envelope should say what the assistant may read, what it may write, what it may recommend, what it may never do, and which exceptions go to review.
The practical test: pick one field this week
The bunq case is easy to file away as a banking story. That would miss the point.
The relevant object is the transaction description: a mundane field that became powerful only because an assistant retrieved it and generated a response from it.
Every company has its own version of that field.
Pick one AI-assisted workflow and choose one external text field inside it. Then write three rules:
- When does the assistant need this field?
- What can the field influence?
- What can the field never influence?
If the field can influence money movement, credential recovery, customer promises, compliance decisions, legal language, medical advice, employment decisions, or sensitive data sharing, add a review path before launch.
BaristaLabs helps teams turn that kind of boundary into practical implementation work: field inventories, approval queues, operating envelopes, and smaller automations that leave a receipt. If you are evaluating an assistant that reads customer records, payment notes, tickets, documents, or emails, start with the field map before the prompt.
For teams that want help turning this into a safe rollout plan, BaristaLabs offers AI consulting and process automation support focused on useful workflows with clear review boundaries.
AI Pilot Readiness Checklist
Turn the idea into a pilot you can defend.
AI agent articles are easy to bookmark and hard to operationalize. Use the readiness questions as a shared way to decide whether a workflow is specific enough, safe enough, and measurable enough to pilot. If they surface a strong candidate, BaristaLabs can review it with you and help shape a first version that fits your systems, approval process, and risk tolerance.
Please do not submit PHI, customer records, credentials, or confidential workflow exports.
Practical AI Workflow Notes
Want more practical AI operations ideas?
Get short notes on applying AI inside real small-business workflows — from document handling and customer follow-up to internal reporting, compliance, and automation guardrails.
