System prompts are not an agent control plane

AI agents need enforcement points before risky tool calls run. System prompts can guide behavior, but refunds, emails, account deletion, and customer work need runtime policy, approvals, logs, and receipts.

Sean McLellan

Lead Architect & Founder

June 3, 2026 · 8 min read

A support agent has drafted the apology. The customer is angry, the order was late, and the policy allows a refund in some cases.

Now the agent is about to call:

stripe.refund({
  customerId: "cus_...",
  amount: 27500,
  reason: "service_issue"
})

The system prompt says "be careful with refunds." The support playbook says "protect customer trust." The agent's reasoning may even say this refund is justified.

None of that matters at the moment the callback is about to run.

The question is not whether the agent had good intentions. The question is whether anything stands between the model output and the side effect. Can the system allow the call, block it, require approval, warn a human, log a receipt, or fail closed?

That is where agent governance is getting more concrete.

The recent signal is not another agent launch. It is a set of control-plane ideas moving closer to implementation: Enforra's open-source action governance SDK, Microsoft's Agent Control Specification, Stanford CS336's AI-agent guidelines, and Anish Athalye's AI agent security lecture.

They are different artifacts. One is an SDK. One is a proposed runtime specification. One is a versioned instruction file for a course. One is a security lecture. Read together, they point in the same direction: agent control is moving out of wishful prompts and into runtime checkpoints.

Why prompts are not enough

System prompts are useful. They set tone, scope, and role. They tell an agent which work belongs to it and which work does not.

But prompts are not enforcement.

A prompt can say "never delete a production account." If the agent still emits a tool call like this, the application has to decide what happens next:

admin.deleteAccount({
  accountId: "acct_...",
  environment: "production"
})

If the callback runs automatically, the prompt was guidance, not a control.

That distinction matters for any agent that can touch customer work. Refunds, external emails, CRM updates, ticket routing, account changes, medical intake summaries, vulnerability triage, finance approvals, and data exports all have the same shape. The risky moment is not the paragraph the model writes. It is the action the application executes.

Enforra's README puts the point bluntly: "System prompts are not a security boundary. When an AI agent can issue refunds, run commands, send emails, or export data, the control point should sit before the tool action executes."

That is the operator problem in one sentence.

Microsoft makes a similar argument in its Agent Control Specification, published June 2, 2026. Traditional access control can answer whether a credential is allowed to call a resource. It cannot always answer whether this particular agent call is still safe given what the agent saw earlier in the conversation, what content it retrieved, what tool results it received, or whether attacker-controlled text influenced the path.

That is why "the service account has permission" is not enough either.

A support bot may have permission to send email. That does not mean every generated email should leave the building. A finance agent may have permission to update a vendor record. That does not mean it should update bank details after reading an inbound PDF. A security agent may have permission to open Jira tickets. That does not mean it should file high-severity incidents based on an unverified LLM summary alone.

Prompts help the agent behave. Runtime controls decide what the system allows.

What runtime governance looks like in practice

Runtime governance starts by putting a policy check in the path of the action.

Before the tool callback runs, the application asks a policy layer a practical question:

{
  "tool": "stripe.refund",
  "actor": "support-agent",
  "amount": 27500,
  "currency": "USD",
  "customer_tier": "standard",
  "source_data": ["ticket", "order_history", "agent_summary"],
  "environment": "production"
}

The answer should be boring and explicit:

{
  "verdict": "require_approval",
  "reason": "refund_amount_between_50_and_500",
  "approval_queue": "support-refunds",
  "receipt_id": "agt_2026_06_03_104455"
}

That is the pattern Enforra is packaging. The project describes itself as an "Open source action governance SDK for AI agent tool calls." It is early, small, and Apache-2.0 licensed, with 9 GitHub stars at the time of writing, so it should not be treated as a market referendum. The interesting part is the control point it makes concrete.

Enforra evaluates local policies before application-owned tool callbacks run. It returns verdicts such as allow, block, require_approval, or log_only. The local runtime loads local policies, evaluates tool calls before execution, and writes local audit logs. It does not require Enforra Cloud and does not send hosted telemetry by default, according to the README.

The example support policy is exactly the kind of rule small operators understand:

version: 1
defaults:
  decision: block
policies:
  - id: allow-small-refunds
    match:
      agent: support-agent
      tool: stripe.refund
    conditions:
      - field: args.amount
        operator: lte
        value: 50
    decision: allow

  - id: approve-medium-refunds
    match:
      agent: support-agent
      tool: stripe.refund
    conditions:
      all:
        - field: args.amount
          operator: gt
          value: 50
        - field: args.amount
          operator: lte
          value: 500
    decision: require_approval

  - id: block-large-refunds
    match:
      agent: support-agent
      tool: stripe.refund
    conditions:
      - field: args.amount
        operator: gt
        value: 500
    decision: block

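That policy is not trying to solve every agent risk. It is drawing a line around side effects the business already knows are sensitive. One detail to watch: the thresholds must use the same unit as args.amount. The values 50 and 500 read as dollars, so a payload expressed in cents, like the 27500 in the opening example, would need 5000 and 50000 to draw the same $50 and $500 lines.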
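To make the rule shape concrete, here is a minimal sketch of how a policy like this could be evaluated. It is not Enforra's engine or API: the in-memory policy object, the evaluate function, and the first-match-wins ordering are all assumptions made for illustration, and amounts are assumed to be in dollars.

```javascript
// Hypothetical in-memory form of the YAML policy above (amounts in dollars).
// Illustrative only: this is not Enforra's engine or its API.
const policy = {
  defaults: { decision: "block" },
  policies: [
    {
      id: "allow-small-refunds",
      match: { agent: "support-agent", tool: "stripe.refund" },
      conditions: [{ field: "args.amount", operator: "lte", value: 50 }],
      decision: "allow",
    },
    {
      id: "approve-medium-refunds",
      match: { agent: "support-agent", tool: "stripe.refund" },
      conditions: {
        all: [
          { field: "args.amount", operator: "gt", value: 50 },
          { field: "args.amount", operator: "lte", value: 500 },
        ],
      },
      decision: "require_approval",
    },
    {
      id: "block-large-refunds",
      match: { agent: "support-agent", tool: "stripe.refund" },
      conditions: [{ field: "args.amount", operator: "gt", value: 500 }],
      decision: "block",
    },
  ],
};

const operators = {
  lte: (a, b) => a <= b,
  gt: (a, b) => a > b,
};

// Resolve a dotted path such as "args.amount" against the call object.
function getField(call, path) {
  return path.split(".").reduce((obj, key) => (obj == null ? undefined : obj[key]), call);
}

// Unknown operators and non-numeric values never satisfy a condition.
function conditionHolds(call, cond) {
  const compare = operators[cond.operator];
  const actual = getField(call, cond.field);
  return typeof compare === "function" && typeof actual === "number" && compare(actual, cond.value);
}

// Assumption: the first matching rule wins; unmatched calls get the default.
function evaluate(policy, call) {
  for (const rule of policy.policies) {
    const matches = Object.entries(rule.match).every(([key, value]) => call[key] === value);
    const conds = Array.isArray(rule.conditions)
      ? rule.conditions
      : (rule.conditions && rule.conditions.all) || [];
    if (matches && conds.every((cond) => conditionHolds(call, cond))) {
      return { decision: rule.decision, rule: rule.id };
    }
  }
  return { decision: policy.defaults.decision, rule: "default" };
}

const call = { agent: "support-agent", tool: "stripe.refund", args: { amount: 275 } };
console.log(evaluate(policy, call).decision); // prints "require_approval"
```

The useful property is the default: a tool or agent that no rule mentions, or an amount that arrives as a string instead of a number, falls through to block instead of slipping past the rules.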

Microsoft's ACS zooms out from the SDK pattern and proposes a broader lifecycle. It defines eight interception points: agent_startup, input, pre_model_call, post_model_call, pre_tool_call, post_tool_call, output, and agent_shutdown.

For most business teams, pre_tool_call is the easiest place to start. It is where intent becomes action.
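As a rough illustration of what hooking into one of those points could look like, here is a small sketch. Only the eight interception point names come from the description above; registerHook, runHooks, and the verdict shape are hypothetical and do not represent the ACS API.

```javascript
// The eight point names are taken from the text; everything else is hypothetical.
const INTERCEPTION_POINTS = [
  "agent_startup", "input", "pre_model_call", "post_model_call",
  "pre_tool_call", "post_tool_call", "output", "agent_shutdown",
];

const hooks = new Map(INTERCEPTION_POINTS.map((point) => [point, []]));

function registerHook(point, hook) {
  if (!hooks.has(point)) throw new Error(`Unknown interception point: ${point}`);
  hooks.get(point).push(hook);
}

// Run hooks in registration order; the first non-allow verdict stops the chain.
// A hook that returns nothing is treated as a denial.
function runHooks(point, event) {
  if (!hooks.has(point)) throw new Error(`Unknown interception point: ${point}`);
  for (const hook of hooks.get(point)) {
    const result = hook(event);
    if (!result || result.verdict !== "allow") {
      return result || { verdict: "deny", reason: "hook_returned_no_verdict" };
    }
  }
  return { verdict: "allow", reason: "no_hook_objected" };
}

// Example hook: refuse production account deletion before the tool runs.
registerHook("pre_tool_call", (event) =>
  event.tool === "admin.deleteAccount" && event.args.environment === "production"
    ? { verdict: "deny", reason: "production_account_deletion" }
    : { verdict: "allow", reason: "not_a_production_delete" }
);

const verdict = runHooks("pre_tool_call", {
  tool: "admin.deleteAccount",
  args: { accountId: "acct_example", environment: "production" },
});
console.log(verdict.verdict); // prints "deny"
```

Note that this sketch allows a call when no hook is registered for a point. A stricter design would require at least one hook on pre_tool_call for any tool with side effects.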

A useful runtime control can do a few simple things:

allow the call
deny the call
warn or escalate
require approval
redact sensitive content
log what happened
fail closed when the policy service is unavailable

Those verbs are more useful than a 40-page governance memo. They turn agent oversight into an implementation surface.

If a tool call would refund $22, let it through and log it. If it would refund $275, route it to an approval queue. If it would delete a production account, block it. If it would send an external email containing customer data, require a human review. If the policy check fails, do not run the side effect and hope for the best.
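The last rule in that paragraph is the easiest to get wrong, so here is one way it could be sketched. The wrapper name, the verdict set, and the two-second default timeout are assumptions; evaluatePolicy stands in for whatever policy client the application actually uses.

```javascript
// Hypothetical wrapper: evaluatePolicy is any function that returns
// (or resolves to) { verdict, reason }. Names and timeout are assumptions.
const KNOWN_VERDICTS = new Set(["allow", "block", "require_approval"]);

async function evaluateFailClosed(evaluatePolicy, call, timeoutMs = 2000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("policy_timeout")), timeoutMs);
  });
  try {
    const decision = await Promise.race([evaluatePolicy(call), timeout]);
    // A malformed or unexpected answer is treated the same as no answer.
    if (!decision || !KNOWN_VERDICTS.has(decision.verdict)) {
      return { verdict: "block", reason: "unrecognized_policy_verdict" };
    }
    return decision;
  } catch (err) {
    // An unreachable, slow, or crashing policy check blocks the side effect.
    const detail = err && err.message ? err.message : String(err);
    return { verdict: "block", reason: `policy_unavailable: ${detail}` };
  } finally {
    clearTimeout(timer);
  }
}
```

The caller never sees an exception from the policy layer. It always gets a verdict, and every failure mode maps to block, so the tool callback only runs when something explicitly said allow.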

That is not anti-agent. It is how agents become safe enough to use outside demos.

It also fits the agent receipt pattern. A receipt should record the work the agent performed, the source data it relied on, the tool calls it requested, the approvals it received, and the final outcome. The control plane decides whether the action can run. The receipt explains what happened after it did, or why it did not.

For customer-facing and internal operations teams, those two artifacts belong together: the approval decision and the audit trail.

Why versioned instructions still matter

Runtime controls do not make instructions obsolete.

Stanford CS336's AI agent guidelines are a good example of policy moving into versioned project context. The file tells agents their primary role is "Teaching Assistant, Not Solution Generator." It says agents should explain concepts, review code generally, ask guiding questions, and suggest sanity checks, tests, and invariants. It also says they should not write code, complete TODOs, run bash commands, edit student repos, or solve assignments.

That is not a runtime enforcement layer. A determined or confused agent may still try to do the wrong thing if the surrounding tool environment lets it.

But the file still matters.

Versioned instructions define the job. They create shared expectations. They give reviewers something concrete to inspect in a pull request. They help operators separate intended behavior from forbidden behavior before they start wiring tools into the agent.

For business teams, the equivalent might be:

agent_role: "support triage assistant"

allowed:
  - summarize customer tickets
  - draft replies for review
  - classify refund eligibility
  - suggest internal next steps

not_allowed:
  - issue refunds without policy check
  - send external emails without approval
  - delete accounts
  - change billing details
  - export customer lists

required_receipt_fields:
  - ticket_id
  - source_data_labels
  - proposed_action
  - policy_verdict
  - approver_id
  - final_tool_result

That file will not stop a tool call by itself. It gives the runtime policy something to reflect.
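One small way the runtime can reflect it is to refuse to close out a piece of work until the receipt carries the required fields. The sketch below assumes the field list from the example file. The function name, and the rule that approver_id may be empty unless approval was actually required, are illustrative assumptions.

```javascript
// Field names mirror required_receipt_fields in the example instructions file.
const REQUIRED_RECEIPT_FIELDS = [
  "ticket_id",
  "source_data_labels",
  "proposed_action",
  "policy_verdict",
  "approver_id",
  "final_tool_result",
];

// Returns the names of fields that are absent or empty.
// Assumption: approver_id may be null unless the verdict was require_approval.
function missingReceiptFields(receipt) {
  return REQUIRED_RECEIPT_FIELDS.filter((field) => {
    if (!(field in receipt)) return true;
    if (field === "approver_id") {
      return receipt.policy_verdict === "require_approval" && receipt.approver_id == null;
    }
    return receipt[field] == null;
  });
}

const receipt = {
  ticket_id: "ticket_example",
  source_data_labels: ["ticket", "order_history"],
  proposed_action: { tool: "stripe.refund", amount: 275 },
  policy_verdict: "require_approval",
  approver_id: null,
  final_tool_result: "pending",
};
console.log(missingReceiptFields(receipt)); // prints [ 'approver_id' ]
```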

A good control plane needs both layers. Instructions say what the agent is for. Runtime policy decides whether a specific action is allowed right now. Receipts preserve the evidence.

This is also where Athalye's AI agent security lecture is useful. The lecture defines agents as systems that perceive an environment, make decisions, and take autonomous actions. It also notes that agents often operate with high privilege and are not robust even under natural inputs. The threat model includes indirect prompt injection, where malicious content in the environment influences the agent.

That risk is not theoretical for operations work. Customer emails, web pages, PDFs, tickets, CRM notes, Slack threads, and scraped docs can all become part of the agent's environment. Some of that content is untrusted. Some of it may be malicious. Some of it may simply be wrong.

Heuristic defenses can reduce noise, but they do not provide guarantees. More principled defenses try to rule out classes of attacks by controlling how untrusted data can influence control flow and data flow.

For a small team, that may sound academic. In practice, it starts with a modest rule: do not let untrusted content directly authorize risky side effects.

Label the source data. Treat "customer email says refund me $500" differently from "order system confirms duplicate charge." Let the LLM enrich the case, but do not let its prose become the approval.
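A minimal sketch of that rule, assuming each piece of evidence is tagged with the system it came from. The label names, the trusted set, and the evidence shape are hypothetical; which sources count as trusted is a business decision, not something the code can infer.

```javascript
// Hypothetical labels: only records from these systems may authorize an action.
const TRUSTED_SOURCES = new Set(["order_system", "payment_processor"]);

// evidence items look like { source: "customer_email", claim: "duplicate_charge" }
function authorizingEvidence(evidence, requiredClaim) {
  return evidence.filter(
    (item) => item.claim === requiredClaim && TRUSTED_SOURCES.has(item.source)
  );
}

// Untrusted content can raise the case, but only trusted records authorize it.
function canAutoRefund(evidence) {
  return authorizingEvidence(evidence, "duplicate_charge").length > 0;
}

const fromEmailOnly = [
  { source: "customer_email", claim: "duplicate_charge" },
  { source: "agent_summary", claim: "duplicate_charge" },
];
const confirmed = [...fromEmailOnly, { source: "order_system", claim: "duplicate_charge" }];

console.log(canAutoRefund(fromEmailOnly)); // prints false
console.log(canAutoRefund(confirmed)); // prints true
```

The agent's own summary is deliberately left out of the trusted set. It is derived from whatever the agent read, so it inherits the trust level of its least trusted input.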

That same pattern shows up in the CVE AI Agent project. Its README describes a workflow that fetches deterministic security data from NVD, CISA KEV, and EPSS, evaluates thresholds, then uses LLMs only for explicitly marked qualitative sections. If the LLM is unavailable, it generates a high-visibility failure notification and a pure static report for audit transparency.

That is a sane operator pattern: deterministic facts first, LLM enrichment second, visible failure when enrichment cannot run.
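The ordering can be sketched in a few lines. This is not the CVE AI Agent project's code: the fact fields, the threshold values, and the function names are assumptions chosen to show the shape, and enrich stands in for any LLM call.

```javascript
// Hypothetical thresholds; real values are a policy decision.
const THRESHOLDS = { cvss: 9.0, epss: 0.5 };

// The decision uses deterministic inputs only. The LLM never changes it.
function needsAttention(facts) {
  return facts.inKev === true || facts.cvss >= THRESHOLDS.cvss || facts.epss >= THRESHOLDS.epss;
}

// facts: data already fetched from deterministic sources.
// enrich: any async function that returns qualitative text for the report.
async function buildReport(facts, enrich) {
  const report = {
    facts,
    needsAttention: needsAttention(facts),
    llmSummary: null, // explicitly marked qualitative section
    warnings: [],
  };
  try {
    report.llmSummary = await enrich(facts);
  } catch (err) {
    // Visible failure: the static report still ships, with a loud warning.
    const detail = err && err.message ? err.message : String(err);
    report.warnings.push(`LLM ENRICHMENT FAILED (${detail}): static report only`);
  }
  return report;
}
```

Because the threshold decision is computed before enrichment and stored separately from it, an LLM outage degrades the report's prose but not its verdict.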

What small teams can do this week

You do not need to buy a full enterprise governance suite before improving agent safety.

Start with the actions that would make you nervous if they happened at 2:00 a.m. without a human watching.

For a support team, that may be refunds, account deletion, outbound email, plan changes, and data exports. For IT, it may be user provisioning, password resets, device wipe commands, and privilege changes. For finance, it may be vendor edits, invoice approval, payment release, and bank detail changes.

Then put a policy check immediately before those callbacks.

A simple first version can be a plain function:

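// Gate a tool callback behind a policy check. policy, audit, approvalQueue,
// and summarizePayload are provided by the application.
async function guardedToolCall(toolName, payload, context, runTool) {
  let decision
  try {
    decision = await policy.evaluate({
      tool: toolName,
      payload,
      actor: context.agentName,
      user: context.userId,
      environment: context.environment,
      source_data: context.sourceDataLabels
    })
  } catch (err) {
    // Fail closed: if the policy check cannot run, block the side effect.
    decision = { verdict: "block", reason: "policy_check_failed" }
  }

  // Record every decision, including blocks, before acting on it.
  await audit.write({
    timestamp: new Date().toISOString(),
    tool: toolName,
    payload_summary: summarizePayload(payload),
    verdict: decision.verdict,
    reason: decision.reason,
    receipt_id: context.receiptId
  })

  if (decision.verdict === "require_approval") {
    // Park the action for a human reviewer; the tool does not run here.
    return approvalQueue.create({
      tool: toolName,
      payload,
      reason: decision.reason,
      receipt_id: context.receiptId
    })
  }

  if (decision.verdict !== "allow") {
    // Covers explicit blocks and any verdict this gate does not recognize.
    throw new Error(`Blocked by policy: ${decision.reason}`)
  }

  return runTool(payload)
}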

This is not glamorous architecture. It is a gate.

The first checklist can be short:

List every tool your agent can call.
Mark each tool as low, medium, or high risk.
Add a pre_tool_call policy check before medium- and high-risk tools.
Define allow, require_approval, and block outcomes for obvious cases.
Log every decision to audit JSONL or your existing event store.
Send approval requests to the place operators already work.
Include the policy verdict, source data labels, tool result, and approver in the agent receipt.
Fail closed if the policy check cannot run.
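The first two items and the logging item can start as a very small amount of code. The sketch below assumes a hand-maintained registry. The tool names, risk tiers, and helper names are illustrative, and writing the line to a file or event store is left to the caller.

```javascript
// Hypothetical registry: every tool the agent can call, with a risk tier.
const TOOL_RISK = {
  "ticket.comment": "low",
  "email.sendExternal": "medium",
  "stripe.refund": "medium",
  "admin.deleteAccount": "high",
};

// Tools missing from the registry are treated as high risk.
function riskOf(tool) {
  return Object.prototype.hasOwnProperty.call(TOOL_RISK, tool) ? TOOL_RISK[tool] : "high";
}

// Medium- and high-risk tools go through the policy check.
function needsPolicyCheck(tool) {
  return riskOf(tool) !== "low";
}

// Format one decision as a JSONL record: one JSON object per line.
function toAuditLine(entry, now = new Date()) {
  const record = { timestamp: now.toISOString(), ...entry, risk: riskOf(entry.tool) };
  return JSON.stringify(record) + "\n";
}

const line = toAuditLine({
  tool: "stripe.refund",
  verdict: "require_approval",
  reason: "refund_amount_between_50_and_500",
});
console.log(needsPolicyCheck("ticket.comment")); // prints false
console.log(needsPolicyCheck("crm.exportContacts")); // prints true (unlisted tool)
```

Defaulting unlisted tools to high risk means a newly added tool is gated until someone deliberately classifies it.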

Do not start with abstract principles. Start with thresholds.

Refunds up to $50 can run automatically. Refunds above $50 and up to $500 need approval. Refunds over $500 are blocked until a manager handles them manually. External email requires approval if it includes customer-specific data. Production account deletion is blocked. Internal ticket comments can be allowed and logged.

Those rules will change. That is fine. Put them in version control so the change is visible.

The business value is not only security. It is operational confidence.

A support lead can review the approval queue and see which decisions slow the team down. A security reviewer can inspect the audit log after a strange incident. A business owner can let the agent handle low-risk work without giving it a blank check. An engineer can add a new tool without inventing governance from scratch.

This is where BaristaLabs' existing approval work fits. If you are already writing an AI approval policy before choosing an agent, the next step is to wire that policy into the runtime path. If you are already using an approval queue, keep approvals close to the risky side effect, not buried in a weekly review. If you are already thinking about agent receipts for customer work, make the policy verdict part of the receipt.

We made the same argument in the context of enterprise IT agents and ITBench-AA receipts: the work needs evidence. Runtime governance adds the missing before-the-action control.

The practical takeaway is simple. Do not ask a system prompt to do the job of an enforcement layer.

Use prompts and project instructions to define the agent's role. Use runtime policy to control tool calls. Use approvals for judgment-heavy or risky side effects. Use receipts so humans can reconstruct what happened later.

That is how small teams can move from "the agent was told to be careful" to "the system would not let it do that without approval."

If your team is piloting agents that touch customer, finance, support, or IT workflows, BaristaLabs can help turn that into a working control path through AI consulting and process automation. The goal is not a thicker governance document. It is a safer callback.
