Xybern
Xybern Research
2026-05-26
AI Agents Need Permission Boundaries

Every operating system ever built for multiple users has a concept of a permission boundary. A process runs as a user. That user can read some files and not others, write to some directories and not others, call some system functions and not others. The boundary is not advisory. It is enforced by the kernel, below the level of the application, and the application cannot talk its way past it.

AI agents are being deployed today without anything resembling this. An agent is handed a set of tools, a set of credentials, and a natural language instruction, and is then trusted to stay inside an invisible line that exists only as a suggestion in its prompt. There is no kernel. There is no boundary. There is only the hope that the model behaves.

This piece argues that permission boundaries are not an optional hardening step for agentic systems. They are the foundational primitive that makes agents safe to deploy in production at all. It then explains what a real boundary looks like, where the naive approaches fail, and how enforcement has to work to be meaningful.


What a Permission Boundary Actually Is

A permission boundary is a hard limit on what an actor can do, enforced by something the actor does not control.

The three phrases that matter are "hard", "enforced", and "does not control".

Hard means the boundary is not a probability. It is not "the agent will usually not do this." It is "the agent cannot do this, and if it tries, the attempt is intercepted and denied."

Enforced means there is a mechanism, separate from the actor, that evaluates every attempt against the boundary and decides. Enforcement is active. It happens at the moment of the action, not after.

Does not control means the actor cannot modify, disable, or reason its way around the boundary. A boundary the agent can edit is not a boundary. A boundary defined in the agent's own prompt is, by definition, under the agent's influence.

When you remove any one of these properties, you no longer have a boundary. You have a guideline. The entire problem with current agent deployments is that the industry has been shipping guidelines and calling them boundaries.


Why Agents Specifically Need Them

It is worth being precise about why this matters more for agents than for traditional software, because the answer shapes the solution.

A traditional application is deterministic in its action space. A payments service calls the payments API. It does not, on a Tuesday, decide to start sending emails because something in its input suggested it might be helpful. The set of actions a conventional program can take is fixed at development time and visible in the code.

An agent's action space is open. The model decides what to do next based on reasoning over its context. That context includes the task instruction, the conversation so far, the outputs of previous tool calls, and content retrieved from external sources. Any of those can shift what the agent decides to do. The action space is not bounded by the code. It is bounded only by the tools the agent has access to, and an agent with broad tools has an enormous range of possible actions.

This produces four properties that traditional software does not have, and each one is a reason boundaries are mandatory.

Emergent behaviour. The agent can take action sequences no developer anticipated or tested. You cannot enumerate the failure modes in advance because the action space is combinatorial and driven by model reasoning.

Susceptibility to manipulation. Content the agent reads can change what it does. A malicious document, a poisoned search result, or an injected instruction can redirect the agent toward actions that serve the attacker rather than the operator.

Speed. Agents act at machine speed with no natural pause. By the time a human notices, the damage is done. There is no reaction window.

Compounding. Agents chain actions, and they chain with other agents. A single bad decision early in a sequence can cascade through every subsequent step.

| Property | Traditional Software | AI Agent |
| --- | --- | --- |
| Action space | Fixed at development time | Open, decided at runtime by the model |
| Failure modes | Enumerable, testable | Emergent, combinatorial |
| Influenced by input content | No | Yes, including adversarial content |
| Execution speed | Bounded by program logic | Machine speed, no pause |
| Blast radius of one error | Local, bounded | Compounding across a chain |

Each row in that table is an argument for an external boundary. Taken together they are decisive. You cannot test your way to safety in an open action space, and you cannot trust an actor that can be manipulated by the very data it processes.


The Four Things People Mistake for Boundaries

When teams reach for a way to constrain their agents, they typically land on one of four approaches. None of them is a boundary, and understanding why is the fastest way to understand what a real one requires.

Prompt instructions

"You must never delete production data. You must never send money over ten thousand without approval."

This is the most common approach and the weakest. Instructions in a prompt are inputs to a probabilistic system, not constraints on it. The model weighs them against everything else in its context. A long conversation, a cleverly worded user request, an injected instruction in a retrieved document, or simply an unlucky sampling can override them. There is no enforcement and no audit. You cannot prove to anyone that the rule held, because on any given run it might not have.

Tool design

The reasoning here is that if you only give the agent safe tools, it can only do safe things. Restrict the toolset and you restrict the behaviour.

This is better than nothing because it genuinely narrows the action space. But it is coarse and static. A tool is either present or absent. You cannot express "this tool is allowed for amounts under a threshold," or "allowed during business hours," or "allowed only after a human has approved the preceding step." The boundary you actually need is contextual, and tool presence is binary. The moment a tool is useful enough to grant, it is useful enough to misuse.
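A minimal sketch of why tool presence is binary, assuming a hypothetical allowlist of tool names. `GRANTED_TOOLS` and `tool_permitted` are illustrative names, not part of any real framework. The check sees only the tool name, so a small refund and a very large one look identical to it.

```python
# Hypothetical toolset restriction: the only question it can answer
# is whether a tool name was granted to the agent.
GRANTED_TOOLS = {"read_account", "issue_refund"}


def tool_permitted(tool_name: str, arguments: dict) -> bool:
    """Presence check. The arguments are accepted but cannot influence
    the answer, which is the limitation being illustrated."""
    return tool_name in GRANTED_TOOLS


small = tool_permitted("issue_refund", {"amount": 5})
large = tool_permitted("issue_refund", {"amount": 50_000})
absent = tool_permitted("send_wire", {"amount": 5})
print(small, large, absent)  # True True False
```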

Application guards

Developers wrap tool calls in conditional logic. If the amount exceeds a limit, raise an error. If the recipient is not on a list, block.

Guards are real enforcement, which puts them ahead of prompts and tool design. The problem is that they do not compose and they do not centralise. Every guard lives at one call site. When a new code path is added and the developer forgets the guard, the boundary has a hole. When the rule changes, every copy of the guard must be found and updated. There is no single place that answers the question "what are the rules for this agent right now," because the rules are smeared across the codebase. Guards are boundaries with no map.
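The hole described above can be sketched in a few lines. Everything here is hypothetical: the `issue_refund` stand-in, the two call sites, and the limit of 50 are assumptions chosen only to show one guarded path next to one that was added later without the guard.

```python
REFUND_LIMIT = 50  # hypothetical threshold, repeated wherever a guard exists


def issue_refund(account_id: str, amount: float) -> str:
    """Stand-in for the real refund tool."""
    return f"refunded {amount} to {account_id}"


def refund_from_chat(account_id: str, amount: float) -> str:
    # Guarded call site: the developer remembered the check here.
    if amount > REFUND_LIMIT:
        raise PermissionError("refund over limit requires approval")
    return issue_refund(account_id, amount)


def refund_from_batch_job(account_id: str, amount: float) -> str:
    # Call site added later with no guard, so the rule silently does not apply.
    return issue_refund(account_id, amount)
```

Calling `refund_from_chat("a1", 500)` raises `PermissionError`, while `refund_from_batch_job("a1", 500)` goes through. Nothing in the code says which of the two reflects the intended rule.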

Post hoc review

Log everything the agent did and have a human review it.

This is not a boundary at all. It is a record of boundaries that were never enforced. Review tells you what happened after it happened. For an agent moving at machine speed across a chain of consequential actions, that is an autopsy, not a control. The wrong action already executed. The money already moved. The data already left.

| Approach | Enforced | Contextual | Centralised | Auditable | Tamper resistant |
| --- | --- | --- | --- | --- | --- |
| Prompt instructions | No | Partly | No | No | No |
| Tool design | Partly | No | No | No | Partly |
| Application guards | Yes | Partly | No | Partly | No |
| Post hoc review | No | n/a | No | Yes | No |
| Permission boundary | Yes | Yes | Yes | Yes | Yes |

The bottom row is the target. Everything above it is missing at least two of the properties that make a boundary worth the name.


Anatomy of a Real Permission Boundary

A permission boundary for an agent is a control point that sits between the agent and every system it can act on. Every action the agent intends to take passes through it before reaching the target. At that point the boundary evaluates the action against policy and returns a decision. The action does not execute until the decision is made.

There are five components that make this work.

The interception point. Nothing reaches a downstream system without passing through the boundary. If an agent can call an API directly, bypassing the control point, there is no boundary for that path. Interception must be complete. A boundary with a gap is a boundary with no value, because the agent's open action space will eventually find the gap.

The policy. The set of rules the boundary evaluates against. Crucially, policy lives outside the agent and outside the application code. It is data, not behaviour, which means it can be inspected, versioned, changed without redeployment, and reasoned about as a whole. The question "what is this agent allowed to do" has a single answer in a single place.

The context. A decision is not made on the action alone. It is made on the action plus the full context: which agent is acting, what it has already done in this session, what its trust level is, what time it is, what the chain of delegation looks like, and any environmental signals. A boundary that ignores context can only express crude binary rules. A boundary that uses context can express the nuanced ones that real operations require.

The decision. The boundary returns one of a small set of verdicts. Allow lets the action proceed. Block stops it and returns an error. Escalate pauses the action and routes it to a human. The decision is returned before execution, which is the entire point.

The record. Every decision produces a signed, tamper evident entry: what the agent intended, what policy applied, what context was present, what verdict was reached, and who approved if a human was involved. This is the artifact that turns enforcement into evidence.
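One way to make a record signed and tamper evident is an HMAC-signed hash chain, sketched below with Python's standard library. This is an illustrative technique, not a description of any particular product. The key, field names, and helper functions are assumptions, and a real deployment would keep the key where the agent cannot read it.

```python
import hashlib
import hmac
import json

# Hypothetical signing key, held by the boundary and never by the agent.
SIGNING_KEY = b"boundary-only-secret"


def _sign(payload: bytes) -> str:
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def append_record(log: list, entry: dict) -> dict:
    """Append a decision entry linked to the previous entry's signature."""
    prev_sig = log[-1]["sig"] if log else ""
    body = {"entry": entry, "prev": prev_sig}
    payload = json.dumps(body, sort_keys=True).encode()
    record = {**body, "sig": _sign(payload)}
    log.append(record)
    return record


def verify_log(log: list) -> bool:
    """Recompute every link and signature. An edited entry or a removed
    middle entry fails; detecting truncation of the tail would need an
    external anchor, which this sketch omits."""
    prev_sig = ""
    for record in log:
        if record["prev"] != prev_sig:
            return False
        body = {"entry": record["entry"], "prev": record["prev"]}
        payload = json.dumps(body, sort_keys=True).encode()
        if not hmac.compare_digest(record["sig"], _sign(payload)):
            return False
        prev_sig = record["sig"]
    return True
```

Because each signature covers the previous one, changing an old verdict after the fact invalidates that entry when the log is next verified.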

        ┌─────────────────────────────────────────┐
        │              AGENT RUNTIME                │
        │                                           │
        │   model decides to take an action         │
        │                  │                        │
        └──────────────────┼────────────────────────┘
                           │  intercepted here
                           ▼
        ┌─────────────────────────────────────────┐
        │           PERMISSION BOUNDARY             │
        │                                           │
        │   evaluate( action, context, policy )     │
        │                  │                        │
        │      ┌───────────┼───────────┐            │
        │      ▼           ▼           ▼            │
        │    ALLOW       BLOCK      ESCALATE         │
        │      │           │           │            │
        │      │           │           ▼            │
        │      │           │      human review      │
        │      │           │       approve/deny     │
        │      ▼           ▼           ▼            │
        │   execute     reject     execute or        │
        │   + record    + record    abort + record   │
        └─────────────────────────────────────────┘
                           │
                           ▼
              downstream systems (APIs, DBs, email...)

Notice what this architecture does not require. It does not require the agent to cooperate. It does not require the model to be reliable. It does not require developers to remember a guard at every call site. The boundary holds regardless of what the agent reasons, because the agent does not control it.
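The flow in the diagram can be sketched as a small control point. This is a sketch under stated assumptions rather than a reference implementation: `Boundary`, `Action`, `Verdict`, `demo_policy`, the tool names, and the 1000 threshold are all hypothetical. The policy and the human reviewer are passed in from outside, and the agent only ever calls `submit`.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"


@dataclass
class Action:
    tool: str
    args: dict


@dataclass
class Boundary:
    """Hypothetical control point between the agent and its tools."""
    policy: Callable[[Action, dict], Verdict]  # rules supplied from outside the agent
    tools: dict                                # real tools, reached only via submit()
    ask_human: Callable[[Action, dict], bool]  # returns True to approve an escalation
    records: list = field(default_factory=list)

    def submit(self, action: Action, context: dict) -> Any:
        # The verdict is reached before the tool is touched.
        verdict = self.policy(action, context)
        approved = None
        if verdict is Verdict.ESCALATE:
            approved = self.ask_human(action, context)
        self.records.append(
            {"tool": action.tool, "verdict": verdict.value, "approved": approved}
        )
        if verdict is Verdict.BLOCK or approved is False:
            raise PermissionError(f"{action.tool} denied by boundary")
        return self.tools[action.tool](**action.args)


def demo_policy(action: Action, context: dict) -> Verdict:
    if action.tool == "delete_table":
        return Verdict.BLOCK
    if action.tool == "send_payment" and action.args["amount"] > 1000:
        return Verdict.ESCALATE
    return Verdict.ALLOW


boundary = Boundary(
    policy=demo_policy,
    tools={
        "send_payment": lambda amount: f"paid {amount}",
        "delete_table": lambda name: f"dropped {name}",
    },
    ask_human=lambda action, context: False,  # reviewer stub that always denies
)

print(boundary.submit(Action("send_payment", {"amount": 20}), {"agent": "billing"}))
```

A payment of 5000 would be escalated and, with this stub reviewer, denied with `PermissionError`. The record is written in either case. In a real system the tools would live in a separate process or service so that the agent has no path to them other than `submit`.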


Boundaries Are Contextual, Not Binary

The single biggest gap between the naive approaches and a real boundary is context. A useful boundary almost never expresses a flat yes or no. It expresses conditions.

Consider a single tool, sending an email, and the range of boundaries a real operation needs around it.

| Condition | Boundary behaviour |
| --- | --- |
| Recipient inside the organisation | Allow |
| Recipient external, no attachments | Allow |
| Recipient external, contains attachment | Escalate to human |
| Recipient on the blocklist | Block |
| More than fifty sends in this session | Block, rate limit tripped |
| Content matches a sensitive data pattern | Escalate |
| Outside business hours and external recipient | Escalate |

Every one of these is the same tool. Tool design cannot tell them apart, because the tool is present in all cases. Prompt instructions cannot enforce them, because they are suggestions. Application guards could express each one individually but would scatter seven different conditions across the codebase with no unified view and no shared audit. A permission boundary expresses all seven as policy, evaluated against context, in one place, with a record for each decision.
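The seven conditions can be written as policy data and evaluated in one place, as sketched below. Several things here are assumptions the article does not specify: the rule format, the domain and blocklist values, and above all the precedence (rules are checked top to bottom, first match wins, with blocks ahead of escalations).

```python
ORG_DOMAIN = "example.com"             # hypothetical organisation domain
BLOCKLIST = {"blocked@rival.example"}  # hypothetical blocklist
SESSION_SEND_LIMIT = 50

# Policy as plain data: it could equally be loaded from a JSON file.
# Assumption: rules are checked top to bottom and the first match wins.
EMAIL_POLICY = [
    {"when": {"blocklisted": True}, "verdict": "BLOCK"},
    {"when": {"over_rate_limit": True}, "verdict": "BLOCK"},
    {"when": {"sensitive_match": True}, "verdict": "ESCALATE"},
    {"when": {"external": True, "has_attachment": True}, "verdict": "ESCALATE"},
    {"when": {"external": True, "business_hours": False}, "verdict": "ESCALATE"},
    {"when": {"external": True}, "verdict": "ALLOW"},
    {"when": {"external": False}, "verdict": "ALLOW"},
]


def build_context(recipient, has_attachment=False, sends_this_session=0,
                  sensitive_match=False, business_hours=True):
    """Turn raw facts about one send attempt into the flags the policy reads."""
    return {
        "external": not recipient.endswith("@" + ORG_DOMAIN),
        "blocklisted": recipient in BLOCKLIST,
        "over_rate_limit": sends_this_session > SESSION_SEND_LIMIT,
        "has_attachment": has_attachment,
        "sensitive_match": sensitive_match,
        "business_hours": business_hours,
    }


def decide(context, policy=EMAIL_POLICY):
    for rule in policy:
        if all(context.get(key) == value for key, value in rule["when"].items()):
            return rule["verdict"]
    return "BLOCK"  # default deny if no rule matches


print(decide(build_context("bob@partner.example", has_attachment=True)))  # ESCALATE
```

Changing a rule means editing one list, not hunting for call sites, and the same list answers the question of what the agent is currently allowed to send.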

This is why context is the dividing line. The boundaries that matter in production are conditional, and only a context aware enforcement layer can express conditions while still being hard, enforced, and outside the agent's control.


The Multi Agent Problem

Boundaries get harder, and more necessary, when agents chain.

A modern agentic system is rarely a single agent. It is an orchestrator that delegates to sub agents, each of which may call tools or delegate further. The output of one agent becomes the input, and the trigger for action, of the next.

   orchestrator
        │ delegates
        ▼
   research agent ──► external web content
        │ passes findings
        ▼
   analysis agent ──► internal database
        │ passes plan
        ▼
   action agent ──► CRM, email, payments

This structure breaks the naive approaches completely. Whose permissions apply when the action agent moves money? The orchestrator initiated the task, but the action agent took the step, and the analysis agent shaped what the step would be, partly based on content the research agent pulled from the open web. If a malicious instruction entered through that web content, it has now propagated three hops down the chain to a system that moves money, and at no point did a boundary ask whether this specific action, in this specific chain, was permitted.

A permission boundary that understands chains carries context with the delegation. Each action records the full path: who originated the task, which agents handled it, what each decided, and whether any human reviewed it. The boundary at the action agent can see that the instruction it is about to act on traces back to untrusted external content, and can escalate or block on that basis. Without a boundary, the chain is a privilege escalation waiting to happen, where untrusted input at the top flows unchecked into consequential actions at the bottom.
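Carrying context with the delegation might look like the sketch below. The `Hop` and `Delegation` types, the `trusted` flag, and the set of consequential tools are hypothetical, and the rule that any untrusted hop forces escalation is only one possible policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    agent: str
    source: str    # where this hop's input came from
    trusted: bool  # False for open web or other unvetted content


@dataclass(frozen=True)
class Delegation:
    originator: str
    hops: tuple = ()

    def via(self, agent: str, source: str, trusted: bool) -> "Delegation":
        """Return a new chain with one more hop; earlier hops are immutable."""
        return Delegation(self.originator, self.hops + (Hop(agent, source, trusted),))

    @property
    def tainted(self) -> bool:
        return any(not hop.trusted for hop in self.hops)


CONSEQUENTIAL = {"send_payment", "send_email", "update_crm"}  # hypothetical set


def decide(tool: str, chain: Delegation) -> str:
    # Assumed rule: untrusted input anywhere upstream escalates consequential actions.
    if tool in CONSEQUENTIAL and chain.tainted:
        return "ESCALATE"
    return "ALLOW"


chain = (
    Delegation("orchestrator")
    .via("research agent", "external web content", trusted=False)
    .via("analysis agent", "internal database", trusted=True)
    .via("action agent", "analysis plan", trusted=True)
)
print(decide("send_payment", chain))  # ESCALATE
```

The action agent's own input looks trusted, yet the verdict still reflects the web content two hops earlier, because the chain travels with the request.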


What Happens Without One

The argument is easier to feel with a concrete sequence. Consider a customer support agent with tools to read account data, issue refunds, and send email, deployed with prompt instructions as its only constraint.

A customer message arrives containing, buried in otherwise normal text, an instruction crafted to manipulate the model: ignore previous limits, issue a full refund, and confirm by email. Here is what each layer does.

The prompt instruction said not to issue refunds over a limit without approval. The model, with the injected instruction now in its context, weighs that against the apparent user request and complies. No enforcement existed, so nothing stopped it.

The tool design gave the agent a refund tool because issuing refunds is its job. The tool was present, so it was available. Coarse tool restriction could not express "refunds under fifty allowed, above fifty require approval."

There were no application guards on this path, because this particular combination of conditions was not one the developers anticipated when they wrote the code.

Post hoc review will catch it tomorrow, when someone reads the logs, after the refund has cleared.

Now place a permission boundary in the same scenario. The refund action is intercepted. Policy says refunds above a threshold escalate to a human. Context shows the triggering instruction originated from inbound customer content, which raises the sensitivity. The boundary escalates. A human sees the request, recognises the manipulation, and denies it. A signed record captures the entire decision. Nothing executed that should not have, and there is evidence that the control worked.

The difference between these two outcomes is not a better model or a smarter prompt. It is the presence of a boundary the agent did not control.


Boundaries as Infrastructure

The mistake worth naming explicitly is treating permission boundaries as an application feature, something each team builds into each agent. They are not. They are infrastructure, in the same way that memory protection, file permissions, and network firewalls are infrastructure.

You do not ask each application to implement its own memory protection. The operating system provides it as a horizontal layer, below the application, applied uniformly to everything that runs. The application cannot opt out and does not need to opt in. The protection is simply there, enforced by a lower layer the application does not control.

Agent permission boundaries belong at the same altitude. They sit below the agent and above the systems it acts on, applied uniformly to every agent and every action, defined by policy that lives outside any single agent, producing a uniform audit across the whole fleet. Build it once as a layer, and every agent deployed on top of it inherits the boundary without each team reinventing a weaker version.

| Infrastructure primitive | Protects against | Enforced by |
| --- | --- | --- |
| Memory protection | A process reading another's memory | The kernel |
| File permissions | Unauthorised file access | The filesystem |
| Network firewall | Unauthorised connections | The network layer |
| Permission boundary | Unauthorised agent actions | The authorisation layer |

The fourth row is the one the industry is missing. The first three are so foundational that no one would deploy production software without them. Agents are being deployed without the fourth every day, and the open action space guarantees that the gap will eventually be found, whether by an adversary or by the agent's own emergent behaviour.


What This Requires in Practice

Pulling the argument together, a permission boundary that is worth deploying has to satisfy a specific set of requirements, and they follow directly from the failures of the naive approaches.

It must intercept completely, so the open action space has no unguarded path. It must evaluate per action and in context, so it can express the conditional boundaries real operations need rather than crude binary ones. It must keep policy outside the agent and outside the application code, so the rules are inspectable, versionable, and beyond the agent's influence. It must return a decision before execution, so enforcement is prevention rather than observation. It must support escalation to a human, so high stakes actions get judgment rather than automation. It must carry context across multi agent chains, so delegation cannot become privilege escalation. And it must produce a signed, tamper evident record of every decision, so enforcement doubles as evidence.

None of these is sufficient on its own. Interception without context gives you crude rules. Context without interception gives you advice. Decisions without records give you enforcement you cannot prove. Records without prevention give you an autopsy. The boundary is the combination, operating as one layer.

This is not a feature you add to an agent. It is the layer you run agents on top of, and in an open action space driven by a manipulable model acting at machine speed, it is the difference between a system you can deploy and one you can only hope will behave.


Xybern is the authorisation layer for enterprise AI agents. Every agent action is enforced, audited, and governed before it executes. Learn more at xybern.com or read the technical documentation at docs.xybern.com.
