Xybern Research
2026-05-20
The Authorisation Layer: The Infrastructure AI Agents Are Missing

Every serious software stack has layers. Compute, storage, networking, identity, observability. Each is a horizontal concern that cuts across every application built on top of it. You do not implement TLS per application. You do not build your own DNS resolver per service. You use the layer.

AI agents are being deployed without one of the most critical layers: authorisation.

Not authentication. Not access control on an API wrapper. Not a system prompt with instructions. A proper authorisation layer, one that intercepts every action an agent intends to take, evaluates it against policy, and makes an enforcement decision before the action executes.

This is not a feature. It is the missing infrastructure primitive of the current AI agent stack. And without it, enterprises are deploying agents that are, in the most literal sense, ungoverned.

Xybern is the authorisation layer for enterprise AI agents. This piece defines what that means, why it matters, and what the architecture looks like in practice.

What Agents Actually Do

Before you can govern agents, you need to be precise about what they do. An agent is not a chatbot. It is not a function that takes input and returns text. An agent is an autonomous process that:

Takes actions in external systems. APIs, databases, file systems, email platforms, ERP systems, trading infrastructure. Real systems with real consequences.

Makes decisions based on model reasoning. The model decides what to do next. That decision is conditioned on a prompt, a context window, tool outputs, and training data the operator does not fully control.

Operates over extended time horizons. A single agent run can involve dozens of sequential actions, each one building on the results of the last.

Chains with other agents. Multi-agent pipelines are the standard architecture for complex workflows. One agent's output becomes another agent's input and action.

Operates at machine speed. An agent does not pause between actions the way a human does. It executes as fast as the infrastructure allows.

This combination is what makes ungoverned agents dangerous. The blast radius of an agent error is not bounded by human reaction time. A misconfigured finance agent can execute hundreds of transactions before anyone notices. A poorly governed legal agent can file documents, send correspondence, or trigger contractual obligations that are difficult or impossible to reverse. A compromised customer-facing agent can exfiltrate data across thousands of interactions.

The question is not whether AI agents should be governed. It is where in the stack governance lives.

The Four Failures

When enterprise teams today try to govern their agents, they typically reach for one of four approaches. All four are insufficient.

Failure 1: System Prompt Instructions

"Do not take actions worth more than $10,000." "Always ask for confirmation before sending emails."

System prompt instructions are not enforceable. They are suggestions to the model. A sufficiently complex context window, an adversarial input, a jailbreak attempt, or a model update can cause the model to violate the instruction. You cannot audit adherence to a system prompt. You cannot demonstrate to a regulator that an agent followed a rule embedded in natural language. Prompts are not policy.

This is not a criticism of prompt engineering as a discipline. It is a statement about what prompts are structurally capable of. Governance requires deterministic enforcement. A language model is not a deterministic enforcement engine.

Failure 2: Application-Level Guards

Developers add conditional logic around action calls. if amount > 10000: raise Exception. This works until the codebase grows, until a developer forgets to add the guard on a new code path, until the business rule changes and the guard is not updated everywhere it appears, or until the agent takes an action the developer did not anticipate when writing the guard.

Application-level guards do not compose. They are not centralised. They cannot be audited as a whole. Changing a business rule requires a deployment. There is no single place to ask "what are the current governance rules for this agent?" because the rules are distributed across the codebase.
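A minimal illustration of that drift, with hypothetical function names and an illustrative limit: two code paths move money, but only the older one carries the guard.

```python
# Hypothetical payment helpers; the names and the 10,000 limit are illustrative only.
LIMIT = 10_000

executed = []  # stands in for the downstream payment system


def pay_invoice(amount: float) -> None:
    # Original code path: the developer remembered the guard.
    if amount > LIMIT:
        raise PermissionError(f"amount {amount} exceeds limit {LIMIT}")
    executed.append(("invoice", amount))


def pay_refund(amount: float) -> None:
    # Code path added later: same business risk, but the guard was never copied over.
    executed.append(("refund", amount))


def try_action(fn, amount: float) -> str:
    """Run an action and report whether a local guard stopped it."""
    try:
        fn(amount)
        return "executed"
    except PermissionError:
        return "blocked"
```

Calling try_action(pay_invoice, 50_000) is stopped by the guard, while try_action(pay_refund, 50_000) goes straight through. Nothing in the code ties the two paths to one shared rule.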

Failure 3: Traditional IAM

AWS IAM, Azure AD, Okta, and equivalent systems are designed for human identities and service accounts. They answer the question: is this principal allowed to call this API endpoint?

They do not answer: should this agent be allowed to send this specific email, with this specific content, to this specific recipient, at this specific time, given what it has already done in this session, and given that the amount involved exceeds the threshold defined in the policy that applies to agents with this trust score?

IAM operates at the permission level. Agents need governance at the action level. The agent has permission to call the email API. The governance question is whether it should, given the full context of this specific action.
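The difference can be sketched in a few lines. Everything here is an assumption made for illustration: the Action fields, the example.com internal domain, and the thresholds are not taken from any real IAM product or from Xybern's API.

```python
from dataclasses import dataclass


@dataclass
class Action:
    agent_id: str
    permission: str              # e.g. "email.send"
    recipient_domain: str
    trust_score: int
    prior_external_emails: int = 0


# Permission-level view: a static grant per principal.
GRANTS = {"support-agent": {"email.send"}}


def iam_allows(action: Action) -> bool:
    """Answers only: may this principal call this API at all?"""
    return action.permission in GRANTS.get(action.agent_id, set())


def action_level_decision(action: Action) -> str:
    """Answers: should this specific action proceed, given its context?"""
    if not iam_allows(action):
        return "block"
    external = action.recipient_domain != "example.com"
    if external and action.trust_score < 70:
        return "block"
    if external and action.prior_external_emails >= 1:
        return "escalate"
    return "allow"
```

An agent with the email.send grant passes iam_allows for every email it attempts. The action-level check can still escalate a second external email in the same session, or block one sent while the trust score is low.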

Failure 4: Post-Hoc Logging

Logging what agents did after the fact is observability, not governance. An audit trail tells you what went wrong. An authorisation layer prevents it from happening. These are not substitutes for each other.

Post-hoc logging is necessary. It is not sufficient. Regulated enterprises cannot tell a compliance officer that they discovered the problem from the logs. The question regulators ask is not "what happened?" It is "what controls prevented the wrong thing from happening?"

The Authorisation Layer Pattern

What is actually needed is a dedicated authorisation layer: a horizontal infrastructure component that sits between AI agents and the systems they act on, intercepts every intended action, evaluates it in real time against active policies, and returns an enforcement decision before the action executes.

This is not a new concept in systems architecture. Policy enforcement is well understood in human IAM, in network security (firewalls, WAFs, zero-trust proxies), and in data governance (column-level access control, DLP). What is new is applying it to AI agent actions at runtime, with the speed, context-awareness, and policy expressiveness that agents require.

The authorisation layer has four responsibilities:

Intercept. Every agent action, before execution, is submitted to the authorisation layer. Not a sample. Not high-risk actions only. Every action. The enforcement call happens in the agent's hot path, before the downstream system is touched.

Evaluate. The layer evaluates the action against the full set of active policies. Policies express rules based on action type, content, metadata, agent identity, time, chain context, and trust score. Evaluation is deterministic, consistent, and auditable.

Decide. The layer returns one of three decisions: allow the action to proceed, block the action and return a reason the agent can handle, or escalate the action for human review and pause the agent until a human resolves it.

Record. Every decision is written to an immutable audit log before it is returned to the agent. The record includes the full action context, the policies evaluated, the decision, the reasoning, and a trust score. This record cannot be altered after the fact.
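The four responsibilities above can be sketched as a single enforce call. This is an illustrative toy under stated assumptions, not Xybern's implementation: the class, the policy signature, and the precedence rule (block outranks escalate, which outranks allow) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A policy inspects an action and returns "block", "escalate", or None (no objection).
Policy = Callable[[dict], Optional[str]]


@dataclass(frozen=True)
class Decision:
    outcome: str      # "allow" | "block" | "escalate"
    reasons: tuple    # names of the policies that produced the outcome


class AuthorisationLayer:
    def __init__(self, policies: dict):
        self.policies = policies   # name -> Policy
        self.audit_log = []        # append-only here; a real store would be immutable

    def enforce(self, action: dict) -> Decision:
        # Intercept: the agent calls this before touching the downstream system.
        # Evaluate: run every active policy, in a fixed order for determinism.
        verdicts = {name: p(action) for name, p in sorted(self.policies.items())}
        # Decide: assumed precedence is block > escalate > allow.
        if "block" in verdicts.values():
            outcome = "block"
        elif "escalate" in verdicts.values():
            outcome = "escalate"
        else:
            outcome = "allow"
        reasons = tuple(n for n, v in verdicts.items() if v == outcome)
        # Record: write the record before the decision is returned.
        self.audit_log.append(
            {"action": dict(action), "verdicts": verdicts, "outcome": outcome}
        )
        return Decision(outcome, reasons)


def amount_cap(action: dict) -> Optional[str]:
    return "block" if action.get("amount", 0) > 100_000 else None


def external_email(action: dict) -> Optional[str]:
    is_external = action.get("type") == "email.send" and action.get("external")
    return "escalate" if is_external else None
```

With both policies registered, a 250,000 payment is blocked by amount_cap, an external email is escalated, and a plain read is allowed. All three attempts land in audit_log, including the allowed one.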

The Architecture

A concrete authorisation layer for AI agents looks like this:

 Agent Code
     │
     │  enforce(action)         ← SDK / auto-capture layer
     ▼
 Authorisation Layer
     │
     ├── 1. Identity resolution   (agent_id, role, trust score)
     ├── 2. Policy evaluation     (all active policies)
     ├── 3. Decision generation   (allow / block / escalate)
     ├── 4. Vault write           (immutable record, before response)
     │
     ▼
 Decision returned to agent
     │
     ├── ALLOW    → agent proceeds, action executes
     ├── BLOCK    → agent stops, reason logged, error handled
     └── ESCALATE → agent pauses, human reviews, agent resumes or stops

The key architectural properties:

The layer is a separate service, not agent code. Policies are centralised. One policy change applies to every agent immediately, without redeploying any of them.

The audit log is independent of the agent. Even if the agent code is replaced or compromised, the record of what was attempted is intact.

Enforcement is consistent across agents. Whether you have one agent or a thousand, running on one machine or across a distributed system, the same policies apply.

The layer composes across frameworks. CrewAI agents, LangGraph workflows, AutoGen systems, custom pipelines, all governed by the same layer without rebuilding enforcement for each.
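One way to get framework-independent enforcement is to wrap each tool function so the check runs before the tool body. The sketch below uses a plain Python decorator. The governed helper, the Blocked exception, and demo_enforce are hypothetical stand-ins, not the Xybern SDK or any framework's API.

```python
import functools


class Blocked(Exception):
    """Raised when the enforcement check does not allow the action."""


def governed(enforce, action_type: str):
    """Wrap a tool so the enforcement check runs before the tool body."""
    def decorator(tool):
        @functools.wraps(tool)
        def wrapper(**kwargs):
            decision = enforce({"type": action_type, **kwargs})
            if decision != "allow":
                raise Blocked(f"{action_type}: {decision}")
            return tool(**kwargs)
        return wrapper
    return decorator


def demo_enforce(action: dict) -> str:
    # Stand-in for a call to the central layer; a real one would be a network request.
    return "block" if action.get("amount", 0) > 10_000 else "allow"


sent = []  # stands in for the downstream payment system


@governed(demo_enforce, "payments.execute")
def transfer(amount: float, to: str) -> str:
    sent.append((amount, to))
    return "ok"
```

Because the wrapper only needs a callable, the same decorator could sit around a tool regardless of which agent framework invokes it. The tool body never runs unless the decision is allow.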

Policy Expressiveness

The quality of an authorisation layer is determined by how expressive its policy language is. A layer that can only block specific action types is not sufficient for enterprise AI governance.

Production-grade authorisation requires policies that express:

Content rules. Block any action where the email body contains personal financial identifiers. Flag any prompt where the content matches known exfiltration patterns.

Metadata rules. Block any transaction where amount > 100000 and recipient_jurisdiction == "sanctioned". Escalate any action where data_classification == "confidential" and the agent's trust score is below 70.

Temporal rules. Allow database writes only between 09:00 and 17:00 on weekdays. Auto-expire all permissions for this agent after 30 minutes. Do not allow this agent to retry a blocked action more than twice in one hour.

Identity and role rules. This agent role has payments.read but not payments.execute. This agent has been granted temporary elevated permissions by a human operator, valid for 20 minutes, for this specific task.

Chain and session rules. If this agent has already sent an external communication in this session, escalate the next one for review. If the trust score has fallen below 60 across the last five decisions, require human approval for all subsequent actions.

Delegation rules. Agent A may delegate data.read to Agent B for this task, but not data.write. The delegation expires at session end. Agent B cannot further delegate.

This is the policy surface that enterprise AI governance actually requires. It is far beyond what application-level conditionals or system prompts can express, and it cannot be assembled from traditional IAM primitives.
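Several of the rule families above can be written as small, deterministic predicates. The following is a sketch only: the dictionary keys (amount, recipient_jurisdiction, timestamp, session_external_count, and so on) are assumed field names chosen for illustration, not a documented schema.

```python
from datetime import datetime
from typing import Optional


def sanctioned_transfer(action: dict) -> Optional[str]:
    # Metadata rule: large transfer to a sanctioned jurisdiction.
    large = action.get("amount", 0) > 100_000
    if large and action.get("recipient_jurisdiction") == "sanctioned":
        return "block"
    return None


def confidential_low_trust(action: dict) -> Optional[str]:
    # Metadata rule: confidential data handled by an agent with trust score below 70.
    confidential = action.get("data_classification") == "confidential"
    if confidential and action.get("trust_score", 0) < 70:
        return "escalate"
    return None


def business_hours_writes(action: dict) -> Optional[str]:
    # Temporal rule: database writes only from 09:00 up to 17:00, Monday to Friday.
    if action.get("type") != "db.write":
        return None
    now: datetime = action["timestamp"]
    in_hours = now.weekday() < 5 and 9 <= now.hour < 17
    return None if in_hours else "block"


def second_external_message(action: dict) -> Optional[str]:
    # Chain rule: a second external communication in one session needs review.
    is_external = action.get("type") == "email.external"
    if is_external and action.get("session_external_count", 0) >= 1:
        return "escalate"
    return None
```

Each function returns None when it has no objection, so a policy engine can run all of them against the same action and combine the results.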

Human-in-the-Loop as a Governance Primitive

Escalation is not failure. It is a deliberate governance mechanism.

For certain actions, the correct answer is neither allow nor block but escalate. A finance agent about to execute a large cross-border transfer. A legal agent about to file a regulatory document. An operations agent about to change production infrastructure. These are situations where human judgement adds value and where the cost of an autonomous wrong decision exceeds the cost of a brief pause for human review.

The authorisation layer handles escalation as a first-class workflow:

  1. The agent submits the action and receives an escalate decision with a decision ID.
  2. The action is paused. The agent waits, polling for a resolution.
  3. A human operator reviews the action in a dedicated interface, with full context: what the agent is trying to do, why, what policies triggered escalation, the agent's recent decision history, and the trust score.
  4. The operator approves or rejects. The decision is recorded immediately.
  5. The agent resumes on approval or terminates on rejection.
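The five steps above can be sketched as a small in-memory queue plus a polling loop. The class and function names are hypothetical. Treating a timeout as a rejection (failing closed) is an assumption of this sketch rather than something the workflow above specifies.

```python
import itertools


class EscalationQueue:
    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}    # decision_id -> action awaiting review
        self.resolved = {}   # decision_id -> (verdict, operator)

    def escalate(self, action: dict) -> int:
        """Step 1: park the action and hand back a decision ID."""
        decision_id = next(self._ids)
        self.pending[decision_id] = action
        return decision_id

    def resolve(self, decision_id: int, approved: bool, operator: str) -> None:
        """Step 4: a human approves or rejects; the result is recorded immediately."""
        self.pending.pop(decision_id)  # KeyError if unknown or already resolved
        verdict = "approved" if approved else "rejected"
        self.resolved[decision_id] = (verdict, operator)

    def status(self, decision_id: int) -> str:
        if decision_id in self.resolved:
            return self.resolved[decision_id][0]
        if decision_id in self.pending:
            return "pending"
        raise KeyError(decision_id)


def wait_for_resolution(queue, decision_id, max_polls, on_poll=lambda: None):
    """Steps 2 and 5: poll until resolved; a timeout counts as a rejection."""
    for _ in range(max_polls):
        status = queue.status(decision_id)
        if status != "pending":
            return status
        on_poll()  # a real agent would sleep or back off here
    return "rejected"
```

The agent proceeds only when wait_for_resolution returns approved. An escalation that nobody reviews within the polling budget therefore never executes.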

The escalation loop is the bridge between fully autonomous agents and fully manual processes. It allows enterprises to deploy agents with confidence into high-stakes workflows, because the layer guarantees that actions above defined risk thresholds will surface to humans before executing.

Enterprise Implications

For engineering and security teams, the authorisation layer is infrastructure. For legal, compliance, and risk teams, it is the mechanism that makes AI agent deployment defensible.

Regulatory alignment. GDPR, SOX, HIPAA, DORA, the EU AI Act, and emerging AI-specific regulations share a common requirement: organisations must demonstrate meaningful control over automated decision-making systems. A centralised authorisation layer with an immutable audit log is the mechanism that makes this demonstration possible. "Every action was evaluated against defined policy before execution, and the full record is here" is a statement a compliance officer can work with.

Liability containment. When an AI agent causes harm, the first question from legal counsel and regulators is what controls were in place. Application-level guards distributed across a codebase are difficult to inventory and impossible to demonstrate comprehensively. A centralised policy engine with a complete decision log can be inventoried in one place and demonstrated decision by decision.

Operational confidence. Engineering teams can ship new agent capabilities faster when governance is a layer, not a per-agent implementation concern. You define the policy once. It applies to every agent that calls the enforcement endpoint. New agents inherit the governance posture of the organisation without requiring per-agent policy work.

Security boundary. AI agents are attack surfaces. Prompt injection attacks attempt to make agents take actions their operators did not intend. An authorisation layer that evaluates the concrete action the agent attempts, rather than relying on the model's stated intent, provides a security boundary that prompt-level defences cannot. Even if the model is manipulated into attempting a malicious action, the layer can block it before it executes, provided an active policy covers that action.

Audit readiness. When a compliance team asks what your agents did last quarter, and why, you need an answer you can provide in hours, not weeks. The authorisation layer's immutable decision log is the answer. Every action, every decision, every policy evaluated, every escalation, every resolution, retrievable and verifiable.
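The article does not say how immutability or verifiability is achieved. One common technique, shown here purely as an assumption, is a hash chain: each entry's hash covers the previous entry's hash, so any later alteration is detectable. That makes the log tamper-evident rather than tamper-proof.

```python
import hashlib
import json


class HashChainedLog:
    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    def append(self, record: dict) -> str:
        """Store the record together with a hash that covers the previous entry."""
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
        self.entries.append({"prev": prev, "body": body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed, or reordered entry breaks it."""
        prev = self.GENESIS
        for entry in self.entries:
            payload = (prev + entry["body"]).encode("utf-8")
            expected = hashlib.sha256(payload).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

An auditor who holds only the latest hash can confirm that nothing earlier in the log was changed by rerunning verify over the stored entries.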

Breakglass: Governance for Emergencies

In production systems, there will be situations where the correct response is a controlled, temporary override of normal governance. A trading system needs to execute a time-sensitive action outside normal hours. An incident response agent needs elevated permissions to remediate a live security event. A senior operator needs to unblock an agent immediately during a critical workflow.

The authorisation layer handles this through a breakglass protocol: a human-initiated, time-limited override with mandatory justification, immediate audit recording, automatic expiry, and post-incident review requirement. The override does not disable governance. It creates a fully logged exception, visible in the audit trail, attributable to the human who initiated it.

Breakglass without an immutable audit trail is not governance. It is an undocumented gap. The authorisation layer closes it.
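A minimal sketch of such an override, under stated assumptions: the class names are hypothetical, the 15-minute default lifetime is illustrative, and the audit list stands in for an immutable store. Time is passed in explicitly so the expiry logic is easy to test.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Override:
    operator: str
    agent_id: str
    justification: str
    expires_at: datetime


class Breakglass:
    def __init__(self):
        self.audit = []       # stand-in for an immutable audit store
        self.overrides = {}   # agent_id -> Override

    def grant(self, operator, agent_id, justification, now, ttl=timedelta(minutes=15)):
        # Mandatory justification: refuse an override that has no stated reason.
        if not justification.strip():
            raise ValueError("breakglass requires a justification")
        override = Override(operator, agent_id, justification, now + ttl)
        self.audit.append(("granted", override, now))  # recorded before it takes effect
        self.overrides[agent_id] = override
        return override

    def check(self, agent_id, now) -> bool:
        """True only while an unexpired override exists; each use is logged."""
        override = self.overrides.get(agent_id)
        active = override is not None and now < override.expires_at
        if active:
            self.audit.append(("used", override, now))
        return active
```

Expiry is enforced by comparing timestamps at check time, so an override lapses on its own even if nobody remembers to revoke it.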

What Xybern Implements

Xybern is the authorisation layer for enterprise AI agents. Every agent action is enforced, audited, and governed before it executes.

The enforcement API intercepts actions via POST /v1/enforce/intercept, evaluates them against active policies in real time, and returns a decision with trust score and reasoning in under 10 milliseconds on the fast path. The SDK integrates with CrewAI, LangGraph, AutoGen, LlamaIndex, and custom pipelines. Auto-capture intercepts tool calls at the framework level without requiring manual enforcement calls at each action site.

The policy engine supports content, metadata, temporal, identity, chain, and delegation rules. The Policy-as-Code SDK allows engineering teams to define policies as version-controlled Python, deployed atomically with full provenance tracking. Shadow mode evaluates new policies against live traffic without enforcing them, so false positives can be found and corrected before a policy goes live.
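The idea behind shadow evaluation can be shown generically. This is a sketch of the technique, not the Policy-as-Code SDK: the function names and both thresholds are hypothetical. The candidate policy is evaluated and compared with the live one, but only the live decision is ever enforced.

```python
from typing import Callable, List

Policy = Callable[[dict], str]  # returns "allow", "block", or "escalate"


def evaluate_with_shadow(
    action: dict, live: Policy, candidate: Policy, divergences: List[dict]
) -> str:
    """Enforce the live policy; record where the candidate would have differed."""
    enforced = live(action)
    shadow = candidate(action)
    if shadow != enforced:
        divergences.append({"action": action, "enforced": enforced, "shadow": shadow})
    return enforced


def live_cap(action: dict) -> str:
    # Policy currently in force.
    return "block" if action["amount"] > 100_000 else "allow"


def candidate_cap(action: dict) -> str:
    # Stricter policy under trial.
    return "block" if action["amount"] > 50_000 else "allow"
```

Running amounts of 10,000, 75,000, and 200,000 through evaluate_with_shadow enforces allow, allow, block. The 75,000 case is recorded as a divergence, which tells the policy author what the stricter rule would have blocked.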

The LLM Gateway sits in front of model providers and enforces policy on every prompt and completion, governing not just what agents do but what models receive and return.

The Provenance Vault records every enforcement decision before it is returned to the agent. The record is immutable. The Sentinel dashboard gives security teams a real-time view of every decision, every escalation, every policy evaluation. The human-in-the-loop interface closes the escalation loop in seconds.

The Infrastructure Question

The authorisation layer is horizontal infrastructure. Like a firewall, a service mesh, or an identity provider, it is a component that every team in an organisation benefits from without building it themselves.

The question for engineering and security leaders is not whether AI agents need governance. It is where governance lives in the stack.

If the answer is distributed across agent code, implemented differently per team, expressed as system prompt instructions and application-level conditionals, the organisation is accumulating governance debt at the rate it deploys agents. Every new agent is a new surface without a consistent enforcement posture. Every policy change is a deployment. Every audit is a reconstruction.

If the answer is a centralised authorisation layer, with a policy engine, an immutable audit log, and a human review interface, the organisation has infrastructure that scales with its AI adoption. New agents inherit governance. Policy changes are immediate. Audits are retrievable.

That infrastructure exists. It is the authorisation layer. And it is the piece that is missing from every AI agent stack deployed without one.

Xybern is the authorisation layer for enterprise AI agents. Every agent action is enforced, audited, and governed before it executes. Learn more at xybern.com or read the technical documentation at docs.xybern.com.
