Layer Three: AI as Subject
Stacked Zero Trust: Post 6 of 13
This is the point the series has been building towards.
Post 2 argued that agentic AI is a new kind of subject, one that breaks several of the assumptions the trust algorithm quietly relies on. Post 3 placed that subject inside the architecture as layer three, while the intervening posts dealt with the substrate beneath it and the mediator above it.
Now we get to the subject itself.
The central idea is easy enough to state. Treat the autonomous agent as a first-class subject of the trust algorithm. Not as a feature of an application, not as an unusually active service account, and not as a workload with eccentric behaviour. Treat it as its own kind of principal, making requests, receiving authority, taking actions, and requiring governance in exactly the same structural sense as any other subject.
Stated like that, the idea sounds almost ordinary. Most architectural changes do when they’re reduced to a sentence or two.
The difficulty appears when you try to honour that decision in practice, because nearly every tool the substrate uses to govern subjects was built around properties that agents don’t reliably possess. The further you follow the implication, the more things that once felt settled begin to move under your feet.
Identity is one example. Posture is another. Least privilege changes shape entirely. Even breach, which feels like one of the more stable concepts in security, starts to mean something slightly different once the subject is capable of reasoning, delegation, and autonomous action.
This post takes those four properties in turn. It doesn’t offer finished answers, because most of those answers don’t exist yet. What it does offer is a clearer description of where the model begins to strain. Understanding the shape of the problem is ultimately more useful than pretending it has already been solved.
Identity, when the subject can be talked into things
For a human or a workload, identity is relatively settled territory. There is a credential, an entry in a directory, a means to authenticate, and a means to revoke. Privilege attaches to the identity, and the identity is stable enough that the attachment means something.
Agents complicate that picture surprisingly quickly.
The mechanical part is awkward enough on its own. Agents frequently operate under delegated authority, which means questions of provenance and accountability become difficult once you step beyond the simplest examples. An agent may be acting on behalf of a user while invoking another agent, which then calls a tool or a service on its behalf. By the time an action is taken, it can be far from obvious whose authority is actually being exercised, or where responsibility should sit if something goes wrong.
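One way to make the provenance question concrete is to record every hop of delegation explicitly. The sketch below is illustrative only: `Delegation`, `effective_scopes`, `provenance`, and the scope names are hypothetical, not taken from any product or standard. It assumes that authority can only narrow as it passes down the chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    """One hop in a chain: `actor` acts on behalf of the hop before it."""
    actor: str
    scopes: frozenset  # hypothetical scope names, for illustration only


def effective_scopes(chain):
    """Authority at the end of the chain is the intersection of every hop's scopes."""
    if not chain:
        return frozenset()
    scopes = chain[0].scopes
    for hop in chain[1:]:
        scopes = scopes & hop.scopes
    return scopes


def provenance(chain):
    """Readable record of whose authority is being exercised, hop by hop."""
    return " -> ".join(hop.actor for hop in chain)


chain = [
    Delegation("user:alice", frozenset({"contracts:read", "contracts:flag", "email:send"})),
    Delegation("agent:reviewer", frozenset({"contracts:read", "contracts:flag"})),
    Delegation("tool:search", frozenset({"contracts:read"})),
]

print(provenance(chain))               # user:alice -> agent:reviewer -> tool:search
print(sorted(effective_scopes(chain)))  # ['contracts:read']
```

Even in this toy form, the record answers only the mechanical question of who delegated what to whom. It says nothing about whether any hop was influenced along the way.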
That is a difficult engineering problem, but it is at least recognisable as one.
The deeper challenge sits elsewhere.
An agent’s effective behaviour is partly shaped by the information it processes. That isn’t unusual in itself; humans are influenced by information too. What differs is the consequence: a subject can remain correctly authenticated, continue using its legitimate credentials, and still be persuaded into acting against the interests of the identity it carries.
Prompt injection is the clearest example. The credential hasn’t been stolen. The authentication process hasn’t failed. The agent stays within what it has been permitted to do, while being steered into doing something it shouldn’t.
That leaves us in an uncomfortable position. Our identity systems are built around the idea that an authenticated subject acts either on its own behalf or on behalf of a legitimate principal. Agents introduce a third possibility: an authenticated subject acting on behalf of whoever most recently influenced its reasoning.
The frameworks don’t really have a place for that.
Identity for agents is not just difficult. It is genuinely unsolved, and unsolved enough to deserve a post of its own later in the series.
Posture, when there is no device to inspect
The substrate has a mature idea of posture.
A device reports its patch level, configuration state, compliance status, and a range of other signals that help establish whether it is fit to be trusted. The trust algorithm folds those signals into its decisions because they provide a reasonably reliable picture of the subject’s health.
The difficulty appears as soon as you ask what posture means for an agent.
There is no operating system to inspect, no familiar compliance baseline to measure against, and no straightforward equivalent of device health. The thing you are trying to assess is a reasoning process, which means the question shifts from configuration to behaviour.
You stop asking whether the subject is patched and start asking whether it is operating within its intended scope. You stop asking whether the configuration has drifted and start asking whether the behaviour has drifted. The focus moves away from state and towards conduct.
That is a different kind of assessment entirely.
The nearest equivalents come from model monitoring rather than device management. Is the agent behaving consistently with its stated purpose? Is it operating within expected bounds? Is there evidence that it has been diverted from the task it was originally given?
Those questions are real, and they matter. They are also questions about a process in motion rather than a static condition.
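A minimal sketch of what a behavioural posture signal might look like, assuming the agent declares the tools it expects to use and the mediator can see the calls it actually makes. The function name `posture_score` and the tool names are hypothetical, and a real signal would need far more than a single ratio.

```python
def posture_score(declared_tools, observed_calls):
    """Fraction of observed tool calls that fall inside the declared scope.

    1.0 means every call was in scope; lower values suggest drift.
    """
    if not observed_calls:
        return 1.0
    in_scope = sum(1 for call in observed_calls if call in declared_tools)
    return in_scope / len(observed_calls)


# Hypothetical tool names, for illustration only.
declared = {"contracts.read", "contracts.flag"}
observed = ["contracts.read", "contracts.read", "contracts.flag", "payments.create"]

print(posture_score(declared, observed))  # 0.75
```

Unlike a patch level, this number only has meaning over a window of activity, and it changes as the task proceeds.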
The trust algorithm has never been especially comfortable treating posture as a behavioural judgement, particularly when the behaviour itself is probabilistic and continuously changing. Device posture asks whether a state is acceptable; agent posture asks whether a pattern of conduct is. That is why posture deserves its own treatment later in the series.
Least privilege, when the actions cannot be listed in advance
Least privilege has always depended on being able to predict, at least broadly, what a subject needs in order to do its job.
For humans and workloads, that is often difficult but still achievable. Roles can be defined. Permissions can be scoped. Access can be limited to what a function requires.
Agents complicate that because the very thing that makes them valuable is their ability to work out the steps for themselves.
A request like “review the supplier contracts and flag anything unusual” doesn’t naturally decompose into a fixed list of actions that can be scoped in advance. The agent may need to access three systems or seven. It may need to revisit information it didn’t expect to use. It may discover a path through the task that wasn’t obvious when the instruction was first issued.
The more successful the agent is at reasoning, the less predictable the exact sequence becomes.
That creates a tension at the heart of least privilege. Scope access too tightly and the agent cannot complete the task. Scope it broadly enough to accommodate uncertainty and you end up granting substantial authority to a fast, autonomous, and highly adaptable subject.
That is exactly the situation least privilege was intended to avoid.
The direction that seems to hold up is to stop treating privilege as something fixed at the start of a task and instead treat it as something granted and withdrawn as the task unfolds. In that model, authority follows demonstrated need within an authorised intent and disappears once the need disappears.
That approach is coherent in principle, but still immature in practice. More importantly, it depends almost entirely on the mediator being capable of making those decisions continuously and at machine speed.
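The grant-and-withdraw model described above can be sketched as a small ledger held by the mediator. This is an assumption-laden illustration, not a reference design: `GrantLedger`, its methods, the scope names, and the fixed time-to-live are all hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
class GrantLedger:
    """Tracks short-lived grants issued within one authorised intent."""

    def __init__(self, intent_scopes, ttl_seconds):
        self.intent_scopes = frozenset(intent_scopes)  # outer bound of the task
        self.ttl = ttl_seconds
        self._expiry = {}  # scope -> time at which the grant lapses

    def request(self, scope, now):
        """Grant a scope only if it falls inside the authorised intent."""
        if scope not in self.intent_scopes:
            return False
        self._expiry[scope] = now + self.ttl
        return True

    def allowed(self, scope, now):
        """A grant is usable only until it expires or is released."""
        return self._expiry.get(scope, float("-inf")) > now

    def release(self, scope):
        """Withdraw a grant as soon as the need has passed."""
        self._expiry.pop(scope, None)


# Hypothetical scopes for the supplier-contract review task.
ledger = GrantLedger({"contracts:read", "suppliers:read"}, ttl_seconds=60)

print(ledger.request("contracts:read", now=0))    # True
print(ledger.allowed("contracts:read", now=30))   # True
print(ledger.allowed("contracts:read", now=61))   # False
print(ledger.request("payments:create", now=0))   # False
```

The hard part is not the ledger. It is deciding, at machine speed, whether a given request reflects genuine need within the intent, which is the judgement the sketch leaves to the mediator.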
None of that makes least privilege impossible.
It does mean that the shape of the problem no longer resembles the static scoping model the substrate is comfortable with, and we’re still working out what the replacement looks like.
Breach, when the credentials were used correctly
“Assume breach” is one of the foundations of Zero Trust, and for the older subjects it has a reasonably recognisable shape.
Someone steals a credential. Anomalous behaviour appears. A workload starts doing things it has never done before. The details vary, but there is usually some observable sign that a compromise has occurred.
The agentic version is stranger.
When an agent is successfully influenced partway through a task, every action that follows may still be formally legitimate. The credentials remain valid. The authentication remains correct. Every request may appear authorised when viewed individually.
Nothing has been stolen.
Nothing has been broken.
The subject has simply been persuaded.
That distinction matters because many of the signals the substrate relies on are signals of compromise. The agentic failure mode is often a failure of influence rather than compromise, which means the behaviour can remain superficially legitimate even while the outcome becomes increasingly harmful.
In practice, this changes what assume breach means.
For agents, it increasingly becomes the assumption that a correctly authenticated, properly credentialled subject may nonetheless be acting against your interests. The subject may not be hijacked in the traditional sense. It may simply be influenced.
That is a much stranger thing to design against than a stolen password, and it pulls us back towards the loop introduced earlier in the series.
Ultimately, the only realistic prospect of detecting this kind of failure is a mediator capable of observing behaviour rather than simply verifying credentials.
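To illustrate the gap between verifying requests one at a time and observing behaviour, here is a small sketch. It assumes a static allow-list for individual actions and a hypothetical list of action pairs that are risky in combination; the names `authorised`, `trajectory_alerts`, and the action strings are invented for the example.

```python
# Hypothetical per-action allow-list for the agent.
ALLOWED = {"contracts.read", "email.send"}


def authorised(action):
    """Per-request check: is this single action permitted?"""
    return action in ALLOWED


def trajectory_alerts(actions, risky_sequences):
    """Flag (earlier, later) pairs that are risky when they occur in that order."""
    alerts = []
    seen = set()
    for action in actions:
        for earlier, later in risky_sequences:
            if action == later and earlier in seen:
                alerts.append((earlier, later))
        seen.add(action)
    return alerts


actions = ["contracts.read", "email.send"]
risky = [("contracts.read", "email.send")]  # read sensitive data, then send it out

print(all(authorised(a) for a in actions))  # True
print(trajectory_alerts(actions, risky))    # [('contracts.read', 'email.send')]
```

Every request passes the credential-level check, yet the sequence is exactly what an influenced agent might produce. A fixed list of risky pairs is far too crude for real use; the point is only that the signal lives in the trajectory, not in any single request.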
Why this matters now
At this point it would be reasonable to look at all four areas and conclude that layer three is simply too early, that the problems are interesting but not yet operational.
The difficulty with that conclusion is that the agents are arriving anyway.
Organisations are already introducing autonomous agents into production environments, often because the commercial pressure to realise value from AI arrives long before the governance catches up. In practice, the decision is rarely whether to deploy agents or not. More often, the decision is whether to understand the gaps before deployment or discover them afterwards.
That changes the value of the model.
The purpose of layer three isn’t to provide finished answers. If those answers existed, this would be a considerably shorter series. Its purpose is to make the unresolved visible, because an organisation that can identify where a particular deployment depends on agent identity, behavioural posture, dynamic privilege, or influence-resistant trust is already in a better position than one that assumes those problems were solved elsewhere.
That may not sound dramatic, but it matters.
A surprising amount of the current conversation assumes that agents can simply inherit the governance models built for users and workloads. Sometimes that assumption is explicit. More often it isn’t stated at all. The model continues to work until the point where it doesn’t, and that point usually arrives in production rather than in design.
Seen that way, the value of naming the gaps becomes fairly practical.
It tells you where to be cautious. It gives you better questions for vendors. It tells you which capabilities in the mediator layer matter most, because the mediator is carrying much of the burden that the substrate can no longer carry on its own.
Most importantly, it stops you mistaking an unsolved problem for a solved one.
One thing to take from this
Treating the autonomous agent as a first-class subject sounds straightforward until you follow the implications.
Identity becomes vulnerable to influence in ways the existing model doesn’t understand. Posture turns into a behavioural assessment rather than a device assessment. Least privilege can no longer rely entirely on permissions defined in advance, and breach begins to include situations where the credentials were used correctly all along.
None of those problems is fully solved.
What matters, for now, is knowing where they sit, understanding which of your deployments depend on assumptions that no longer hold, and resisting the temptation to treat unresolved questions as if they were settled simply because the technology is already being deployed.
That closes the central act of the series.
The next three posts move into the hardest problems in their own right, beginning with the one this post could only touch briefly: identity for agents, and why it is genuinely unsolved.
Post 6 of 13 in Stacked Zero Trust.
Previously: Post 5 - Layer Two: AI as Mediator.
Next: Post 7 - Identity for Agents Is Genuinely Unsolved.
The reference document at the end of the series includes the full subject taxonomy, setting out how agents, agentic chains, and model-as-tool subjects differ from humans and workloads across identity, posture, and verification.
References drawn on in this post: NIST Special Publication 800-207, Zero Trust Architecture (August 2020), for the subject, posture, least-privilege, and assume-breach principles; the practice of model monitoring as the nearest analogue for agent posture; Model Context Protocol as an example of agent-to-tool interaction relevant to agent governance. The argument that autonomous agents should be treated as distinct principals with their own identity also appears, in different form, in **Careful Adoption of Agentic AI Services** (joint Five Eyes guidance, May 2026), which Post 9 engages with in detail; the two bodies of work were developed independently and the convergence is discussed there.


