Skip to main content
ADVERTISEMENT

Chapter 22 · Human Oversight and Governance Design

The agent does not replace the human. It changes what the human needs to be good at — and that is a harder problem.


The Design Problem Nobody Planned For

When organisations deploy agentic AI, they typically design the agent carefully and design the human role around it as an afterthought. The agent's scope is specified, its tools are chosen, its failure modes are analysed. The human's role in the new arrangement — what they do, what skills they need, how they maintain the judgment that oversight requires — is usually described in a single sentence: "a human reviews outputs before they are acted upon."

This chapter argues that the human role deserves the same engineering rigour as the agent's, and that the governance practices that keep that human role genuine in production deserve the same rigour as the engineering that built the agent. The decisions about which skills to protect, how to design handoff points, how to calibrate internal trust, and who carries accountability are not soft organisational questions. They are design decisions with measurable consequences for the reliability of the system and the long-term capability of the organisation. Equally, a carefully designed collaboration structure will degrade in production if it is not maintained by governance practices that keep oversight meaningful, catch the threats that standard monitoring cannot detect, and produce the audit record that accountability requires.


The Jagged Frontier in Practice

The most important empirical finding about how AI affects human performance is what researchers have called the jagged technological frontier. Studying knowledge workers using large language models, Dell'Acqua et al. found that AI assistance raised the performance floor significantly — workers who previously performed below the group median improved substantially. At the same time, the performance ceiling barely moved: workers who already performed at the top of the group improved modestly, and in some task categories declined slightly as they over-relied on AI-generated content that did not reflect their own superior judgment.1

For agentic AI, this effect is amplified. An agent that handles the structured, routine portion of a workflow — the data gathering, the formatting, the first-pass analysis — helps a less experienced worker produce outputs that previously required more skill. The same agent, given to a highly experienced worker, may produce outputs that the experienced worker would have done better manually, but that they now accept without sufficient scrutiny because the agent's output looks complete.

The practical implication is not that agentic AI is bad for experienced workers. It is that the collaboration design needs to be calibrated to experience level:

For less experienced workers: agents should handle the execution layer, freeing humans to focus on judgment calls that require contextual understanding they are actively developing. The risk is that the agent also handles the judgment calls, leaving the human in a reviewing role without the craft to review effectively.

For experienced workers: agents should handle the volume and routine, allowing experienced humans to operate at higher leverage. The risk is that experienced workers stop exercising the judgment they are supposed to apply, treating agent outputs as authoritative rather than as inputs to their own assessment.

Neither risk is managed by deploying agents and hoping for the best. Both require deliberate collaboration design.

Key takeaway: Agentic AI raises the floor and barely moves the ceiling. The collaboration design must account for this asymmetry — protecting skill development for less experienced workers while ensuring experienced workers maintain the judgment that makes their oversight meaningful.


Skill Atrophy by Category

When an agent handles a task, the humans who previously performed that task lose the practice that maintained their capability. This is not a new phenomenon — it applies to every form of automation — but in the context of knowledge work, the skills at risk are foundational to professional identity and hard to rebuild once lost.

The skills most vulnerable to atrophy from agentic AI deployment fall into four categories:

Craft skills are the tacit, practice-dependent skills at the core of a profession: the writer's instinct for structure and voice, the analyst's ability to spot anomalies in data, the engineer's intuition for where code will break. These skills are built through thousands of hours of practice and feedback. An agent that handles the production layer removes the practice. A professional who reviews agent-generated output without generating their own is not building the craft; they are exercising a judgment skill that depends on the craft they are no longer practising.

Diagnostic skills are the ability to identify what is wrong and why. They are built through exposure to failures — debugging code, handling customer complaints, resolving process exceptions. When agents handle exceptions automatically, humans are exposed to fewer failures and build less diagnostic intuition. The paradox is that the diagnostic skills atrophy precisely because the agent is working well.

Domain knowledge accumulates through the process of searching for answers, making mistakes, and correcting them. An employee who asks an agent for policy information and acts on the answer without independently verifying it is not building the domain knowledge that would allow them to recognise when the agent's answer is wrong.

Relational skills — the ability to read a room, manage a difficult conversation, build trust with a sceptical client — are at lower immediate risk from agentic AI but are indirectly affected when agents handle the transaction volume that would normally create opportunities to practise them.

The appropriate response is not to withhold agents from junior employees to force them to build skills the hard way. The productivity benefit of agents to junior workers is real and substantial. The appropriate response is to design the collaboration so that deliberate skill-building practice is preserved alongside the productivity gain — a harder design problem but not an insoluble one.


The Junior Employee Problem

The junior employee problem is a specific instantiation of the skill atrophy challenge and deserves separate treatment because its organisational consequences are longer-term and less visible.

Junior employees are the principal beneficiaries of agentic AI in the near term. Research on customer service agents showed that AI assistance improved the performance of low-skilled workers by 34% while producing minimal gains for top performers.2 Similar patterns have been documented in coding, legal drafting, financial analysis, and writing. Agents are most helpful to the people who need the most help.

The problem surfaces in year three, not year one. A junior employee who has spent two years working primarily as an agent reviewer and editor, rather than as a practitioner, reaches the point in their career where they should be capable of independent judgment — and discovers that they have not built it. The firm has the same nominal headcount, and the same job titles, but the knowledge and judgment that those titles should represent have not been developed.

This is not a hypothetical risk. It is the predictable consequence of deploying agents as productivity tools without designing the human development trajectory around them. Firms that deploy agents without redesigning how junior employees develop will face a quiet capability cliff in their mid-tier workforce within a professional generation.

The mitigation is deliberate: reserve a defined category of work — complex, ambiguous, judgment-intensive — for junior employees to perform without agent assistance. Make this a design constraint, not an afterthought. Treat it as a talent investment with a payback horizon measured in years, not quarters.


Role Redesign as an Engineering Problem

The most common approach to role redesign in agentic AI deployments is to subtract agent tasks from the human job description and leave the rest unchanged. This produces a human role that is lighter in volume but not different in kind — a human who reviews agent outputs doing a job that was designed for someone who produced them.

Better practice treats role redesign as a design problem with explicit requirements:

Define the "last mile" for each agent. For every agent in the programme, specify precisely what the human does after the agent has done its work. Not "reviews output" — but specifically what the human checks, what judgment they apply, what authority they exercise. A human whose last-mile task is well-defined can be selected, trained, and evaluated for it. A human whose task is "review output" has an undefined job.

Design handoff points, not handoff events. The moment when the agent passes work to a human should be designed as carefully as any other part of the workflow. What information does the human need to exercise judgment? What does the agent surface, and in what format? What is the explicit action the human is being asked to take? Handoff points designed with this specificity produce better human performance and cleaner accountability.

Scope human review to where it adds value. Reviewing every agent output is expensive and tends toward rubber-stamping — humans become accustomed to accepting outputs and stop exercising judgment. Human review of high-consequence or low-confidence outputs is more expensive per review but produces genuine oversight. The design principle is to concentrate human attention where it matters rather than distributing it uniformly where it cannot be sustained.


Internal Trust Calibration

Chapter 9 addressed trust calibration for customer-facing AI. The internal version of the same problem is different in character and less frequently discussed.

Employees who work with agents daily develop a working relationship with them that can drift in either direction: toward over-trust (accepting outputs without scrutiny because the agent is usually right) or toward under-trust (checking every output because of a few memorable failures). Neither is the right calibration, and both represent a design failure.

Over-trust in internal contexts is insidious because it is invisible until something goes wrong. An analyst who has accepted 500 consecutive agent-generated reports without finding an error has good empirical reason to trust the next one — until the 501st contains a significant mistake that calibrated scrutiny would have caught. Building oversight processes that maintain scrutiny even when the agent has a long track record of accuracy is a systems design problem, not a behavioural one.

Under-trust is a productivity problem with a specific cause: employees who do not understand what the agent is good at will not trust it for the things it does well. This is an information problem. Employees who have been given a clear account of the agent's reliable capabilities, its known failure modes, and the specific conditions under which its outputs should be treated with caution can calibrate their trust accurately. Employees who were given a tool and told it was accurate cannot.

The design implication: internal employees should receive the same kind of transparency about agent capability that Chapter 9 recommends for customers — clear scope disclosure, confidence indication where possible, and explicit guidance on when to escalate rather than accept.


Accountability Culture

The legal and regulatory accountability questions of agentic AI are addressed in Chapter 24. This section addresses the internal culture question: when an agent makes a consequential error, who within the organisation is responsible for it?

Most organisations do not have a clear answer to this question. Accountability for agent errors tends to be distributed across the vendor, the IT team, the business unit, and nobody in particular. This distribution reflects the genuine difficulty of assigning responsibility for complex systems, but it produces a culture in which nobody has a strong personal incentive to catch agent errors before they propagate, because nobody owns the consequence of missing them.

Designing accountability explicitly means making two decisions:

Who signs off? For every agent-assisted decision with meaningful consequences, there should be a named human who is accountable for that decision — not for the agent's performance in the abstract, but for the specific decision that was made. That human's name should appear in the audit trail. The knowledge that their name will appear is the mechanism through which human oversight remains meaningful rather than nominal.

What is the consequence? Accountability without consequence is performance. When an agent makes an error that a responsible human would have caught, the accountability structure should produce a clear, proportionate consequence for the person who was responsible for oversight. Organisations that treat all agent errors as system failures, rather than distinguishing between errors that oversight should have caught and errors that no reasonable oversight could have caught, remove the incentive structure that makes human oversight work.

Key takeaway: The human role in an agentic system is not a residual — it is a design deliverable. The organisations that get this right design the human contribution with the same rigour they apply to the agent, and build the accountability structures that keep the human contribution genuine.


Governance as an Ongoing Practice

Most enterprise governance frameworks for AI are structured around deployment gates: a series of checks that a system must pass before it goes live. The checks are valuable. But an agent that passes all pre-deployment checks and is then left to run unsupervised is not a governed system — it is a system that was governed once, at a single point in time, under conditions that are now months or years in the past.

The governance of live agents requires three practices that pre-deployment checks cannot substitute for: ongoing adversarial testing to find what standard monitoring misses, oversight design that keeps human control genuine rather than nominal, and audit trail construction that makes the agent's behaviour reconstructable for the purposes of accountability rather than just debugging.

These practices define what operational governance of a live agent actually means — as distinct from the technical logging of Chapter 11, the pre-deployment evaluation of Chapters 15 and 16, the measurement and improvement practices of Chapter 21, and the regulatory compliance of Chapter 24.


Adversarial Testing in Production

Pre-deployment red-teaming is now standard practice in responsible AI deployment. Post-deployment red-teaming — the ongoing adversarial testing of agents that are already running in production — is not yet standard, and it is where a significant fraction of exploitable vulnerabilities go undetected.

The case for post-deployment adversarial testing rests on three observations:

Threat actors learn. The adversarial techniques that were tested against an agent before deployment are not the techniques that will be used against it six months into production. Adversaries observe the agent's behaviour, identify its patterns, and develop exploits tailored to its specific configuration. Pre-deployment testing tests against known attack patterns. Post-deployment testing finds the unknown ones that emerge from observed behaviour.

Agents evolve. Prompt updates, scope changes, model updates, and new tool integrations change the agent's attack surface. An agent that was tested comprehensively at launch has a different profile after twelve months of operational evolution. The post-deployment testing programme must track these changes and update its scope accordingly.

Production context introduces new attack vectors. The data sources, communication channels, and external services that an agent connects to in production are rarely fully replicated in test environments. Real-world inputs — customer emails, documents, web content, API responses — contain payloads that no pre-deployment test scenario fully anticipates.

NIST's AI Risk Management Framework positions adversarial testing as part of its Measure function, defining red teaming as "an approach consisting of adversarial testing of AI systems under stress conditions to seek out AI system failure modes or vulnerabilities."3 The March 2025 update to NIST AI 100-2 extended this framework to cover autonomous agent vulnerabilities specifically, including indirect prompt injection, agent memory poisoning, and supply chain attacks on agent tools.4 MITRE ATLAS, which extends the ATT&CK framework for AI-specific threats, provides the most widely used taxonomy for structuring adversarial test coverage across attack categories.5

A post-deployment adversarial testing programme should cover at minimum:

Prompt injection via real-world inputs. Testing whether adversarially crafted content in the agent's normal input channels — emails, documents, form submissions — can redirect the agent's behaviour. This should be conducted with inputs that reflect the actual content the agent encounters, not the sanitised inputs used in pre-deployment testing.

Tool abuse scenarios. Testing whether the agent can be induced, through crafted inputs or edge case reasoning paths, to use its tools in ways that exceed its intended scope: exfiltrating data, escalating permissions, performing actions in categories it was not designed to handle.

Multi-agent trust exploitation. For agents that interact with other agents — receiving instructions from orchestrators or calling specialist subagents — testing whether an adversarially influenced input at one point in the chain propagates harmful instructions to other agents.

The output of each adversarial testing cycle is a formal report: what was tested, what was found, what was fixed, and what remains open. This report feeds the audit trail described below.

Key takeaway: Post-deployment adversarial testing is not a repetition of pre-deployment red-teaming. It is a different practice, testing for threats that emerge from observed production behaviour and operational evolution — threats that pre-deployment testing cannot find.


Oversight Design: Meaningful vs. Nominal

Human oversight of agentic systems exists on a spectrum from nominal to meaningful.

Nominal oversight satisfies the formal requirement for human review without producing genuine accountability. A human who technically reviews agent outputs but lacks the context, time, or authority to change them is providing nominal oversight. A review process that approves 99.7% of outputs in under ten seconds is providing nominal oversight. A review checklist that asks whether outputs are "reasonable" without defining what reasonable means is providing nominal oversight.

Meaningful oversight exists when the human reviewer understands what the agent is doing, has sufficient context to evaluate the output correctly, has adequate time to apply genuine judgment, and has the authority and incentive to reject or escalate outputs that do not meet the required standard.

The design elements that determine which kind of oversight a system produces:

Reviewer capability matching. The humans providing oversight should be competent to evaluate the outputs they are reviewing. A financial compliance agent whose outputs are reviewed by a junior administrator who does not understand the underlying regulations is providing nominal oversight regardless of how the process is described.

Context provision. Reviewers need enough context to make a genuine judgment. An agent output presented without the inputs that produced it, the intermediate reasoning steps, and the confidence level the agent assigned to its conclusion does not give a reviewer what they need to assess it. Oversight interface design should surface the information reviewers need, not just the output to be approved.

Calibrated review volume. As established in the role redesign section above, reviewing every output is not effective oversight — volume fatigue produces rubber-stamping. Oversight design should concentrate human attention where it adds the most value: high-consequence outputs, low-confidence outputs, unusual patterns, and a systematic random sample for quality monitoring.

Override authority. Reviewers must have genuine authority to reject, modify, or escalate agent outputs. A process in which rejecting an agent output requires a lengthy exception process effectively removes the authority from the reviewer. The override path must be as frictionless as the approval path.


The Autonomy Dial

A persistent governance challenge in agentic AI is that the right level of human oversight for a given agent is not static. An agent that requires review at every step at launch may earn, through demonstrated reliability, the right to operate with less oversight over time. Equally, an agent that has been operating autonomously may need its oversight level increased when its risk profile changes, its scope expands, or its reliability declines.

The concept of an autonomy dial — the explicit, managed calibration of how much independent authority an agent exercises — is simple in principle and difficult in practice. The difficulty is that reducing oversight must be earned through demonstrated performance, and increasing oversight must be executed without disrupting operations or creating the impression that something has gone wrong.

The criteria for adjusting the autonomy dial should be specified at design time, not determined ad hoc when stakeholders feel that the agent has "been running fine for a while":

To reduce oversight requirements: the agent must have demonstrated, over a statistically meaningful sample, that its error rate in the category being autonomised is below the threshold that justifies the oversight cost. The evaluation suite must cover the relevant category adequately. A formal decision record should document the criteria, the evidence, and the approval.

To increase oversight requirements: any of the following should automatically trigger a governance review: a model update, a significant scope change, an adverse incident, a rising override rate, or a regulatory development that changes the compliance requirements for the agent's actions.


The Audit Trail as a Governance Artefact

Chapter 11 describes the observability infrastructure that allows engineering teams to reconstruct what an agent did when something goes wrong. The audit trail described here serves a different purpose: it provides the evidence that regulators, auditors, and internal accountability processes require to assess whether the agent's operation was consistent with the organisation's obligations.

These are different requirements. An engineering audit might need a complete trace of every tool call, model invocation, and intermediate output. A compliance audit typically needs something more structured and less granular: evidence that the governance processes that were supposed to be in place were actually in place, that human oversight was genuine rather than nominal, and that the organisation can demonstrate, for any decision the agent was involved in, what the agent did and who was accountable for it.

An audit-ready trail for agentic AI should capture:

Decision attribution. For every consequential action the agent takes, the trail should record: what decision was made, what evidence the agent used, what confidence level the agent assigned, whether human oversight was applied, and if so who provided it and what their conclusion was.

Scope adherence. Evidence that the agent operated within its defined scope: that it did not access data sources it was not authorised to use, did not call tools it was not configured to call, and did not take actions in categories outside its designed authority.

Governance process execution. Evidence that the governance processes required for each stage of maturity were actually executed: that the adversarial testing cycle was completed on schedule, that the oversight review cadence was maintained, that model update assessments were performed.

Exception handling. A complete record of all incidents, escalations, and exceptions: what happened, how it was detected, what was done, and what the outcome was. The absence of exceptions in an audit trail is not a sign that the agent is performing perfectly — it is a sign that exceptions are not being recorded.

Key takeaway: The audit trail that governance requires is not the same as the observability infrastructure that engineering requires. Both are necessary. Neither is a substitute for the other.


Governance at Scale

The governance practices described in this chapter were designed for a single agent operating in a defined scope. Most enterprise AI programmes are moving toward portfolios of agents — multiple agents operating across multiple functions, with varying levels of autonomy, built on different model versions and updated on different schedules.

Governance at scale introduces problems that single-agent governance does not face:

Oversight resource allocation. If each agent requires a certain amount of human oversight, a portfolio of twenty agents requires twenty times that resource. Governance programmes that do not account for this arithmetic will either become resource-constrained (producing nominal oversight across the portfolio) or will concentrate oversight on the visible agents and leave others effectively ungoverned.

Consistency. Different agents, built by different teams, will implement governance practices differently. The central governance function must ensure that the standards applied to a low-profile internal tool are consistent with those applied to a customer-facing agent — not identical in every detail, but grounded in the same principles.

Portfolio risk. A portfolio of agents creates system-level risks that do not exist for individual agents. Agents that feed each other's inputs create error propagation risks. Agents that draw on the same data sources create correlated failure risks. The governance of the portfolio must address these system-level properties, not just the individual agent performance.

The organisational model that addresses these challenges is a central governance function with standardised requirements and distributed execution — described in Chapter 19's analysis of centralised versus embedded deployment models. Applied to governance specifically: the central function sets standards, provides tooling, and conducts portfolio-level reviews. The teams operating individual agents execute those standards against their specific agents and report into the central review process.


References

  1. Dell'Acqua, F., McFowland, E., Mollick, E.R., Lifshitz-Assaf, H., Kellogg, K., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K.R. (2026). Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality. Organization Science.
  2. Brynjolfsson, E., Li, D., & Raymond, L.R. (2023). Generative AI at Work. National Bureau of Economic Research. NBER Working Paper No. 31161.
  3. National Institute of Standards and Technology (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). U.S. Department of Commerce. NIST AI 600-1. https://doi.org/10.6028/NIST.AI.100-1
  4. National Institute of Standards and Technology (2025). Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations. NIST AI 100-2 E2025. U.S. Department of Commerce. March 2025.
  5. MITRE Corporation (2025). MITRE ATLAS: Adversarial Threat Landscape for Artificial-Intelligence Systems. Version October 2025. https://atlas.mitre.org
  6. Mollick, E. & Mollick, L. (2023). Assigning AI: Seven Approaches for Students, with Prompts. The Wharton School, University of Pennsylvania. SSRN: https://ssrn.com/abstract=4475995
  7. Wharton Human-AI Research & GBK Collective (2025). Accountable Acceleration: Gen AI Fast-Tracks Into the Enterprise. Wharton Human-AI Research & GBK Collective, University of Pennsylvania. October 2025.
  8. Lee, J.D. & See, K.A. (2004). Trust in Automation: Designing for Appropriate Reliance. Human Factors, 46(1), 50–80.
  9. Autor, D.H. (2015). Why Are There Still So Many Jobs? The History and Future of Workplace Automation. Journal of Economic Perspectives, 29(3), 3–30.
  10. Amershi, S., Weld, D., Vorvoreanu, M., Fourney, A., Nushi, B., Collisson, P., Suh, J., Iqbal, S., Bennett, P.N., Inkpen, K., Teevan, J., Kikin-Gil, R., & Horvitz, E. (2019). Guidelines for Human-AI Interaction. Proceedings of CHI 2019. ACM.
  11. OWASP (2025). OWASP Top 10 for Large Language Model Applications 2025. Open Web Application Security Project.
  12. Raji, I.D., Smart, A., White, R.N., Mitchell, M., Gebru, T., Hutchinson, B., Smith-Loud, J., Theron, D., & Barnes, P. (2020). Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. FAccT '20. ACM.

Building agentic AI and wondering why alignment is harder than the technology? Get in touch

ADVERTISEMENT