
Chapter 24 · When Agents Act: Legal Accountability and Compliance Operations

An AI that produces text is a tool. An AI that takes actions is an actor. The law has not fully caught up with this distinction, but it is beginning to.


When a large language model generates a document that contains a factual error, the legal question is relatively familiar: did the deployer exercise reasonable care, was the error material, did the user rely on it to their detriment? These are negligence and misrepresentation questions that existing law handles, imperfectly but recognisably.

When an agent cancels a supplier contract, submits a regulatory filing, transfers funds, or sends a legally binding communication (all actions that current enterprise agents perform), the legal question is different in kind. The agent has not produced information that a human may rely on. It has taken an action on behalf of the deploying organisation. The consequences are immediate, potentially irreversible, and attributable to a system that does not have the legal capacity to form the intent that the action implies.

This distinction — between AI as information producer and AI as action taker — is the central legal challenge of agentic deployment. It is not resolved by existing AI regulation, most of which was written with the former in mind. It is the gap into which most enterprise legal teams have not yet looked carefully.

This chapter does not survey all AI regulation across all jurisdictions. Chapter 9 already covers the customer-facing dimensions of the EU AI Act, GDPR, and FTC guidelines. This chapter addresses the specific legal and compliance challenges that arise when agents act, and what building an operational compliance function for agentic AI actually requires. Calo's legal roadmap for AI policy is useful here because it frames the governance problem not as whether existing law applies, but as whether existing doctrines allocate responsibility clearly enough when autonomous systems create harms that do not map neatly onto human intent [5].


Why Autonomous Action Changes the Liability Picture

The standard defence in AI-related harm claims has been that AI is a tool, and tools are operated by people who bear responsibility for how they use them. A company whose employee uses a spreadsheet to make an incorrect financial projection is not a defendant in a case about the spreadsheet's liability; it is a defendant in a case about its employee's conduct.

This defence weakens when the AI acts autonomously. Three specific weaknesses are worth understanding:

The delegation problem. When an organisation configures an agent to take actions automatically — without human review at each step — it has delegated decision-making authority to a system. Courts and regulators are increasingly examining whether that delegation was appropriate, whether the deployed system was sufficiently reliable for the authority delegated to it, and whether adequate oversight mechanisms were in place. The question is not whether AI can be used; it is whether this AI, delegated this authority, with this oversight, was a reasonable exercise of organisational responsibility.

The knowledge imputation problem. In most legal contexts, an organisation is treated as knowing what its systems know and doing what its systems do. When an agent sends a communication, the organisation sent a communication. When an agent agrees to terms, the organisation agreed to terms. The agent's actions are attributed to the organisation in a way that the agent's outputs, which a human acts upon, are not. This attribution is more direct and harder to disclaim.

The consent gap. Many of the actions agents take on behalf of users involve the interests of third parties — the supplier whose contract is modified, the customer whose preferences are changed, the employee whose records are updated. These third parties did not consent to be affected by an autonomous system. The legal frameworks for consent and notice were designed for human-to-human or human-to-system interactions. Autonomous agent-to-human interactions are a newer configuration that existing consent frameworks do not map onto cleanly.

Key takeaway: The "AI is a tool" defence weakens substantially when the AI acts autonomously. Organisations deploying agents that take real-world actions need a legal theory of accountability that is more specific than "a human was ultimately responsible."


The Chain-of-Responsibility Problem in Multi-Agent Systems

Single-agent deployments create a tractable accountability picture: one deploying organisation, one agent, one set of actions, one set of consequences. The deploying organisation owns the outcome.

Multi-agent systems — where specialist agents are coordinated by an orchestrating agent, where agents call external agent services, where agent outputs feed other agents' inputs — create a distributed accountability structure that existing legal frameworks handle poorly.

Consider a concrete scenario: an orchestrating agent, configured by Organisation A, calls a specialist data analysis agent provided as a service by Organisation B. The data analysis agent, because of a hallucination or a prompt injection in its input, returns incorrect analysis. The orchestrating agent acts on the incorrect analysis, causing a material financial loss. Who is liable?

The deploying organisation (A) will argue that it relied on a third-party service of reasonable quality. The service provider (B) will argue that its agent performed within its designed parameters and that the deploying organisation is responsible for validating its outputs before acting on them. The foundation model provider whose model powers both agents will argue that it is an infrastructure provider, not a decision-maker. Each of these arguments has merit; none resolves the question.

Regulatory frameworks are beginning to address this. The EU AI Act introduces a layered accountability model that distinguishes between foundation model providers (in the Act's own terminology, providers of general-purpose AI models; responsible for technical documentation and training data transparency) and deployers (responsible for appropriate use, oversight, and risk classification) [1]. Practitioner summaries of the Act make the same distinction accessible for non-lawyers, but the operational challenge remains: the Act was designed with clearer deployment relationships in mind than many multi-agent systems will present [2]. Multi-agent orchestration across multiple providers and deployers is a configuration it does not cleanly address.

Until regulatory clarity arrives, the practical guidance is:

Document the accountability chain at design time. Before deploying any multi-agent system, produce a written account of which organisation is accountable for which component's outputs, what validation is applied at each handoff between agents, and where the human oversight sits. This document will not resolve a legal dispute — but it will significantly reduce the organisation's exposure by demonstrating that the accountability questions were asked and answered before deployment.

Treat third-party agent services as trusted-but-verified. Outputs from external agent services should be validated before being acted upon, regardless of the service provider's reputation. The validation requirement is an accountability mechanism, not a reflection of distrust.
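
The trusted-but-verified principle can be sketched as a validation gate at the handoff between agents. The following Python is a minimal illustration under stated assumptions, not a reference implementation: the `AgentOutput` fields, the `validate_handoff` function, and the three checks (sources present, self-reported confidence above a threshold, value within an expected range) are all hypothetical and would need to be replaced by checks suited to the actual service.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentOutput:
    """Hypothetical shape of a result returned by an external agent service."""
    provider: str                 # organisation accountable for this output
    value: float                  # e.g. a figure the orchestrator would act on
    confidence: Optional[float]   # self-reported by the service; may be missing
    sources: list[str] = field(default_factory=list)


@dataclass
class HandoffDecision:
    accepted: bool
    reasons: list[str]            # kept so the decision can be audited later


def validate_handoff(output: AgentOutput, lower: float, upper: float,
                     min_confidence: float = 0.8) -> HandoffDecision:
    """Apply illustrative checks before the orchestrator acts on an output."""
    reasons = []
    if not output.sources:
        reasons.append("no supporting sources supplied")
    if output.confidence is None or output.confidence < min_confidence:
        reasons.append("confidence missing or below threshold")
    if not (lower <= output.value <= upper):
        reasons.append("value outside expected range")
    return HandoffDecision(accepted=not reasons, reasons=reasons)


# Usage: an out-of-range figure with no sources is rejected, with reasons.
result = validate_handoff(
    AgentOutput(provider="Organisation B", value=1.9e6, confidence=0.95),
    lower=0.0, upper=1.0e6,
)
print(result.accepted, result.reasons)
```

Recording the reasons alongside the accept or reject outcome is the point of the design: the record shows that validation was applied at the handoff, which is what the accountability chain document needs to evidence.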


Risk Classification for Agentic Systems

The EU AI Act classifies AI systems into risk tiers (unacceptable risk, high risk, limited risk, minimal risk) with compliance obligations that scale with the tier [1]. The classification is based primarily on the domain of application (healthcare, employment, education, law enforcement) and the nature of the decision being influenced.

This classification framework was designed for static AI systems: a model that performs credit scoring is either high-risk or it is not. Agentic systems are not static. An enterprise agent that handles customer service enquiries (limited risk), resolves IT tickets (limited risk), and processes employee expense claims (potentially higher risk when policy interpretation is involved) may be simultaneously operating in multiple risk categories. The classification framework provides no clear guidance on how to classify a system whose risk level varies by the task it is executing.

The practical approach that compliance teams have adopted in the absence of clear regulatory guidance:

Classify by highest-risk task, not average task. An agent whose scope includes any high-risk activity should be treated as a high-risk system across its full scope. Designing the agent's permissions and oversight mechanisms for the high-risk tasks and applying them consistently is both safer and simpler than attempting to modulate compliance obligations by task.

Maintain a task inventory with risk classifications. A written record of every task category the agent can perform, with a risk classification for each, provides the documentation that regulators expect and the operational clarity that compliance teams need. This inventory should be updated whenever the agent's scope changes.

Apply the high-risk obligations as a default standard. The EU AI Act's high-risk requirements — human oversight, technical documentation, accuracy and robustness standards, logging of decisions — represent a reasonable operating standard for any agentic system that takes consequential actions. Organisations that apply these requirements broadly, rather than narrowly to the minimum required scope, build compliance programmes that are more defensible and easier to audit.
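
The "classify by highest-risk task" rule and the task inventory can be expressed in a few lines. This is a sketch only: the `RiskTier` enum, the inventory keys, and the tier assigned to each task are assumptions made for illustration (in particular, expense claim processing is marked high-risk here purely to show the rule in action), not classifications taken from the Act.

```python
from enum import IntEnum


class RiskTier(IntEnum):
    """Tiers ordered so that a larger number means higher risk."""
    MINIMAL = 1
    LIMITED = 2
    HIGH = 3
    UNACCEPTABLE = 4


# Hypothetical task inventory for one enterprise agent.
TASK_INVENTORY = {
    "customer_service_enquiries": RiskTier.LIMITED,
    "it_ticket_resolution": RiskTier.LIMITED,
    "expense_claim_processing": RiskTier.HIGH,  # assumed tier, for illustration
}


def system_tier(inventory: dict[str, RiskTier]) -> RiskTier:
    """Classify the whole agent by its highest-risk task, not the average."""
    if not inventory:
        raise ValueError("task inventory is empty; the agent cannot be classified")
    return max(inventory.values())


print(system_tier(TASK_INVENTORY).name)  # HIGH
```

Because the system tier is derived from the inventory rather than stored separately, adding a task to the agent's scope forces the classification to be recomputed, which matches the requirement to update the inventory whenever scope changes.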


Sector-Specific Obligations

Three sectors illustrate how agentic action intersects with existing regulatory regimes in ways that require compliance architecture beyond general AI regulation.

Financial services. MiFID II, the FCA's Consumer Duty, and equivalent frameworks in other jurisdictions impose obligations on firms to act in clients' best interests, maintain audit trails of advice and transactions, and ensure that automated decision-making in regulated activities is explainable and reviewable. The FCA's 2023 discussion paper on AI and machine learning is useful because it treats AI governance as a sector-specific extension of existing accountability, consumer protection, operational resilience, and model-risk obligations rather than as a standalone technology issue [4]. An agent that executes trades, adjusts portfolio allocations, or generates personalised financial recommendations is operating in a regulated activity context. The regulatory obligation is not simply that a human could theoretically review the agent's decision; it is that the review is genuine, documented, and linked to the accountability of a named individual.

Healthcare. Agents operating in healthcare contexts — appointment management, care pathway coordination, patient communication, clinical decision support — intersect with medical device regulation, data protection requirements for health data, and clinical governance frameworks. The critical compliance issue for agentic healthcare AI is the boundary between clinical decision support (which the clinician acts upon) and clinical decision-making (which the clinician delegates). Many agents blur this boundary in practice, even when it is clear in design. The compliance function must monitor where the boundary is actually sitting, not just where it was intended to sit.

Employment. Agents that participate in recruitment screening, performance assessment, scheduling, or disciplinary processes operate in a regulatory context shaped by employment law, anti-discrimination legislation, and data subject rights. The specific compliance risk in employment contexts is that agent-assisted decisions that would be defensible if made by a human may not be defensible if made by an agent — because the legal standard for consistency, explainability, and non-discrimination is applied to the agent's outputs in ways that may not have been anticipated when the agent was configured.


Building the Compliance Function

A compliance programme for agentic AI is not a legal review that happens before deployment. It is an operational function that runs continuously, alongside the agents it governs.

The distinction matters because the compliance obligations of agentic AI are dynamic. The risk profile of a deployed agent changes as its scope evolves, as the data it operates on changes, as the regulatory environment shifts, and as the agent's actual behaviour in production diverges from its designed behaviour. A compliance programme that was adequate at launch may not be adequate six months later.

The components of an operational AI compliance function:

Ongoing risk monitoring. Regular review of whether deployed agents are operating within their intended scope and risk tier, whether any changes to configuration or usage patterns have created new risk exposures, and whether regulatory developments require compliance architecture changes.

Data subject rights management. GDPR and equivalent frameworks give data subjects rights to access, rectification, erasure, and explanation of automated decisions that significantly affect them. For agentic AI, managing these rights requires knowing which agents processed data about which individuals, what decisions those agents contributed to, and being able to reconstruct the agent's reasoning in a form that supports a meaningful explanation. The ICO and Alan Turing Institute guidance on explaining AI-assisted decisions makes this operational point explicit: explanation requires planning what information will be available to affected people before the system is deployed, not after a complaint arrives [3]. This capability must be designed into the system architecture (supported by the audit trail mechanisms described in Chapter 22) and not retrofitted when a data subject rights request arrives.
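
One way to picture "designed in, not retrofitted" is a decision log indexed by the affected individual from the first day of operation. The sketch below is illustrative only: `DecisionRecord`, `DecisionLog`, and their fields are hypothetical names, the store is in-memory, and rectification and erasure are deliberately left out. It shows only the lookup an access or explanation request depends on.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRecord:
    agent_id: str
    subject_id: str   # pseudonymous identifier for the affected individual
    decision: str
    rationale: str    # explanation captured when the decision was made
    timestamp: str    # ISO 8601 in one fixed format, so string order is time order


class DecisionLog:
    """In-memory index of agent decisions, keyed by affected individual."""

    def __init__(self) -> None:
        self._by_subject: dict[str, list[DecisionRecord]] = defaultdict(list)

    def record(self, rec: DecisionRecord) -> None:
        self._by_subject[rec.subject_id].append(rec)

    def access_request(self, subject_id: str) -> list[DecisionRecord]:
        """Return every recorded decision about one individual, oldest first."""
        return sorted(self._by_subject.get(subject_id, []),
                      key=lambda r: r.timestamp)

    def agents_involved(self, subject_id: str) -> set[str]:
        """Return the agents that processed data about one individual."""
        return {r.agent_id for r in self._by_subject.get(subject_id, [])}


log = DecisionLog()
log.record(DecisionRecord("expenses-agent", "emp-17", "claim rejected",
                          "receipt date outside policy window", "2025-03-02T10:15:00Z"))
log.record(DecisionRecord("scheduling-agent", "emp-17", "shift reassigned",
                          "cover required for absence", "2025-02-20T08:00:00Z"))
for rec in log.access_request("emp-17"):
    print(rec.timestamp, rec.agent_id, rec.decision, "|", rec.rationale)
```

The rationale is stored at decision time rather than regenerated later, which reflects the guidance that the information needed for an explanation has to be planned before deployment.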

Regulatory horizon scanning. AI regulation is active across all major jurisdictions. The compliance function must maintain awareness of regulatory developments, assess their implications for the deployed agent portfolio, and initiate architecture changes when requirements change. This is a professional function, not a quarterly review of news headlines.

Key takeaway: Compliance for agentic AI is an operational function, not a pre-deployment gate. The organisations that build it as a continuous practice, rather than a periodic review, will be materially better positioned as regulatory obligations crystallise.


Compliance Operations Checklist

A useful test of whether compliance has moved from legal review to operational practice is whether the organisation can answer the following questions for every deployed agent:

| Question | Evidence the organisation should hold |
| --- | --- |
| What actions can the agent take? | Task inventory, tool permissions, autonomy level |
| Which legal or regulatory regimes apply? | Risk classification record, jurisdictional analysis |
| Who is accountable for each consequential action? | Named owner, approval chain, oversight record |
| What explanation can be given to an affected person? | Decision record, source evidence, review notes |
| What happens when the agent acts outside scope? | Incident playbook, escalation route, rollback procedure |
| How is compliance rechecked after changes? | Change-management log, model-update review, periodic audit |

This checklist connects the legal discussion in this chapter to the internal auditing approach developed by Raji et al.: compliance depends on an end-to-end accountability process that follows the system from design through deployment, monitoring, incident response, and retirement, rather than a single pre-launch review [7]. It also reflects Doshi-Velez et al.'s explanation framework: explanations are not decorative narrative after the fact; they are the mechanism through which affected people, reviewers, and courts can understand whether an automated decision was justified [6].

Key takeaway: Legal defensibility in agentic AI is not created by a policy document. It is created by operational evidence: task inventories, accountability chains, audit trails, explanation records, and proof that these artefacts were maintained as the agent changed.
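
The checklist lends itself to a simple automated gap check per deployed agent. The sketch below assumes the evidence items from the checklist are tracked as short identifiers; the snake_case keys, the `evidence_gaps` function, and the example agent are hypothetical, and a real programme would also need to verify that each artefact is current rather than merely present.

```python
# Evidence items follow the checklist; the snake_case identifiers are
# hypothetical names chosen for this sketch.
REQUIRED_EVIDENCE = {
    "What actions can the agent take?":
        ["task_inventory", "tool_permissions", "autonomy_level"],
    "Which legal or regulatory regimes apply?":
        ["risk_classification_record", "jurisdictional_analysis"],
    "Who is accountable for each consequential action?":
        ["named_owner", "approval_chain", "oversight_record"],
    "What explanation can be given to an affected person?":
        ["decision_record", "source_evidence", "review_notes"],
    "What happens when the agent acts outside scope?":
        ["incident_playbook", "escalation_route", "rollback_procedure"],
    "How is compliance rechecked after changes?":
        ["change_management_log", "model_update_review", "periodic_audit"],
}


def evidence_gaps(held: set[str]) -> dict[str, list[str]]:
    """Return, per checklist question, the evidence items not yet held."""
    gaps = {}
    for question, items in REQUIRED_EVIDENCE.items():
        missing = [item for item in items if item not in held]
        if missing:
            gaps[question] = missing
    return gaps


# Usage: an agent with only partial evidence on file.
held_for_agent = {"task_inventory", "tool_permissions", "named_owner"}
for question, missing in evidence_gaps(held_for_agent).items():
    print(f"{question} missing: {', '.join(missing)}")
```

An empty result means every question has its listed evidence on file; it does not by itself show that the evidence is adequate.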


References

  1. European Parliament and Council (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council (EU AI Act). Official Journal of the European Union, L 2024/1689.
  2. Future of Life Institute (2024). High-level summary of the AI Act. Available at: artificialintelligenceact.eu/high-level-summary
  3. Information Commissioner's Office (2023). Explaining decisions made with AI. ICO / Alan Turing Institute. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/artificial-intelligence/explaining-decisions-made-with-ai/
  4. Financial Conduct Authority (2023). Artificial Intelligence and Machine Learning: Discussion Paper DP23/4. FCA. https://www.fca.org.uk/publications/discussion-papers/dp23-4-artificial-intelligence-and-machine-learning
  5. Calo, R. (2017). Artificial Intelligence Policy: A Primer and Roadmap. UC Davis Law Review, 51(2), 399–435.
  6. Doshi-Velez, F., Kortz, M., Budish, R., Bavitz, C., Gershman, S., O'Brien, D., Scott, K., Schieber, S., Waldo, J., Weinberger, D., Wroblewski, A., & Wood, A. (2017). Accountability of AI Under the Law: The Role of Explanation. Berkman Klein Center for Internet & Society Working Paper. arXiv:1711.01134.
  7. Raji, I.D., Smart, A., White, R.N., Mitchell, M., Gebru, T., Hutchinson, B., Smith-Loud, J., Theron, D., & Barnes, P. (2020). Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. FAccT '20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 33–44. ACM.
