Chapter 5 · One Agent or Many? Designing for Scale and Complexity

The architectural decision that shapes everything downstream.

The Default That Isn't Safe

When organisations start building with agentic AI, the first instinct is usually to build one agent and make it smarter. Add more tools. Expand the system prompt. Increase the context window. This approach works — until it doesn't.

The failure is predictable. A single agent handling twenty different responsibilities becomes brittle in ways that are hard to diagnose. When something breaks, it is unclear which capability failed. When you try to improve one behaviour, you inadvertently degrade another. And when you need to scale to thousands of concurrent instances, the cost and latency of a maximally capable agent hit hard.

The alternative — distributing work across multiple specialised agents — solves these problems but introduces new ones. Coordination overhead. State management. Partial failure handling. Trust between agents you do not fully control.

There is no universally correct answer. But there is a principled way to think through the choice.

This chapter begins Part 2 by translating the foundations from Part 1 into architecture. If Part 1 established that agents are economically consequential, multimodal, and increasingly embedded in enterprise workflows, the first design question is now structural: should the capability be concentrated in one agent, or distributed across a system of specialists?

Automation, GenAI Workflows, and Agentic AI

Before choosing between one agent and many, it is important to be precise about what is being designed. Not every automated workflow is an agent. Not every workflow that calls a language model is agentic. Enterprises have used automation for decades: rules engines, workflow systems, fraud monitoring, batch processing, RPA, and decision-support tools all execute tasks without human intervention. Agentic AI does not replace that history. It extends automation into less structured, more language-heavy, more context-dependent work.

A useful distinction is between automation, GenAI workflows, and agentic systems. Automation follows predefined rules or process paths. A GenAI workflow may call a model to summarise, classify, draft, or extract information, but the process itself is still orchestrated by fixed code. An agentic system gives the model or agent runtime some degree of control over how the task is completed: which tools to use, which information to retrieve, when to ask for clarification, when to escalate, and how to adapt after observing results. Anthropic makes a similar distinction between workflows, where LLMs and tools follow predefined code paths, and agents, where the LLM dynamically directs its own process and tool use.³

Pattern	What controls the process?	Example	Agentic?
Classic automation	Rules, code, workflow engine	If invoice amount exceeds threshold, route to manager	No
GenAI workflow	Predefined workflow with model calls	Extract text, call LLM to summarise, send summary to reviewer	Partially, but mostly workflow automation
Agentic system	Agent chooses or adapts steps within boundaries	Inspect documents, decide what to retrieve, check against policy, escalate uncertainty	Yes, if tool choice, sequencing, and adaptation are delegated

This distinction matters because the architecture should match the task. A fixed workflow is often safer, cheaper, and easier to govern than an agent. The case for an agent begins when the task cannot be fully specified in advance, when the system must interpret messy inputs, when tool choice depends on intermediate results, or when the next step must adapt to what the system discovers.
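To make the boundary concrete, the following is a minimal sketch contrasting a GenAI workflow with an agentic loop. Every name in it (`fake_llm_summarise`, `fake_next_action`, `TOOLS`, the 90-day rule) is a hypothetical stand-in; a real system would replace the two `fake_` functions with model calls. The point is only where process control sits: in fixed code for the workflow, and in a step-by-step decision for the agent.

```python
from typing import Callable

def fake_llm_summarise(text: str) -> str:
    """Hypothetical stand-in for a model call that summarises text."""
    return text[:40]

def genai_workflow(document: str) -> dict:
    """GenAI workflow: the steps and their order are fixed in code."""
    summary = fake_llm_summarise(document)
    return {"summary": summary, "route": "reviewer"}

# The tools the agent is allowed to choose from (its boundary).
TOOLS: dict[str, Callable[[str], str]] = {
    "retrieve_policy": lambda doc: "policy: payment terms must not exceed 60 days",
    "escalate": lambda doc: "escalated to legal review",
}

def fake_next_action(document: str, taken: list[str]) -> str:
    """Hypothetical stand-in for the model choosing the next step
    based on the document and on what has already been done."""
    if "retrieve_policy" not in taken:
        return "retrieve_policy"
    if "90 days" in document and "escalate" not in taken:
        return "escalate"
    return "finish"

def agent_loop(document: str, max_steps: int = 5) -> list[str]:
    """Agentic system: the sequence of tool calls is decided at run time,
    within a step budget and a fixed tool set."""
    taken: list[str] = []
    for _ in range(max_steps):
        action = fake_next_action(document, taken)
        if action == "finish":
            break
        TOOLS[action](document)  # observation discarded in this toy version
        taken.append(action)
    return taken
```

Calling `agent_loop` on two different documents can produce two different tool sequences, whereas `genai_workflow` always runs the same steps. That difference, not the presence of a model call, is what the table above treats as agentic.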

Key takeaway: Agentic AI is not the invention of automation. It is a design pattern for giving AI systems bounded control over process, tool use, context, and escalation in workflows that are too variable to script completely.

The Single-Agent Case

A single agent is not automatically the inferior option. For many enterprise use cases, it is the right one.

The case for a single agent is strongest when:

The task is bounded — well-defined inputs, predictable outputs, limited tool use.
Context must flow freely — the agent needs to remember and reason across every step without serialising state between handoffs.
Latency is critical — inter-agent communication adds round-trip time that may be unacceptable in real-time applications.
The team is small — multi-agent systems demand more engineering rigour to build and operate safely.

A well-designed single agent with a clear purpose, a minimal toolset, and explicit scope boundaries will outperform a poorly coordinated network of agents on almost any metric. Complexity is not a sign of sophistication — it is a cost.

When Multi-Agent Becomes Necessary

The multi-agent architecture earns its overhead when the problem genuinely exceeds what a single agent can handle reliably.

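Parallelism is the clearest driver. If a task can be decomposed into independent subtasks (research five markets simultaneously, process ten documents in parallel), a single sequential agent is an unnecessary bottleneck. When the subtasks are genuinely independent and take similar time, orchestrating five parallel subagents can cut wall-clock time by close to a factor of five; in practice the slowest subtask, plus the time spent on decomposition and synthesis, sets the floor.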
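The wall-clock argument can be illustrated with a small sketch using Python's `asyncio`. The `research_market` coroutine is a hypothetical subagent whose work is simulated with a short sleep; the market codes and the delay are invented for illustration.

```python
import asyncio
import time

async def research_market(market: str, delay: float = 0.05) -> str:
    """Hypothetical subagent: the sleep stands in for model and tool latency."""
    await asyncio.sleep(delay)
    return f"{market}: findings"

async def run_sequential(markets: list[str]) -> list[str]:
    # One agent working through the markets one after another.
    return [await research_market(m) for m in markets]

async def run_parallel(markets: list[str]) -> list[str]:
    # Independent subtasks dispatched concurrently; gather preserves order.
    return await asyncio.gather(*(research_market(m) for m in markets))

def timed(runner, markets: list[str]) -> tuple[list[str], float]:
    start = time.perf_counter()
    results = asyncio.run(runner(markets))
    return list(results), time.perf_counter() - start

markets = ["DE", "FR", "JP", "BR", "IN"]
sequential_results, sequential_seconds = timed(run_sequential, markets)
parallel_results, parallel_seconds = timed(run_parallel, markets)
print(f"sequential: {sequential_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

The results are identical in both runs; only the elapsed time differs. The speed-up shown here depends on the subtasks not sharing state, which is the assumption to test before adopting the pattern.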

Specialisation matters when domain depth genuinely outweighs generalist range. A coding agent fine-tuned on your internal codebase can be expected to outperform a general-purpose agent on repository-specific tasks. A compliance agent trained on financial regulations is more likely to catch issues a broader model might miss.

Context limits force the issue for long-horizon tasks. A complex research project might accumulate more context than any single model can hold. Before that threshold is reached, a single agent's effective horizon can be extended by pairing in-context short-term memory with an external long-term vector store — retrieving relevant past state on demand rather than carrying it all in the active window.¹ When even this hybrid memory architecture is insufficient for the task's scope, breaking the work into phases — each handled by a fresh agent instance — is not a workaround; it is good engineering.
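The hybrid memory idea can be sketched as follows. This is a toy illustration under loud assumptions: the `HybridMemory` class is hypothetical, and word overlap stands in for the embedding similarity a real vector store would compute.

```python
from collections import deque

class HybridMemory:
    """Toy hybrid memory: a bounded short-term window plus a long-term store.
    Word overlap is an assumed stand-in for vector similarity search."""

    def __init__(self, window: int = 3) -> None:
        self.short_term: deque[str] = deque(maxlen=window)
        self.long_term: list[str] = []

    def add(self, note: str) -> None:
        # When the window is full, archive the oldest note before it is evicted.
        if len(self.short_term) == self.short_term.maxlen:
            self.long_term.append(self.short_term[0])
        self.short_term.append(note)

    def recall(self, query: str, k: int = 1) -> list[str]:
        # Rank archived notes by shared words with the query; drop non-matches.
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(n.lower().split())), n) for n in self.long_term]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [note for _, note in ranked[:k]]

    def context(self, query: str) -> list[str]:
        # What the agent would see: retrieved past state plus the active window.
        return self.recall(query) + list(self.short_term)

memory = HybridMemory(window=3)
for note in [
    "pricing data for Germany collected",
    "regulatory filing deadline is March",
    "competitor A launched product",
    "interview with distributor scheduled",
    "draft outline written",
]:
    memory.add(note)
```

After five notes with a window of three, the two oldest notes live only in the long-term store, and `memory.context("Germany pricing")` brings the relevant one back alongside the three most recent notes.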

Verification is underused. One of the highest-value applications of a second agent is simply checking the first agent's work. A critic agent that reviews outputs before they reach a human adds a layer of reliability that is difficult to achieve through prompt engineering alone. A more structured form of this is the debate paradigm, in which multiple agents independently propose answers and then argue their positions across several rounds until converging on a consensus — an approach that has been shown to improve factual accuracy on reasoning tasks.² A related risk runs in the other direction: in multi-agent pipelines, a hallucination generated by one agent can be accepted and amplified by others downstream, making error propagation a distinct failure mode that does not exist in single-agent systems.²
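A critic layer of the kind described above might be sketched like this. Both `draft_agent` and `critic_agent` are hypothetical stubs: in a real system each would be a model call, ideally with different prompts or models so that the critic does not share the drafter's blind spots. The single rule the critic applies here (claims must carry a source marker) is invented for illustration.

```python
from typing import Optional

def draft_agent(task: str, feedback: Optional[list[str]] = None) -> str:
    """Hypothetical drafting agent; revises only when given feedback."""
    if feedback is None:
        return "Revenue grew 40% last year."
    return "Revenue grew 40% last year [source: annual report]."

def critic_agent(output: str) -> list[str]:
    """Hypothetical critic; returns a list of issues, empty if approved."""
    issues = []
    if "[source:" not in output:
        issues.append("claim lacks a cited source")
    return issues

def run_with_critic(task: str, max_rounds: int = 2) -> dict:
    """Draft, review, and revise until approved or the round budget runs out."""
    feedback = None
    output, issues = "", []
    for round_no in range(1, max_rounds + 1):
        output = draft_agent(task, feedback)
        issues = critic_agent(output)
        if not issues:
            return {"output": output, "approved": True, "rounds": round_no}
        feedback = issues
    # Budget exhausted: surface the unresolved issues rather than hide them.
    return {"output": output, "approved": False, "rounds": max_rounds, "issues": issues}
```

The bounded round count matters: without it, a drafter and critic that never agree would loop indefinitely, and an unapproved result should reach a human marked as such.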

Key takeaway: Multi-agent debate is not just a verification trick — it is a structured process in which disagreement between agents actively drives output quality upward, while also introducing a new risk: one agent's error can propagate and compound through the rest of the pipeline.

Orchestration Patterns

Multi-agent systems are typically organised in one of four broad patterns, each with different trade-offs.

Hierarchical is the most common pattern in enterprise deployments. A central orchestrator receives the user's goal, decomposes it into subtasks, delegates each to a specialised subagent, and synthesises the results. The orchestrator handles planning and coordination; subagents handle execution. This pattern is predictable and auditable — critical for governance.
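A minimal sketch of the hierarchical pattern, assuming a fixed decomposition in place of model-driven planning. The `SUBAGENTS` registry and its two entries are hypothetical; the structure to notice is that the orchestrator alone plans, delegates, records, and synthesises.

```python
from typing import Callable

# Hypothetical specialised subagents, keyed by the subtask they handle.
SUBAGENTS: dict[str, Callable[[str], str]] = {
    "market_size": lambda goal: f"market size estimate for {goal}",
    "regulation": lambda goal: f"regulatory barriers for {goal}",
}

def orchestrate(goal: str) -> dict:
    # Planning: a fixed list stands in for the orchestrator's decomposition.
    plan = list(SUBAGENTS)
    audit_log: list[dict] = []
    results: dict[str, str] = {}
    # Delegation: each subtask goes to exactly one subagent.
    for subtask in plan:
        results[subtask] = SUBAGENTS[subtask](goal)
        audit_log.append({"subtask": subtask, "status": "done"})
    # Synthesis: the orchestrator combines subagent outputs.
    synthesis = "; ".join(results[s] for s in plan)
    return {"synthesis": synthesis, "audit_log": audit_log}
```

Because every delegation passes through one function, the audit log is complete by construction, which is the property that makes this pattern attractive for governance.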

Collaborative architectures allow agents to communicate directly and negotiate on outputs. This pattern excels at tasks requiring genuine deliberation — where multiple perspectives improve the final answer — but is harder to debug when things go wrong, since causality is distributed.

Pipeline architectures are the simplest multi-agent pattern. Each agent performs a transformation on the output of the previous one: extract, then analyse, then validate, then format. These are easy to reason about, easy to monitor, and easy to modify — at the cost of no parallelism and no feedback loops.
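The extract, analyse, validate, format sequence can be sketched as plain function composition. The four stage functions are hypothetical placeholders for agents; each takes the previous stage's output and nothing else.

```python
from functools import reduce
from typing import Any, Callable

def extract(raw: str) -> str:
    return raw.strip()

def analyse(text: str) -> dict:
    return {"text": text, "word_count": len(text.split())}

def validate(record: dict) -> dict:
    # A stage can stop the pipeline, but it cannot send work backwards.
    if record["word_count"] == 0:
        raise ValueError("empty document")
    return record

def format_output(record: dict) -> str:
    return f"{record['word_count']} words: {record['text']}"

STAGES: list[Callable[[Any], Any]] = [extract, analyse, validate, format_output]

def run_pipeline(raw: str, stages: list[Callable[[Any], Any]] = STAGES) -> Any:
    # Feed each stage's output into the next, strictly in order.
    return reduce(lambda value, stage: stage(value), stages, raw)
```

Swapping or inserting a stage means editing one list, which is why pipelines are easy to modify. The same structure also shows the limits: there is no path for `validate` to ask `extract` to try again.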

Routing is a fourth pattern worth naming explicitly: a classifier step — itself often an LLM call — directs each incoming task to the most appropriate specialised sub-agent or prompt, rather than passing everything through the same path.³ This is particularly effective when inputs vary enough that optimising for one type would degrade performance on others, and it is among the most commonly deployed patterns in production agentic systems.³
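A routing step can be sketched in a few lines. Here a keyword check is an assumed stand-in for the classifier, which in production would often be an LLM call; the labels and handlers are hypothetical.

```python
from typing import Callable

def classify(task: str) -> str:
    """Stand-in classifier: keyword rules in place of a model call."""
    text = task.lower()
    if "refund" in text or "invoice" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "technical"
    return "general"

# Hypothetical specialised handlers, one per label.
HANDLERS: dict[str, Callable[[str], str]] = {
    "billing": lambda task: f"[billing agent] {task}",
    "technical": lambda task: f"[technical agent] {task}",
    "general": lambda task: f"[general agent] {task}",
}

def route(task: str) -> str:
    label = classify(task)
    # Unknown labels fall back to the general handler rather than failing.
    handler = HANDLERS.get(label, HANDLERS["general"])
    return handler(task)
```

The fallback branch is a deliberate design choice: a classifier that returns an unexpected label should degrade to a safe default, not drop the task.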

Key takeaway: Not every multi-agent system needs full orchestration — routing lets a lightweight classifier send each task to the right specialist without the overhead of a central planner managing a full workflow.

The Interface Contract Between Agents

The most common mistake in multi-agent design is treating delegation as conversation. Human colleagues can repair ambiguity through shared context, informal judgement, and follow-up questions. Software agents cannot be trusted to do this reliably. They need explicit contracts.

Every handoff between agents should define five things:

Contract Element	Design Question
Input schema	What exactly is the receiving agent allowed to assume?
Output schema	What format must the receiving agent return?
Authority boundary	What tools, data, and decisions are in scope?
Failure signal	How does the agent report uncertainty, refusal, or partial completion?
Audit trace	What intermediate reasoning, evidence, or tool calls must be preserved?

This contract discipline is what turns a loose collection of prompts into an engineered system. It also creates the foundation for later governance: if you cannot describe what each agent was authorised to do, you cannot later explain why the system acted as it did.
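One way to make the five contract elements concrete is to encode them as types. The sketch below is illustrative only: `HandoffRequest`, `HandoffResult`, `Status`, and `clause_checker` are hypothetical names, and the library comparison result is hard-coded rather than computed.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    """Failure signal: how the receiving agent reports its outcome."""
    OK = "ok"
    PARTIAL = "partial"
    REFUSED = "refused"

@dataclass(frozen=True)
class HandoffRequest:
    task: str
    payload: dict                 # input schema: what the receiver may assume
    allowed_tools: frozenset      # authority boundary: tools in scope

@dataclass
class HandoffResult:
    status: Status
    output: dict                  # output schema: what the receiver must return
    trace: list = field(default_factory=list)  # audit trace

def clause_checker(request: HandoffRequest) -> HandoffResult:
    """Hypothetical receiving agent that honours the contract."""
    clause = request.payload.get("clause_text")
    if not clause:
        return HandoffResult(Status.REFUSED, {}, ["input schema violated: clause_text missing"])
    trace = ["checked: clause_text present"]
    if "clause_library" not in request.allowed_tools:
        # Outside its authority: report partial completion instead of guessing.
        trace.append("skipped: clause_library not in authority boundary")
        return HandoffResult(Status.PARTIAL, {"clause_present": True}, trace)
    trace.append("tool_call: clause_library.compare")
    # Hard-coded stub result; a real agent would perform the comparison.
    return HandoffResult(Status.OK, {"clause_present": True, "matches_library": True}, trace)
```

With this shape, a caller can tell from the `status` and `trace` alone whether a result is complete, partial, or refused, and why.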

Key takeaway: Multi-agent systems do not fail only because agents reason poorly. They also fail because responsibilities, handoffs, and authority boundaries were never made explicit enough to debug.

What This Looks Like in Practice

The examples below are illustrative enterprise patterns rather than named case studies. Their purpose is to clarify where the boundary sits between ordinary automation, GenAI workflow automation, and genuinely agentic design.

Example 1 — Contract Review: From Summary Workflow to Single Agent

Field	Design
Problem	Legal teams need fast first-pass review of supplier contracts against a company playbook.
Non-agentic version	OCR extracts text, an LLM summarises the document, and a fixed checklist flags missing clauses.
Agentic setup	A single contract-review agent identifies clause types, decides which policy documents to retrieve, checks deviations against the playbook, asks for missing commercial context when needed, and routes high-risk clauses to legal review.
Agent pattern	Single agent with bounded tool use.
Tools/data	Contract repository, clause library, procurement policy, approved fallback language, escalation queue.
Human oversight	Legal counsel reviews deviations above a defined risk threshold; low-risk summaries remain draft-only.
Main risk	The agent may treat a policy mismatch as a drafting issue rather than a legal risk unless escalation rules are explicit.

What makes this agentic is not the summarisation. It is the delegated process control: deciding what to inspect, what source to consult, whether the evidence is sufficient, and when to escalate.

Example 2 — Market Entry Research: When Multi-Agent Earns Its Cost

Field	Design
Problem	A strategy team needs a preliminary assessment of whether to enter three new markets.
Non-agentic version	Analysts manually search sources, produce a research deck, and ask an LLM to draft sections.
Agentic setup	An orchestrator decomposes the goal into market size, competitor landscape, regulatory barriers, pricing, and risk. Specialist agents research each stream in parallel, while a critic agent checks source quality, contradictions, and unsupported claims before synthesis.
Agent pattern	Hierarchical multi-agent system with critic layer.
Tools/data	Web search, internal sales data, regulatory databases, analyst notes, citation checker.
Human oversight	Strategy lead reviews the final synthesis and all high-impact assumptions before any investment decision.
Main risk	One subagent's unsupported assumption can propagate into the final recommendation if the critic layer checks style but not evidence.

Here, the multi-agent design is justified by parallelism, specialisation, and independent verification. A single agent could produce a report, but it would be harder to isolate weak evidence, audit subtask quality, or run streams concurrently.

Example 3 — Compliance Critic: Agent or Checklist?

Field	Design
Problem	A financial services team wants a review layer before customer-facing AI-generated communications are sent.
Non-agentic version	A rules engine checks for prohibited phrases and required disclosures.
Agentic setup	A compliance critic agent reads the proposed communication, retrieves the relevant policy, checks whether the message makes a regulated claim, identifies missing disclosures, and either approves, requests revision, or escalates.
Agent pattern	Verification agent operating after a drafting agent or human author.
Tools/data	Policy library, approved disclosure templates, customer segment metadata, audit log.
Human oversight	Compliance officer reviews all escalations and a statistical sample of approvals.
Main risk	If the critic shares the same model, prompt assumptions, or retrieved context as the drafting agent, it may reproduce the same blind spots rather than provide independent verification.

This is agentic only if the reviewer is doing more than matching rules. The agentic element is its ability to interpret the proposed communication in context, retrieve the applicable policy, decide whether the risk category has changed, and select the appropriate escalation path.

The Hidden Costs of Multi-Agent Complexity

Every additional agent in a system adds overhead that compounds across the task lifecycle.

Cost Type	Description	Mitigation
Latency	Each inter-agent handoff adds round-trip time	Parallelise where possible; set strict timeouts
Token cost	Context passed between agents is processed, and paid for, again by each receiving agent	Summarise before handoff; avoid full-context transfer
Failure propagation	A subagent failure can block the orchestrator	Design for partial failure; define fallback paths
Trust surface	Agents receiving instructions from other agents can be manipulated	Validate inter-agent messages; scope agent permissions
Observability	Debugging distributed agent behaviour is significantly harder	Instrument every handoff; log intermediate states
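Two of the mitigations in the table, strict timeouts and designing for partial failure, can be sketched together with `asyncio.wait_for`. The subagent names, delays, and fallback message are hypothetical.

```python
import asyncio

async def subagent(name: str, delay: float) -> str:
    """Hypothetical subagent; the sleep stands in for slow model or tool calls."""
    await asyncio.sleep(delay)
    return f"{name}: ok"

async def call_with_timeout(name: str, delay: float, timeout: float = 0.1) -> str:
    try:
        # wait_for cancels the subagent call if it exceeds the timeout.
        return await asyncio.wait_for(subagent(name, delay), timeout=timeout)
    except asyncio.TimeoutError:
        # Fallback path: the orchestrator still gets an answer it can act on.
        return f"{name}: timed out, using fallback"

async def orchestrate() -> list[str]:
    return await asyncio.gather(
        call_with_timeout("pricing", delay=0.0),
        call_with_timeout("regulation", delay=5.0),  # deliberately too slow
    )

results = asyncio.run(orchestrate())
print(results)
```

The slow subagent does not block the fast one, and the orchestrator receives an explicit marker for the stream that failed, so it can decide whether a partial synthesis is acceptable.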

The trust surface issue deserves particular attention. When an agent receives instructions from an orchestrator rather than directly from a human, the assumption of legitimacy is weaker. A compromised or misconfigured orchestrator can issue instructions that downstream agents would never receive from a human operator. This is not theoretical — it is an active area of security concern. As the number of agents in a system grows, these risks compound: research on scaling multi-agent architectures finds that communication network complexity increases with agent count, making hallucination amplification and coordination failures progressively harder to contain.⁴
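A simple form of the mitigation, scoping permissions and validating inter-agent messages, might look like the sketch below. The `AgentMessage` type and `PERMISSIONS` table are hypothetical, and the check assumes the sender field is trustworthy; a real deployment would also need to authenticate the sender, which this sketch does not attempt.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentMessage:
    sender: str
    action: str
    target: str

# Hypothetical allow-list: which actions each sender may request.
PERMISSIONS: dict[str, set[str]] = {
    "orchestrator": {"read_contract", "summarise"},
}

def authorise(message: AgentMessage) -> bool:
    # Unknown senders get an empty permission set, so they are denied by default.
    return message.action in PERMISSIONS.get(message.sender, set())

def handle(message: AgentMessage) -> str:
    """Downstream agent: validate the instruction before acting on it."""
    if not authorise(message):
        return f"rejected: {message.sender} may not request {message.action}"
    return f"accepted: {message.action} on {message.target}"
```

The deny-by-default lookup means a misconfigured orchestrator that starts issuing out-of-scope instructions is stopped at the receiving agent rather than trusted because of its position in the hierarchy.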

A Decision Framework

Before committing to an architecture, work through these questions:

Can the task be completed reliably by a single, well-scoped agent? If yes, start there.
Does the task have genuinely independent subtasks that benefit from parallelism? If yes, a multi-agent architecture earns its cost.
Does domain depth in any subtask justify specialisation? If yes, consider dedicated subagents for those areas.
Does the output require independent verification before it reaches a human? If yes, build in a critic layer.
What is the cost of a partial failure? High-stakes outputs require more isolation between agent components.

Architecture should follow risk. The more consequential the agent's actions, the more important it is that responsibilities are clearly separated and failure modes are well-understood.
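The questions above can be read as a short decision procedure. The sketch below is one possible reading, not a definitive rule set: the `TaskProfile` fields and the wording of each recommendation are assumptions introduced for illustration.

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    single_agent_reliable: bool
    independent_subtasks: bool
    needs_domain_depth: bool
    needs_independent_verification: bool
    high_cost_of_partial_failure: bool

def recommend(profile: TaskProfile) -> list[str]:
    # Question 1: if a single well-scoped agent is reliable, start there.
    if profile.single_agent_reliable:
        return ["single well-scoped agent"]
    # Questions 2 to 5: each "yes" adds one architectural component.
    components: list[str] = []
    if profile.independent_subtasks:
        components.append("parallel subagents under an orchestrator")
    if profile.needs_domain_depth:
        components.append("dedicated specialist subagents")
    if profile.needs_independent_verification:
        components.append("critic layer")
    if profile.high_cost_of_partial_failure:
        components.append("stronger isolation between components")
    return components
```

For example, the market entry research case from earlier in the chapter would answer "no" to the first question and "yes" to parallelism, specialisation, and verification, yielding an orchestrator, specialists, and a critic layer.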

Chapter 5 establishes the internal architecture choice: concentrate capability in one well-scoped agent, or distribute it across specialised agents with explicit contracts. Chapter 6 expands that question outward. Once an organisation knows the shape of the agent system it wants to build, it must decide which platform layers, model providers, orchestration frameworks, and interoperability standards will define the environment in which that system lives.

References

1. Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., Chen, Z., Tang, J., Chen, X., Lin, Y., Zhao, W.X., Wei, Z., & Wen, J. (2024). A Survey on Large Language Model-Based Autonomous Agents. Renmin University of China.
2. Guo, T., Chen, X., Wang, Y., Chang, R., Pei, S., Chawla, N.V., Wiest, O., & Zhang, X. (2024). Large Language Model Based Multi-Agents: A Survey of Progress and Challenges. University of Notre Dame.
3. Anthropic (2025). Building Effective Agents. Anthropic.
4. Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., Zhang, M., Wang, J., Jin, S., Zhou, E., Zheng, R., Fan, X., Wang, X., Xiong, L., Zhou, Y., Wang, W., Jiang, C., Zou, Y., Liu, X., Yin, Z., Dou, S., Weng, R., Cheng, W., Zhang, Q., Qin, W., Zheng, Y., Qiu, X., Huang, X., & Gui, T. (2023). The Rise and Potential of Large Language Model Based Agents: A Survey. Fudan University.
