Chapter 5 · One Agent or Many? Designing for Scale and Complexity
The architectural decision that shapes everything downstream.
The Default That Isn't Safe
When organisations start building with agentic AI, the first instinct is usually to build one agent and make it smarter. Add more tools. Expand the system prompt. Increase the context window. This approach works — until it doesn't.
The failure is predictable. A single agent handling twenty different responsibilities becomes brittle in ways that are hard to diagnose. When something breaks, it is unclear which capability failed. When you try to improve one behaviour, you inadvertently degrade another. And when you need to scale to thousands of concurrent instances, the cost and latency of a maximally capable agent hit hard.
The alternative — distributing work across multiple specialised agents — solves these problems but introduces new ones. Coordination overhead. State management. Partial failure handling. Trust between agents you do not fully control.
There is no universally correct answer. But there is a principled way to think through the choice.
The Single-Agent Case
A single agent is not automatically the inferior option. For many enterprise use cases, it is the right one.
The case for a single agent is strongest when:
- The task is bounded — well-defined inputs, predictable outputs, limited tool use.
- Context must flow freely — the agent needs to remember and reason across every step without serialising state between handoffs.
- Latency is critical — inter-agent communication adds round-trip time that may be unacceptable in real-time applications.
- The team is small — multi-agent systems demand more engineering rigour to build and operate safely.
A well-designed single agent with a clear purpose, a minimal toolset, and explicit scope boundaries will usually outperform a poorly coordinated network of agents on cost, latency, and reliability alike. Complexity is not a sign of sophistication; it is a cost.
When Multi-Agent Becomes Necessary
The multi-agent architecture earns its overhead when the problem genuinely exceeds what a single agent can handle reliably.
Parallelism is the clearest driver. If a task can be decomposed into independent subtasks (researching five markets simultaneously, processing ten documents in parallel), a single sequential agent is an unnecessary bottleneck. Running five subagents in parallel can cut wall-clock time by up to roughly a factor of five, provided the subtasks take similar time and the orchestration overhead stays small.
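The fan-out described above can be sketched with Python's standard `asyncio` library. This is a minimal illustration, not a production design: `research_market` is a hypothetical stand-in for an LLM-backed subagent, and its 0.1-second sleep merely simulates model and tool latency.

```python
import asyncio
import time


async def research_market(market: str) -> str:
    """Hypothetical subagent: stands in for an LLM-backed research call."""
    await asyncio.sleep(0.1)  # simulated I/O-bound latency (assumption)
    return f"summary for {market}"


async def research_all(markets: list[str]) -> list[str]:
    # Fan out one subagent per independent subtask and await them together.
    # gather() returns results in the same order as the inputs.
    return list(await asyncio.gather(*(research_market(m) for m in markets)))


def run(markets: list[str]) -> tuple[list[str], float]:
    """Run all subagents concurrently and report the elapsed wall-clock time."""
    start = time.perf_counter()
    results = asyncio.run(research_all(markets))
    return results, time.perf_counter() - start
```

Run sequentially, five such calls would take about half a second; run concurrently, the total is close to the slowest single call. The saving only holds when the subtasks really are independent.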
Specialisation matters when domain depth genuinely outweighs generalist range. A coding agent fine-tuned on your internal codebase will consistently outperform a general-purpose agent on repository-specific tasks. A compliance agent trained on financial regulations will catch issues a broader model might miss.
Context limits force the issue for long-horizon tasks. A complex research project might accumulate more context than any single model can hold. Before that threshold is reached, a single agent's effective horizon can be extended by pairing in-context short-term memory with an external long-term vector store, retrieving relevant past state on demand rather than carrying it all in the active window [1]. When even this hybrid memory architecture is insufficient for the task's scope, breaking the work into phases, each handled by a fresh agent instance, is not a workaround; it is good engineering.
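The hybrid memory idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `HybridMemory` class is hypothetical, and a bag-of-words cosine similarity stands in for the learned embeddings and vector database a real system would use.

```python
import math
from collections import Counter, deque


def _vec(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would use a learned embedding."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class HybridMemory:
    """Short-term window of recent notes plus a searchable long-term archive."""

    def __init__(self, window: int = 3) -> None:
        self.short_term: deque[str] = deque(maxlen=window)
        self.long_term: list[tuple[Counter, str]] = []

    def add(self, note: str) -> None:
        # Archive the oldest note just before the full window evicts it.
        if len(self.short_term) == self.short_term.maxlen:
            evicted = self.short_term[0]
            self.long_term.append((_vec(evicted), evicted))
        self.short_term.append(note)

    def build_context(self, query: str, k: int = 1) -> list[str]:
        # Recall the k most similar archived notes, then append the live window.
        q = _vec(query)
        ranked = sorted(
            self.long_term, key=lambda item: _cosine(q, item[0]), reverse=True
        )
        return [text for _, text in ranked[:k]] + list(self.short_term)
```

The active window stays a fixed size however long the task runs, while older state remains retrievable on demand.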
Verification is underused. One of the highest-value applications of a second agent is simply checking the first agent's work. A critic agent that reviews outputs before they reach a human adds a layer of reliability that is difficult to achieve through prompt engineering alone. A more structured form of this is the debate paradigm, in which multiple agents independently propose answers and then argue their positions across several rounds until they converge on a consensus, an approach that has been shown to improve factual accuracy on reasoning tasks [2]. A related risk runs in the other direction: in multi-agent pipelines, a hallucination generated by one agent can be accepted and amplified by others downstream, making error propagation a distinct failure mode that does not exist in single-agent systems [2].
Key takeaway: Multi-agent debate is more than a verification trick. It is a structured process in which disagreement between agents actively drives output quality upward. Multi-agent pipelines also carry a new risk: one agent's error can propagate and compound through the rest of the pipeline.
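A critic layer of the kind described above can be sketched as a bounded review loop. This is an illustrative assumption rather than a prescribed design: `generate` and `critique` are hypothetical callables standing in for two separate agents, and the stubs below replace real model calls.

```python
from typing import Callable

Generator = Callable[[str, str], str]  # (task, feedback) -> draft
Critic = Callable[[str, str], str]     # (task, draft) -> "" if acceptable, else feedback


def generate_with_review(
    task: str, generate: Generator, critique: Critic, max_rounds: int = 3
) -> tuple[str, bool]:
    """Return (draft, approved). Unapproved drafts should be escalated to a human."""
    feedback = ""
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        feedback = critique(task, draft)
        if not feedback:  # critic raised no objection
            return draft, True
    return draft, False  # round budget exhausted without approval


# Stub agents for demonstration only (assumptions, not real model calls).
def stub_generate(task: str, feedback: str) -> str:
    return "Revenue grew 12% (source: Q3 report)" if feedback else "Revenue grew 12%"


def stub_critique(task: str, draft: str) -> str:
    return "" if "source:" in draft else "Cite a source for every figure."
```

Capping the number of rounds matters: without a budget, a generator and critic that never agree would loop indefinitely, and the explicit `approved` flag keeps unreviewed output from silently reaching a human as if it had passed.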
Orchestration Patterns
Multi-agent systems generally fall into one of four broad patterns, each with different trade-offs.
Hierarchical is the most common pattern in enterprise deployments. A central orchestrator receives the user's goal, decomposes it into subtasks, delegates each to a specialised subagent, and synthesises the results. The orchestrator handles planning and coordination; subagents handle execution. This pattern is predictable and auditable — critical for governance.
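The hierarchical pattern can be sketched in a few lines. The `Orchestrator` class, its fixed two-step plan, and the role names are all hypothetical; in practice the planning step would typically be an LLM call, and the subagents would be model-backed services rather than plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable

Subagent = Callable[[str], str]  # takes a subtask description, returns its output


@dataclass
class Orchestrator:
    subagents: dict[str, Subagent]
    # Every delegation is recorded as (role, subtask, output) for later audit.
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def plan(self, goal: str) -> list[tuple[str, str]]:
        # Hypothetical fixed plan; a real orchestrator would decompose dynamically.
        return [
            ("research", f"collect facts for: {goal}"),
            ("writing", f"draft report on: {goal}"),
        ]

    def run(self, goal: str) -> str:
        results = []
        for role, subtask in self.plan(goal):
            output = self.subagents[role](subtask)  # delegate execution
            self.audit_log.append((role, subtask, output))
            results.append(output)
        return "\n".join(results)  # synthesis step, kept trivial here
```

Because every delegation passes through one place, the audit log gives a complete record of who was asked to do what, which is the property that makes this pattern attractive for governance.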
Collaborative architectures allow agents to communicate directly and negotiate on outputs. This pattern excels at tasks requiring genuine deliberation — where multiple perspectives improve the final answer — but is harder to debug when things go wrong, since causality is distributed.
Pipeline architectures are the simplest multi-agent pattern. Each agent performs a transformation on the output of the previous one: extract, then analyse, then validate, then format. These are easy to reason about, easy to monitor, and easy to modify — at the cost of no parallelism and no feedback loops.
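A pipeline of this kind reduces to function composition. In the sketch below, each stage is a hypothetical plain function standing in for an agent, and the shared `dict` document format is an assumption chosen for brevity.

```python
from functools import reduce
from typing import Callable

Stage = Callable[[dict], dict]  # each stage returns an enriched copy of the document


def extract(doc: dict) -> dict:
    amounts = [int(tok) for tok in doc["text"].split() if tok.isdigit()]
    return {**doc, "amounts": amounts}


def analyse(doc: dict) -> dict:
    return {**doc, "total": sum(doc["amounts"])}


def validate(doc: dict) -> dict:
    if not doc["amounts"]:
        raise ValueError("no amounts found; refusing to report a total")
    return {**doc, "valid": True}


def format_report(doc: dict) -> dict:
    return {**doc, "report": f"Total: {doc['total']}"}


STAGES: list[Stage] = [extract, analyse, validate, format_report]


def run_pipeline(doc: dict, stages: list[Stage]) -> dict:
    # Feed each stage the output of the previous one, strictly in order.
    return reduce(lambda acc, stage: stage(acc), stages, doc)
```

Swapping, adding, or removing a stage is a one-line change to `STAGES`, which is what makes the pattern easy to modify. The trade-off is visible too: nothing runs concurrently, and no stage can send work back upstream.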
Routing is a fourth pattern worth naming explicitly: a classifier step, itself often an LLM call, directs each incoming task to the most appropriate specialised sub-agent or prompt, rather than passing everything through the same path [3]. This is particularly effective when inputs vary enough that optimising for one type would degrade performance on others, and it is among the most commonly deployed patterns in production agentic systems [3].
Key takeaway: Not every multi-agent system needs full orchestration. Routing lets a lightweight classifier send each task to the right specialist without the overhead of a central planner managing a full workflow.
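Routing can be sketched as a classifier plus a lookup table. The keyword-based `classify` function below is a deliberately crude stand-in for the LLM call the text mentions, and the category names and handlers are hypothetical.

```python
from typing import Callable

Handler = Callable[[str], str]


def classify(message: str) -> str:
    """Hypothetical classifier; keyword matching stands in for an LLM call."""
    text = message.lower()
    if "refund" in text or "invoice" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "technical"
    return "general"


# Each handler stands in for a specialised sub-agent or prompt.
HANDLERS: dict[str, Handler] = {
    "billing": lambda m: f"billing agent handles: {m}",
    "technical": lambda m: f"technical agent handles: {m}",
    "general": lambda m: f"general agent handles: {m}",
}


def route(message: str) -> str:
    label = classify(message)
    # Fall back to the generalist if the classifier emits an unknown label.
    handler = HANDLERS.get(label, HANDLERS["general"])
    return handler(message)
```

The fallback matters once the classifier is a real model, since it may return labels outside the expected set.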
The Hidden Costs of Multi-Agent Complexity
Every additional agent in a system adds overhead that compounds across the task lifecycle.
| Cost Type | Description | Mitigation |
|---|---|---|
| Latency | Each inter-agent handoff adds round-trip time | Parallelise where possible; set strict timeouts |
| Token cost | Context passed between agents is processed, and billed, again as input tokens by each recipient | Summarise before handoff; avoid full-context transfer |
| Failure propagation | A subagent failure can block the orchestrator | Design for partial failure; define fallback paths |
| Trust surface | Agents receiving instructions from other agents can be manipulated | Validate inter-agent messages; scope agent permissions |
| Observability | Debugging distributed agent behaviour is significantly harder | Instrument every handoff; log intermediate states |
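
Two of the mitigations in the table, strict timeouts and designing for partial failure, can be combined in one small wrapper. This is a sketch under stated assumptions: the three demo agents are hypothetical, and the status labels are illustrative rather than any standard.

```python
import asyncio
from typing import Awaitable, Callable

Agent = Callable[[str], Awaitable[str]]


async def guarded(agent: Agent, task: str, timeout_s: float = 0.05) -> dict:
    """Run one subagent with a deadline; never let its failure escape."""
    try:
        output = await asyncio.wait_for(agent(task), timeout=timeout_s)
        return {"task": task, "status": "ok", "output": output}
    except asyncio.TimeoutError:
        return {"task": task, "status": "timeout", "output": None}
    except Exception as exc:  # isolate the failure; report it instead of raising
        return {"task": task, "status": "error", "output": str(exc)}


async def run_all(jobs: list[tuple[Agent, str]]) -> list[dict]:
    # One slow or failing subagent no longer blocks the others.
    return list(await asyncio.gather(*(guarded(a, t) for a, t in jobs)))


# Hypothetical demo agents.
async def fast_agent(task: str) -> str:
    return f"done: {task}"


async def slow_agent(task: str) -> str:
    await asyncio.sleep(1.0)  # exceeds the deadline on purpose
    return f"done: {task}"


async def broken_agent(task: str) -> str:
    raise RuntimeError("tool unavailable")
```

The orchestrator receives a status for every subtask and can decide, per task, whether to retry, fall back, or proceed with partial results.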
The trust surface issue deserves particular attention. When an agent receives instructions from an orchestrator rather than directly from a human, the assumption of legitimacy is weaker. A compromised or misconfigured orchestrator can issue instructions that downstream agents would never receive from a human operator. This is not theoretical; it is an active area of security concern. These risks compound as the number of agents in a system grows: research on scaling multi-agent architectures finds that communication network complexity increases with agent count, making hallucination amplification and coordination failures progressively harder to contain [4].
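One way to narrow the trust surface is to have each agent authenticate incoming instructions and check them against its own permission scope. The sketch below uses HMAC signatures from Python's standard library; the shared key, the message shape, and the `ALLOWED_ACTIONS` set are assumptions for illustration, and a real deployment would need proper key management.

```python
import hashlib
import hmac
import json

SHARED_KEY = b"demo-key-not-for-production"  # assumption: provisioned out of band
ALLOWED_ACTIONS = {"summarise", "search"}    # this agent's permission scope


def sign(message: dict, key: bytes = SHARED_KEY) -> str:
    # Canonical JSON so sender and receiver hash identical bytes.
    payload = json.dumps(message, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def accept(message: dict, signature: str, key: bytes = SHARED_KEY) -> bool:
    """Accept only authentic messages that request an in-scope action."""
    if not hmac.compare_digest(sign(message, key), signature):
        return False  # tampered with, or not from a holder of the key
    return message.get("action") in ALLOWED_ACTIONS
```

The scope check does work the signature cannot: a signature shows that a message came from a holder of the key, not that the request is safe. A compromised orchestrator with a valid key is still limited to the actions the receiving agent permits.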
A Decision Framework
Before committing to an architecture, work through these questions:
- Can the task be completed reliably by a single, well-scoped agent? If yes, start there.
- Does the task have genuinely independent subtasks that benefit from parallelism? If yes, a multi-agent architecture earns its cost.
- Does domain depth in any subtask justify specialisation? If yes, consider dedicated subagents for those areas.
- Does the output require independent verification before it reaches a human? If yes, build in a critic layer.
- What is the cost of a partial failure? High-stakes outputs require more isolation between agent components.
Architecture should follow risk. The more consequential the agent's actions, the more important it is that responsibilities are clearly separated and failure modes are well-understood.
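The questions above can be encoded as a simple checklist function, which some teams may find useful in design reviews. The `TaskProfile` fields and the wording of the recommendations are hypothetical paraphrases of the framework, not a validated scoring model.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    single_agent_reliable: bool
    independent_subtasks: bool
    needs_domain_depth: bool
    needs_independent_verification: bool
    high_cost_of_partial_failure: bool


def recommend(profile: TaskProfile) -> list[str]:
    """Map each framework question to an architectural recommendation."""
    recs = []
    if profile.single_agent_reliable:
        recs.append("start with a single, well-scoped agent")
    if profile.independent_subtasks:
        recs.append("use parallel subagents for the independent subtasks")
    if profile.needs_domain_depth:
        recs.append("add dedicated specialist subagents")
    if profile.needs_independent_verification:
        recs.append("add a critic layer before human review")
    if profile.high_cost_of_partial_failure:
        recs.append("isolate agent components and define fallback paths")
    return recs
```

The answers are additive rather than exclusive: a task may warrant both parallel subagents and a critic layer.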
References
- 1. Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., Chen, Z., Tang, J., Chen, X., Lin, Y., Zhao, W.X., Wei, Z., & Wen, J. (2024). A Survey on Large Language Model-Based Autonomous Agents. Renmin University of China.
- 2. Guo, T., Chen, X., Wang, Y., Chang, R., Pei, S., Chawla, N.V., Wiest, O., & Zhang, X. (2024). Large Language Model Based Multi-Agents: A Survey of Progress and Challenges. University of Notre Dame.
- 3. Anthropic (2025). Building Effective Agents. Anthropic.
- 4. Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., Zhang, M., Wang, J., Jin, S., Zhou, E., Zheng, R., Fan, X., Wang, X., Xiong, L., Zhou, Y., Wang, W., Jiang, C., Zou, Y., Liu, X., Yin, Z., Dou, S., Weng, R., Cheng, W., Zhang, Q., Qin, W., Zheng, Y., Qiu, X., Huang, X., & Gui, T. (2023). The Rise and Potential of Large Language Model Based Agents: A Survey. Fudan University.