Chapter 11 · The Plumbing Problem: Technical Realities of System Integration
The architecture diagram shows how the pieces connect. This chapter explains what happens when they do.
What Architecture Diagrams Leave Out
There is a gap between the clean box-and-arrow diagrams that describe an agentic system and the engineering reality of making one work reliably. The boxes are easy to label: orchestrator, subagent, tool, database, external API. The arrows between them look simple. What they conceal is a class of problems — authentication flows, serialisation formats, error propagation, latency budgets, security boundaries — that consume the majority of integration engineering time.
These are not glamorous problems. They do not appear in product demos. They are also the problems that most frequently determine whether an agentic system reaches production — and stays there.
Authentication and Authorisation at Scale
Authentication is the first technical problem agentic systems hit when connecting to real enterprise infrastructure. The problem is structural: most enterprise authentication systems were designed around human users, not processes.
The credential management problem. An agent that needs to access ten enterprise systems needs credentials for ten systems. Where do those credentials live? Who rotates them? What happens when they expire mid-task? These questions have clear answers in human-operated systems: a person notices when their password expires and resets it. In an agentic system operating at scale, potentially overnight and unattended, an expired credential can cause a workflow to fail silently in ways that are difficult to detect.
The identity problem. Many enterprise systems are not designed to grant access to a non-human principal. The agent needs to appear as some identity — a service account, a delegated user, a role — and the access controls on that identity need to be scoped appropriately. Over-permissioning is the most common pattern (giving the agent admin access because it is easier) and the most dangerous one.
Recommended patterns:
- Use a secrets manager rather than embedding credentials in prompts or environment variables.
- Grant agents the minimum permissions required for the task, not the maximum available.
- Log every credentialed action so that an audit trail exists.
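A minimal sketch of how these patterns might fit together. It assumes a hypothetical `fetch_secret` callable standing in for whatever secrets-manager client is in use; `Secret`, `CredentialCache`, and the logger name are illustrative names, not part of any real library. The cache refreshes a credential shortly before it expires, so that a long-running task does not discover the expiry mid-step, and it writes an audit record for every credentialed action.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Dict

audit_log = logging.getLogger("agent.audit")  # illustrative logger name


@dataclass(frozen=True)
class Secret:
    value: str
    expires_at: float  # Unix timestamp after which the credential is invalid


class CredentialCache:
    """Fetches credentials on demand and refreshes them before they expire."""

    def __init__(
        self,
        fetch_secret: Callable[[str], Secret],  # hypothetical secrets-manager lookup
        refresh_margin_s: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch_secret
        self._margin = refresh_margin_s  # refresh this many seconds before expiry
        self._clock = clock
        self._cache: Dict[str, Secret] = {}

    def get(self, system: str, agent_id: str, action: str) -> str:
        secret = self._cache.get(system)
        if secret is None or secret.expires_at - self._clock() <= self._margin:
            secret = self._fetch(system)
            self._cache[system] = secret
        # Audit who used which credential and why. The secret itself is never logged.
        audit_log.info("agent=%s system=%s action=%s", agent_id, system, action)
        return secret.value
```

Least privilege is not visible in this sketch because it lives on the other side of `fetch_secret`: the identity that the secrets manager hands back should already be scoped to the task.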
Tool Calling and Protocol Standards
The mechanism by which an agent invokes external tools is one of the most consequential architectural choices in an agentic system, with direct implications for reliability, security, and portability.
Function calling is the foundation — the model identifies when a tool should be used, generates a structured JSON payload describing the call, and the agent runtime executes it and passes the result back into the context. The reliability of this mechanism depends heavily on how well tools are described, how the model handles ambiguous inputs, and how errors in tool execution are surfaced back to the model.
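The runtime side of this loop can be sketched in a few lines. The example below is an assumption-laden illustration rather than any vendor's API: the `TOOLS` registry, the `get_invoice_total` tool, and the `{"name": ..., "arguments": ...}` payload shape are all hypothetical. What it shows is the part that determines reliability: every failure mode (malformed JSON, unknown tool, bad arguments, tool exception) is turned into a structured result that can be passed back into the model's context.

```python
import json
from typing import Any, Dict

# Hypothetical registry: tool name -> description for the model, required args, implementation.
TOOLS: Dict[str, Dict[str, Any]] = {
    "get_invoice_total": {
        "description": "Return the total of an invoice in cents, given its ID.",
        "required": {"invoice_id": str},
        "fn": lambda invoice_id: {"invoice_id": invoice_id, "total_cents": 12500},
    },
}


def execute_tool_call(raw_payload: str) -> Dict[str, Any]:
    """Run one model-generated tool call and return a result the model can read."""
    try:
        call = json.loads(raw_payload)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"payload is not valid JSON: {exc.msg}"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "payload must be a JSON object"}

    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name!r}"}
    tool = TOOLS[name]

    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}
    for param, expected_type in tool["required"].items():
        if not isinstance(args.get(param), expected_type):
            return {"ok": False, "error": f"missing or invalid argument: {param}"}

    try:
        return {"ok": True, "result": tool["fn"](**args)}
    except Exception as exc:  # surface tool failures to the model instead of crashing
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

The `description` field is what the model sees when deciding whether to call the tool, which is why vague descriptions translate directly into unreliable calls.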
Model Context Protocol (MCP) extends this into a standardised client-server architecture. An MCP server exposes a set of tools through a well-defined interface; an MCP client (the agent runtime) discovers and invokes them. This provides several advantages over ad-hoc function calling implementations:
| Property | Ad-hoc Function Calling | MCP |
|---|---|---|
| Tool reusability | Low — tied to specific implementation | High — MCP server is portable |
| Discovery | Manual — tools must be registered explicitly | Automatic — client discovers available tools |
| Versioning | Unspecified | Defined in protocol |
| Security boundary | Depends on implementation | Explicit server-level isolation |
| Cross-model compatibility | Limited | By design |
For new integrations, implementing tools as MCP servers is the more sustainable architecture. For existing integrations, wrapping them with an MCP-compatible interface when resources allow is a worthwhile investment.
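To make the discovery and invocation steps concrete, the sketch below imitates the two requests an MCP client sends to a server: `tools/list` to discover tools and `tools/call` to invoke one. MCP frames these as JSON-RPC 2.0 messages. This is a toy in-process stand-in, not the official SDK: `handle_request`, `TOOL_SPECS`, and the `lookup_order` tool are hypothetical, and the field details should be checked against the version of the MCP specification being targeted.

```python
from typing import Any, Dict

# One tool as it might appear in a tools/list result. The tool itself is hypothetical.
TOOL_SPECS = [{
    "name": "lookup_order",
    "description": "Look up an order's status by ID.",
    "inputSchema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}]


def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Toy stand-in for an MCP server; handles tools/list and tools/call only."""
    request_id = request.get("id")
    method = request.get("method")

    if method == "tools/list":
        result: Dict[str, Any] = {"tools": TOOL_SPECS}
    elif method == "tools/call":
        params = request.get("params", {})
        if params.get("name") != "lookup_order":
            # -32602 is the JSON-RPC 2.0 code for invalid params.
            return {"jsonrpc": "2.0", "id": request_id,
                    "error": {"code": -32602, "message": "Unknown tool"}}
        order_id = params.get("arguments", {}).get("order_id", "")
        result = {
            "content": [{"type": "text", "text": f"Order {order_id}: shipped"}],
            "isError": False,
        }
    else:
        # -32601 is the JSON-RPC 2.0 code for method not found.
        return {"jsonrpc": "2.0", "id": request_id,
                "error": {"code": -32601, "message": "Method not found"}}

    return {"jsonrpc": "2.0", "id": request_id, "result": result}


# A client discovers tools first, then calls one by name.
listing = handle_request({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
reply = handle_request({
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "lookup_order", "arguments": {"order_id": "A-17"}},
})
```

Because the client learns tool names and input schemas from the server at runtime, the same server can be reused by a different agent runtime without re-registering each tool by hand.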
Error Propagation in Chained Agent Calls
Error handling in single-turn systems is straightforward: the call fails, you handle the exception, you log the error. In multi-step agentic workflows, errors have the additional property of propagation — a failure at step three of ten affects the validity of steps four through ten, and the agent may or may not recognise that its working context is now compromised.
Key takeaway: The specific danger in chained agent workflows isn't a loud failure — it's a silent one: an undetected error at step three propagates forward, producing final output that appears structurally valid but is built on corrupted state.
Three error handling principles are particularly important in agentic contexts:
Fail fast and explicitly. A tool that encounters an error should return a structured error response that the agent can interpret, not a generic exception that the agent might misinterpret or ignore. The error response should describe what failed, what the agent cannot now assume, and (where possible) what the agent should do instead.
Design for partial success. In long-horizon tasks, not every step failure should abort the whole task. Define which steps are critical-path (failure aborts) and which are optional (failure degrades but does not abort). This requires deliberate task decomposition, not just error handling code.
Validate intermediate outputs. In high-stakes workflows, add validation checkpoints between agent steps. A lightweight validation layer that checks whether an intermediate output meets expected structural constraints can catch propagation errors before they compound.
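The three principles can be combined in a small workflow runner. The sketch below is one possible arrangement under stated assumptions: `StepResult`, `Step`, and `run_workflow` are hypothetical names, and steps are plain functions rather than real agent calls. Each step returns a structured result, each step is marked critical or optional, and an optional validator checks the output before it is allowed into shared state.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class StepResult:
    ok: bool
    output: Any = None
    what_failed: str = ""
    cannot_assume: List[str] = field(default_factory=list)  # facts now invalid
    suggestion: str = ""  # what the agent could do instead, if known


@dataclass
class Step:
    name: str
    run: Callable[[Dict[str, Any]], StepResult]
    critical: bool = True  # critical failure aborts; optional failure degrades
    validate: Optional[Callable[[Any], bool]] = None  # structural check on output


def run_workflow(steps: List[Step]) -> Dict[str, Any]:
    state: Dict[str, Any] = {}
    skipped: List[str] = []
    for step in steps:
        try:
            result = step.run(state)
        except Exception as exc:  # convert raw exceptions into structured errors
            result = StepResult(ok=False, what_failed=f"{type(exc).__name__}: {exc}",
                                cannot_assume=[step.name])
        if result.ok and step.validate is not None and not step.validate(result.output):
            result = StepResult(ok=False,
                                what_failed=f"{step.name} output failed validation",
                                cannot_assume=[step.name])
        if result.ok:
            state[step.name] = result.output  # only validated output enters state
        elif step.critical:
            return {"status": "aborted", "failed_step": step.name,
                    "error": {"what_failed": result.what_failed,
                              "cannot_assume": result.cannot_assume,
                              "suggestion": result.suggestion},
                    "state": state}
        else:
            skipped.append(step.name)
    return {"status": "degraded" if skipped else "ok", "skipped": skipped, "state": state}
```

The key design choice is that a failed or invalid output never reaches `state`, so later steps cannot silently build on it; they either run without it (optional step) or do not run at all (critical step).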
Observability: What You Need and Why
Debugging a deterministic system is hard. Debugging a non-deterministic system — one where the same input can produce different outputs, and where the intermediate steps are in natural language rather than structured code — requires purpose-built observability infrastructure.
Key takeaway: Standard application logging assumes deterministic, code-legible intermediate steps. Agentic systems have neither — purpose-built observability infrastructure is the only way to reconstruct what an agent actually did after the fact.
The minimum observability stack for a production agentic system:
Trace logging — every agent invocation, tool call, and model call logged with inputs, outputs, timing, and token counts. This is the equivalent of application logs in traditional systems, and it is the only reliable way to reconstruct what an agent actually did when something goes wrong.
Span correlation — trace IDs that propagate across an entire task, so that every step of a multi-agent workflow can be correlated back to a single user request. Without this, debugging distributed agent failures is close to impossible.
Latency distribution monitoring — not just average latency, but p95 and p99 latency. Agentic systems have long-tail latency distributions because some task paths are much longer than others. The average is misleading.
Cost monitoring — token consumption per task, per agent, per tool. In a multi-agent system where different steps use different models, total cost is not visible without explicit aggregation.
Error rate by tool — which tools are failing, how often, and with what error types. A tool that succeeds 98% of the time appears reliable in average metrics; it appears concerning when you see that it fails during the 2% of tasks that matter most.
Tools like LangSmith, Weights & Biases, and Langfuse provide much of this infrastructure out of the box for LangChain-based systems. For custom implementations, OpenTelemetry provides a vendor-neutral foundation [4].
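For teams building this themselves, the core of trace logging and span correlation is small. The following sketch uses only the Python standard library and is an illustration, not a substitute for a tracing backend: `span`, `start_trace`, `latency_percentile`, and the in-memory `SPANS` list are hypothetical names. A context variable carries the trace ID so that every span recorded during a task shares it, and percentiles are computed from recorded durations rather than relying on the mean.

```python
import contextvars
import statistics
import time
import uuid
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List

trace_id_var = contextvars.ContextVar("trace_id", default="")
SPANS: List[Dict[str, Any]] = []  # stand-in for a real log sink or tracing backend


def start_trace() -> str:
    """Begin a new trace for one user request and return its ID."""
    trace_id = uuid.uuid4().hex
    trace_id_var.set(trace_id)
    return trace_id


@contextmanager
def span(kind: str, name: str, **attrs: Any) -> Iterator[Dict[str, Any]]:
    """Record one agent invocation, tool call, or model call."""
    record: Dict[str, Any] = {
        "trace_id": trace_id_var.get(),
        "span_id": uuid.uuid4().hex[:8],
        "kind": kind,
        "name": name,
        "error": None,
        **attrs,
    }
    start = time.perf_counter()
    try:
        yield record  # callers attach outputs and token counts to the record
    except Exception as exc:
        record["error"] = type(exc).__name__
        raise
    finally:
        record["duration_s"] = time.perf_counter() - start
        SPANS.append(record)


def latency_percentile(durations: List[float], pct: int) -> float:
    """Return the pct-th percentile (1 to 99) of at least two durations."""
    cuts = statistics.quantiles(durations, n=100, method="inclusive")
    return cuts[pct - 1]
```

A call site might look like `with span("tool_call", "search", query=q) as s: s["tokens"] = n`. Grouping `SPANS` by `name` and `error` gives the per-tool error rate, and summing a `tokens` attribute by `trace_id` gives cost per task.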
Latency Budgets
Latency in agentic systems compounds across steps. A workflow that makes ten sequential model calls, each taking two seconds on average, has a baseline latency of twenty seconds before any tool execution time is included. For real-time or near-real-time applications, this is often unacceptable.
Latency budget management requires explicit planning:
| Component | Typical Range | Optimisation Lever |
|---|---|---|
| Model inference (frontier) | 2–15s per call | Use smaller model where quality permits |
| Model inference (small) | 0.2–2s per call | Route simpler steps here by default |
| Tool execution | 50ms–10s | Cache results; parallelise where possible |
| Inter-agent handoff | 100ms–2s | Minimise context serialisation; use streaming |
| Total workflow (10 steps) | 20s–3min | Parallelise independent steps; reduce step count |
The single highest-leverage optimisation in most agentic systems is parallelising independent steps. If three research subtasks of similar duration can run simultaneously rather than sequentially, the time spent on those steps drops by up to a factor of three, because the group takes only as long as its slowest member. This requires explicit task decomposition and an orchestration layer that supports concurrent execution.
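The effect is easy to demonstrate with `asyncio`. In this sketch, `research_subtask` is a hypothetical stand-in that sleeps instead of calling a model or tool; the point is the difference between awaiting subtasks one at a time and handing them to `asyncio.gather`, which runs them concurrently and returns results in the order the subtasks were passed in.

```python
import asyncio
import time
from typing import Any, Coroutine, List, Tuple


async def research_subtask(topic: str, seconds: float) -> str:
    """Stand-in for a model or tool call; sleeps instead of doing real work."""
    await asyncio.sleep(seconds)
    return f"notes on {topic}"


async def run_sequential(topics: List[str], seconds: float) -> List[str]:
    # Each subtask starts only after the previous one has finished.
    return [await research_subtask(topic, seconds) for topic in topics]


async def run_parallel(topics: List[str], seconds: float) -> List[str]:
    # All subtasks start together; total time is roughly that of the slowest one.
    results = await asyncio.gather(*(research_subtask(t, seconds) for t in topics))
    return list(results)


def timed(coro: Coroutine[Any, Any, List[str]]) -> Tuple[List[str], float]:
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    topics = ["pricing", "competitors", "regulation"]
    _, sequential_s = timed(run_sequential(topics, 0.2))
    _, parallel_s = timed(run_parallel(topics, 0.2))
    print(f"sequential: {sequential_s:.2f}s, parallel: {parallel_s:.2f}s")
```

This only helps when the subtasks are genuinely independent. If one subtask needs another's output, it must stay sequential, which is why the decomposition has to be done deliberately before any concurrency is added.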
Security Attack Surfaces
Agentic systems introduce attack surfaces that do not exist in traditional application architectures.
Prompt injection is the most widely discussed. A malicious payload embedded in data that the agent reads (an email, a web page, a document) can instruct the agent to take actions not intended by the legitimate user or operator. The challenge is that the agent cannot always distinguish between instructions from its operator and instructions embedded in its environment. Empirical testing found that more capable models are significantly more vulnerable to injection attacks: the same instruction-following ability that makes a model useful applies equally to adversarially injected commands, making architectural controls a more reliable mitigation layer than model-level defences alone [2].
Tool abuse occurs when an agent is manipulated into using tools in unintended ways — exfiltrating data through a communication tool, escalating permissions through an administrative tool, or chaining tool calls to achieve an outcome the agent was not designed to enable.
Cross-agent trust exploitation occurs in multi-agent systems when a compromised subagent issues instructions to other agents that those agents treat as legitimate.
Mitigations:
- Scope agent permissions to the minimum required for legitimate tasks.
- Treat all content from the environment (emails, documents, web pages) as untrusted input.
- Validate tool call parameters before execution, particularly for write operations.
- Log all agent actions and flag unusual patterns for human review.
- Apply the same security review to agent workflows as to any application code.
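Several of these mitigations can be enforced in a single gate that sits between the model's proposed tool call and its execution. The sketch below is illustrative only: `check_tool_call`, `Decision`, the tool names, and the domain allowlist are hypothetical, and a real policy would be considerably richer. It shows permission scoping (an explicit per-agent allowlist), parameter validation on a write operation, and logging of blocked calls for human review.

```python
import logging
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet

security_log = logging.getLogger("agent.security")  # illustrative logger name

# Hypothetical policy: outbound email may only go to these domains.
ALLOWED_RECIPIENT_DOMAINS: FrozenSet[str] = frozenset({"example.com"})


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def check_tool_call(agent_id: str, tool: str, args: Dict[str, Any],
                    allowed_tools: FrozenSet[str]) -> Decision:
    """Decide whether a model-proposed tool call may run. Runs before execution."""
    if tool not in allowed_tools:
        decision = Decision(False, f"tool {tool!r} is outside this agent's scope")
    elif tool == "send_email":
        # The recipient may have been chosen by injected content, so it is
        # validated against policy rather than trusted.
        recipient = str(args.get("to", ""))
        domain = recipient.rpartition("@")[2].lower()
        if domain in ALLOWED_RECIPIENT_DOMAINS:
            decision = Decision(True, "recipient domain is allowlisted")
        else:
            decision = Decision(False, f"recipient domain {domain!r} is not allowlisted")
    else:
        decision = Decision(True, "tool is in scope")

    if not decision.allowed:
        # Blocked calls are the unusual pattern worth flagging for human review.
        security_log.warning("blocked agent=%s tool=%s reason=%s",
                             agent_id, tool, decision.reason)
    return decision
```

Because the check runs in ordinary code outside the model, a successful prompt injection can change what the agent asks for but not what the gate permits.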
Security in agentic systems cannot be bolted on after the fact. The attack surfaces are deeply architectural, and the mitigations must be designed in from the start.
References
[1] OWASP (2025). OWASP Top 10 for Large Language Model Applications. Open Web Application Security Project.
[2] Perez, F., & Ribeiro, I. (2022). Ignore Previous Prompt: Attack Techniques for Language Models. ML Safety Workshop, NeurIPS 2022. AE Studio.
[3] Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly Media.
[4] OpenTelemetry Authors (2023). OpenTelemetry Specification. Cloud Native Computing Foundation.