Chapter 23 · Building the Team: People, Partners, and Platforms

The organisations that will sustain agentic AI capability are not necessarily the ones that built the best first agent. They are the ones that built the right team around it.

The Capability Question

Every technical chapter in this book describes decisions that require specific human capability to make well: the choice between single-agent and multi-agent architectures, the design of evaluation infrastructure, the calibration of oversight mechanisms, the construction of audit trails, the assessment of adversarial risks. These decisions require people who understand both the technology and the organisational context in which it is operating.

Finding those people, structuring the team they belong to, and sustaining the capability over time is a challenge that most enterprise AI programmes underestimate. The point is not merely staffing volume: enterprise survey evidence consistently shows that AI high performers differentiate themselves through workflow redesign, governance, talent practices, and operating model choices as much as through model access.¹ The technology is available to anyone with an API key. The organisational capability to use it well is not.

This chapter addresses the talent and organisational design questions that determine whether an agentic programme can be built and sustained — not just launched.

The Roles That Actually Matter

Enterprise AI job titles have proliferated to the point of meaninglessness. "AI Lead," "Head of Intelligent Automation," "Prompt Engineer" — each organisation has invented its own nomenclature for functions that differ significantly in scope and capability requirements.

Underneath the titles, a small number of functional roles determine whether an agentic programme succeeds. These are not all full-time positions, particularly in early-stage programmes, but each must be covered by someone:

The evaluation engineer is responsible for building and maintaining the test infrastructure that determines whether agents are performing correctly. This is a specialised role that requires the ability to think like an adversary — to construct the inputs that reveal failure modes rather than those that demonstrate success. Evaluation engineers are not testers in the traditional sense; they are the people who define what "working correctly" means and build the systems that measure it. In mature programmes, this is typically a full-time role. In early programmes, it is often underweighted or absorbed by the agent developer, which produces evaluation infrastructure that is optimised to confirm success rather than find failure.

The agent operations manager owns the running system: the monitoring dashboards, the incident response procedures, the model update assessments, the continuous improvement cadence. This is an operational role that requires systems thinking, process discipline, and the confidence to say that an agent needs to be taken offline when performance is degrading — regardless of the pressure to keep it running. This role is often the last to be filled and the first to be asked to absorb additional scope.

The AI product manager sits between the business and the engineering team. They translate business requirements into agent specifications, hold the scope boundaries against creep, and own the relationship between the agent's capability and the business value it is expected to deliver. Without this role, agent programmes are driven either by what the technology can do (producing capability without clear value) or by what the business wants (producing requirements that the technology cannot meet).

The prompt architect is responsible for the system prompt design, tool configuration, and agent behaviour specification that determine how the agent performs across the range of inputs it encounters. This is a more specialised role than it initially appears. The difference between a system prompt that produces reliable, predictable agent behaviour and one that produces variable, edge-case-prone behaviour is substantial, and it requires both technical understanding and domain knowledge to design.

The governance and compliance function owns the accountability framework described in Chapters 21, 22, and 24: the metrics cadence, the audit trail, the regulatory compliance monitoring, the ethics review. This function is often distributed across the legal, risk, and IT teams in early programmes, which means nobody owns it fully. Mature programmes designate a specific owner.
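Because each of these roles must be covered by someone even when nobody holds it full-time, some programmes keep a simple coverage check alongside their org chart. The sketch below is illustrative only: it assumes assignments are recorded as hypothetical `(person, role)` pairs, and the role labels and warning rules are our paraphrase of the failure patterns described above (an uncovered role, governance split across teams with no designated owner, and evaluation absorbed by the agent's own developers).

```python
# Functional roles from this chapter. Each must be covered by someone,
# though not necessarily full-time.
REQUIRED_ROLES = (
    "evaluation engineer",
    "agent operations manager",
    "AI product manager",
    "prompt architect",
    "governance and compliance",
)


def coverage_gaps(assignments, agent_developers):
    """Return warnings for a list of hypothetical (person, role) assignments.

    agent_developers: names of people who build the agents, used to flag
    evaluation engineers who would be grading their own work.
    """
    owners = {role: set() for role in REQUIRED_ROLES}
    for person, role in assignments:
        if role in owners:
            owners[role].add(person)

    warnings = []
    # Roles that nobody covers at all.
    for role in REQUIRED_ROLES:
        if not owners[role]:
            warnings.append(f"{role}: not covered")

    # Governance spread across several owners, so nobody owns it fully.
    gov = owners["governance and compliance"]
    if len(gov) > 1:
        warnings.append(
            f"governance and compliance: split across {len(gov)} owners, none designated"
        )

    # Evaluation absorbed by the people who build the agents under test.
    for person in sorted(owners["evaluation engineer"] & set(agent_developers)):
        warnings.append(f"evaluation engineer: {person} also develops the agents under test")

    return warnings


# Hypothetical early-stage team.
team = [
    ("alice", "evaluation engineer"),
    ("alice", "prompt architect"),
    ("bob", "AI product manager"),
    ("legal", "governance and compliance"),
    ("risk", "governance and compliance"),
]

for warning in coverage_gaps(team, agent_developers=["alice"]):
    print(warning)
```

The check deliberately reports the evaluation-developer overlap as a warning rather than an error: in early programmes the overlap may be unavoidable, but it should be a visible, conscious choice.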

Key takeaway: The functional roles that determine whether an agentic programme succeeds are not interchangeable with general software engineering or product management roles. Each requires specific capabilities that differ materially from adjacent functions and are in scarce supply in most organisations.

The Talent Market Reality

AI engineering talent is genuinely scarce, and the scarcity is not evenly distributed. ManpowerGroup's 2026 Global Talent Shortage Survey of over 39,000 employers across 41 countries found AI skills to be the hardest to hire for globally for the first time, ahead of engineering and IT.² The World Economic Forum's Future of Jobs Report projects AI specialists as among the fastest-growing occupational categories, with demand increasing faster than educational pipelines can supply qualified practitioners.³ The Stanford AI Index similarly documents sustained growth in AI job postings, private investment, and organisational adoption, reinforcing that demand for experienced AI practitioners is structurally broad rather than confined to frontier labs.⁴

Organisations with established technology brands, competitive compensation, and interesting technical challenges attract the engineers who can build sophisticated agentic systems. Organisations without these characteristics compete for a smaller pool of less experienced candidates, and they lose them to better-positioned competitors more quickly.

This reality shapes what different organisations can realistically build internally:

Technology-intensive organisations — those with strong engineering cultures, competitive total compensation, and a track record of AI development — can build significant internal capability. They can attract evaluation engineers, AI product managers, and prompt architects with deep experience, and they can retain them by offering technically interesting work on programmes of meaningful scale.

Mid-market organisations — those with functional technology teams but without the brand or compensation structures to compete for top-tier AI talent — can build operational capability: the agent operations managers, the governance functions, the business-facing roles. They typically cannot build the deep research and evaluation capability that cutting-edge agentic systems require. Their realistic path is to buy or partner for the technical layer and build internally for the operational and governance layer.

Organisations without established technology functions — those that have historically outsourced IT or whose technology teams are focused on system maintenance rather than development — face the hardest talent challenge. For these organisations, the build-vs-buy calculus described in Chapter 8 should lean heavily toward buying and partnering, with internal investment concentrated on the governance, operations, and business-facing functions that vendors cannot provide.

The talent market also changes the economics of retention. AI engineers with production experience in agentic systems are in high demand. An organisation that invests in developing this capability faces a retention challenge that is structurally different from other technology talent: the more experience an engineer accumulates in agentic AI, the more attractive they become to competitors. Retention strategies that worked for conventional software engineering — competitive salary, interesting projects, clear career paths — are necessary but not sufficient.

Make, Buy, or Partner

The make-vs-buy decision for AI capability, addressed in Chapter 8 at the system level, applies equally at the capability level: which elements of the agentic programme should the organisation build in-house, which should it purchase from vendors, and which should it access through partnerships?

The answer varies by capability type. Current enterprise spending patterns support this layered view: market data shows enterprises shifting rapidly toward purchased AI applications and platforms while still retaining internal investment for the capabilities that encode organisational context and competitive differentiation.⁸

Foundation models and model APIs are almost universally purchased rather than built. Building a competitive foundation model is beyond the resources of all but a handful of organisations globally. Accessing model capability through APIs from established providers is the correct decision for the overwhelming majority of enterprises.

Evaluation and governance infrastructure is an area where the build-vs-buy calculus is genuinely mixed. Commercial evaluation platforms offer substantial acceleration for organisations without the in-house capability to build evaluation infrastructure from scratch. They also create dependency on vendor roadmaps and pricing. Organisations with the capability to build should consider the strategic value of owning their evaluation infrastructure — it is the system that defines what "good" means for their agents, and outsourcing that definition has long-term implications.

Domain-specific customisation — the prompt engineering, fine-tuning, and workflow design that tailors foundation model capability to a specific business process — is almost always built internally. No vendor can build what an organisation's own people know about their specific processes, customers, and edge cases.

Most mature programmes converge on a federated model: a central AI platform function that owns standards, tooling, and the evaluation and governance infrastructure, combined with embedded practitioners in each major business unit who own the agents for their domain. This mirrors the buying patterns described by enterprise CIOs: organisations increasingly buy commodity model and platform capabilities while reserving internal effort for workflow-specific differentiation, data orchestration, and governance.⁵ The central function provides the foundation; the embedded practitioners build on it.

The federated model requires a clear definition of what is central and what is distributed:

Central: model access and API management, evaluation framework and tooling, security and compliance standards, audit trail infrastructure, monitoring platforms, adversarial testing capability, shared integration libraries.

Distributed: domain-specific agent design and configuration, business-unit evaluation datasets and test cases, local oversight processes, business unit accountability.

Boundary cases that require explicit governance: scope decisions for new agents, decisions about increasing agent autonomy, responses to incidents, exceptions to central standards.

Without explicit governance of the boundary cases, the federated model drifts: either the central team accumulates authority it cannot effectively exercise across the full portfolio, or the business units operate autonomously in ways that undermine the consistency and governance that the central function is supposed to provide.
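One way to make the central/distributed split explicit is to record it as data that decision routing can consult. The sketch below is a minimal illustration under stated assumptions: the decision-type labels and the `route` helper are hypothetical, and "joint governance" stands in for whatever forum an organisation designates for boundary cases. Its one design choice is that an unmapped decision type raises an error instead of silently defaulting to either the centre or the business unit, since a silent default is exactly how the drift described above begins.

```python
from enum import Enum


class Owner(Enum):
    CENTRAL = "central platform function"
    BUSINESS_UNIT = "business unit"
    JOINT = "joint governance"  # explicit governance for boundary cases


# Hypothetical decision-type labels for the categories in the text.
OWNERSHIP = {
    # Central
    "model access": Owner.CENTRAL,
    "evaluation framework": Owner.CENTRAL,
    "security and compliance standards": Owner.CENTRAL,
    "audit trail infrastructure": Owner.CENTRAL,
    "monitoring platform": Owner.CENTRAL,
    "adversarial testing": Owner.CENTRAL,
    "shared integration libraries": Owner.CENTRAL,
    # Distributed
    "agent design and configuration": Owner.BUSINESS_UNIT,
    "evaluation datasets": Owner.BUSINESS_UNIT,
    "local oversight": Owner.BUSINESS_UNIT,
    "business unit accountability": Owner.BUSINESS_UNIT,
    # Boundary cases
    "new agent scope": Owner.JOINT,
    "autonomy increase": Owner.JOINT,
    "incident response": Owner.JOINT,
    "exception to central standards": Owner.JOINT,
}


def route(decision: str) -> Owner:
    """Return who owns a decision; unmapped decision types are rejected."""
    try:
        return OWNERSHIP[decision]
    except KeyError:
        raise ValueError(
            f"unmapped decision type {decision!r}: classify it before acting"
        ) from None


print(route("autonomy increase").value)  # prints "joint governance"
```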

The Learning Organisation Problem

Agentic AI capability compounds for teams that learn. A team that runs systematic experiments, documents what works and what fails, shares that knowledge across the programme, and builds on accumulated institutional knowledge will consistently outperform a team of equivalent individual talent that does not.

Most organisational structures work against this kind of compounding. Knowledge accumulated by one team is not systematically captured or shared. Lessons from production failures are documented in incident reports that nobody reads. The tacit knowledge of experienced practitioners walks out the door when they leave. The same structures also let hidden technical debt build up: ad hoc scripts, prompts, evaluation sets, and integrations accumulate without shared ownership, a pattern analogous to the technical debt Sculley et al. identified in production machine-learning systems, where glue code, configuration complexity, and undeclared dependencies become long-term liabilities.⁷

Building the organisational structures that enable compounding requires deliberate investment:

Structured learning reviews. After each significant incident, each model update assessment, and each adversarial testing cycle, a structured review should capture not just what happened but why, and what it implies for future design decisions. That is what turns institutional memory into institutional knowledge. Senge's foundational analysis of learning organisations identifies systems thinking and personal mastery as the disciplines that allow teams to build institutional knowledge rather than merely institutional memory, a distinction that applies directly to agentic AI programmes, where tacit deployment knowledge is the primary source of compounding capability.⁶

Internal knowledge sharing mechanisms. The evaluation datasets, prompt design patterns, integration approaches, and governance practices that work are valuable to every team building agents in the organisation. Without a mechanism for sharing them — an internal practice community, a shared documentation system, regular cross-team reviews — each team rediscovers the same lessons independently.

Deliberate career pathways. Practitioners who develop deep expertise in agentic AI systems need career pathways that recognise and reward that expertise. Organisations that treat AI engineering as undifferentiated software engineering, or that expect AI practitioners to rotate to other domains after two years, will consistently lose their accumulated capability to competitors who invest in retaining it.

The Leadership Capability Gap

The executives responsible for AI programmes — typically CIOs, CTOs, and CDOs — are often in a structurally difficult position. They are accountable for programmes whose technical details they do not fully understand, making decisions about investments and risks that require technical judgment they have not developed.

This gap is not unique to AI. Senior technology leaders have always needed to make decisions about technologies they do not fully understand. But the pace of AI development, the significance of the decisions involved, and the difficulty of distinguishing production-ready capability from demonstration-ready prototypes make the gap more consequential than it has typically been in previous technology waves.

The response is not to require senior executives to become AI engineers. It is to develop specific capabilities that allow non-technical executives to exercise effective governance over technical programmes:

Distinguishing benchmark performance from production reliability. The ability to ask the right questions about how performance claims were generated, what the evaluation methodology was, and what the gap between benchmark performance and production performance typically looks like in practice.

Understanding of failure modes and governance requirements. A working knowledge of the failure modes described in Chapters 12–14 and the governance requirements described in Chapters 21, 22, and 24 — not at an engineering level, but at the level needed to assess whether the programme has the right infrastructure in place.

Comfort with genuine uncertainty. The ability to make reasonable resource allocation decisions under genuine uncertainty about the technology's capabilities, without being swayed by either excessive optimism (promising more than the technology can deliver) or excessive caution (treating uncertainty as a reason to avoid investment).

The leadership capability gap is not filled by workshops or briefings. It is filled by sustained engagement with the programme — reviewing production data, participating in governance reviews, and developing the pattern recognition that comes from watching real deployments succeed and fail.

Key takeaway: Sustainable agentic AI capability requires the same rigour in team building that it requires in system building. The organisations that invest in the people and structures described in this chapter will compound their advantage; those that treat talent as a transactional input will find their capability dissipating as fast as they build it.

References

1. McKinsey & Company (2025). The State of AI in 2025: Agents, Innovation, and Transformation. QuantumBlack, AI by McKinsey. November 2025.
2. ManpowerGroup (2026). Global Talent Shortage Survey 2026. ManpowerGroup. Survey of 39,063 employers across 41 countries.
3. World Economic Forum (2025). Future of Jobs Report 2025. World Economic Forum, Centre for New Economy and Society. January 2025.
4. Stanford Human-Centered Artificial Intelligence (2025). AI Index Report 2025. Stanford HAI. https://aiindex.stanford.edu/report/
5. Wang, S., Xu, S., Kahl, J. and Erten, T. (2025). How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025. Andreessen Horowitz. June 2025.
6. Senge, P.M. (1990). The Fifth Discipline: The Art and Practice of the Learning Organization. Doubleday/Currency.
7. Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J. and Dennison, D. (2015). Hidden Technical Debt in Machine Learning Systems. Advances in Neural Information Processing Systems 28 (NIPS 2015), pp. 2503–2511.
8. Menlo Ventures (2025). State of Generative AI in the Enterprise 2025. Menlo Ventures.
