Chapter 8 · The Build-vs-Buy Decision in an Agentic World
A framework for making the decision once — and revisiting it when you should.
Why This Decision Is Harder Than It Looks
The build-vs-buy question has always been present in enterprise software. What makes it particularly difficult in agentic AI is the combination of four factors that rarely appear together: rapid capability improvement, fragmented tooling, high switching costs, and genuine uncertainty about what "best" even means for your specific use case.
A vendor solution that covers 80% of your needs today might cover 95% in six months — or might be discontinued. An internal build that perfectly fits your requirements might be rendered obsolete by a model update that changes the economics of the entire approach. You are making a capital allocation decision in a market that is moving fast enough to invalidate your assumptions before the decision is fully implemented.
Gartner's 2025 Hype Cycle places AI agents at the Peak of Inflated Expectations and explicitly flags that rapidly changing model and tooling options are making it difficult for organisations to define a stable roadmap. That structural condition sits directly beneath every build-vs-buy decision in this space [2].
The market is early: a 2025 McKinsey survey of nearly 2,000 executives found that while 62% of organisations are experimenting with AI agents, only 23% are scaling them anywhere in the enterprise, and in no individual business function are more than 10% of organisations at the scaling stage [3].
Key takeaway: Most organisations are still in the experimentation phase with agentic AI, which means any build-vs-buy decision is being made before the market has reached a stable, evaluable state.
This does not mean the decision is arbitrary. It means the decision needs to be structured around durable principles rather than current feature lists.
The Decision Matrix
The build-vs-buy decision in agentic AI is better understood as four distinct sub-decisions, each of which can be made independently.
Competitive differentiation is the decisive question. If the agentic capability you are building is core to your product or competitive position — a proprietary customer experience, a unique analytical capability, a workflow that encodes genuine institutional knowledge — building it is not just justifiable, it is strategically necessary. Giving a vendor visibility into that workflow or depending on them for its reliability is a strategic risk.
If the capability is operational rather than strategic — expense report automation, IT ticket triage, meeting note summarisation — the build case weakens considerably.
The True Cost of Building
Teams systematically underestimate the cost of building and maintaining agentic systems. The underestimation is not in the initial development — it is in the ongoing operational overhead.
| Cost Category | What Teams Often Miss |
|---|---|
| Prompt engineering | Ongoing maintenance as model versions update |
| Evaluation infrastructure | Building and running evals to catch regressions |
| Observability tooling | Logging, tracing, and alerting for non-deterministic systems |
| Safety and guardrails | Developing and maintaining content filters, scope controls |
| Model management | Managing multiple model versions, A/B testing, rollbacks |
| Incident response | Debugging failures in systems that are hard to reproduce |
Closing the gap between a proof-of-concept agentic system and a production-grade one typically takes 3–5x the development effort of the initial build. A common failure pattern is to plan and budget for the POC, then discover the production gap mid-deployment.
The question is not "how much does it cost to build?" but "how much does it cost to maintain, improve, and operate over three years?"
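The three-year framing can be made concrete with a rough calculation. The sketch below is illustrative only: the class names, field names, and all figures are hypothetical, and it reads the 3–5x figure as extra hardening effort on top of the POC, which is one possible interpretation.

```python
from dataclasses import dataclass


@dataclass
class BuildEstimate:
    poc_cost: float              # cost of the initial proof-of-concept
    hardening_multiplier: float  # assumed extra effort to reach production, as a multiple of the POC
    annual_run_cost: float       # evals, observability, guardrails, incident response


@dataclass
class BuyEstimate:
    integration_cost: float  # one-off integration and configuration work
    annual_fees: float       # vendor subscription or usage fees per year


def build_tco(est: BuildEstimate, years: int = 3) -> float:
    """POC plus production hardening plus yearly operating cost."""
    hardening = est.poc_cost * est.hardening_multiplier
    return est.poc_cost + hardening + est.annual_run_cost * years


def buy_tco(est: BuyEstimate, years: int = 3) -> float:
    """One-off integration plus yearly vendor fees."""
    return est.integration_cost + est.annual_fees * years


# Illustrative figures only.
build = BuildEstimate(poc_cost=100_000, hardening_multiplier=4.0, annual_run_cost=150_000)
buy = BuyEstimate(integration_cost=50_000, annual_fees=200_000)
print(f"build: {build_tco(build):,.0f}  buy: {buy_tco(buy):,.0f}")
```

With these made-up inputs, the POC accounts for roughly a tenth of the three-year build cost. The buy side deliberately leaves out switching and lock-in costs, which are harder to price.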
The Hidden Risks of Buying
The case for buying is equally subject to underestimation — but of risk rather than cost.
Vendor lock-in at the capability layer is the most significant risk. When a vendor's agent handles your customer interactions, the institutional knowledge about how those interactions should work lives in their configuration, not yours. If you need to switch vendors (because pricing changes, the vendor is acquired, or a competitor offers better capability), you may find that the accumulated tuning of your agent is not portable. This risk is borne out in practice: a 2025 survey of 100 enterprise CIOs found that agentic workflows have measurably increased model switching costs, with leaders reporting that prompt tuning for multi-step agent tasks is so workflow-specific that migrating to a different model can consume significant engineering time [1].
Data exposure is the second major risk. Agentic systems often need access to sensitive data to be useful. A vendor-hosted agent that processes HR data, financial records, or customer information requires careful contractual and technical controls that are harder to enforce in practice than on paper.
Dependency on a single model's behaviour is underappreciated. Vendors that host models update them — and model updates change agent behaviour in ways that are difficult to predict or test for before the change is live. A model that answers your customers' questions one way today may answer them differently in three months.
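One way to notice this kind of drift is to re-run a small, fixed set of prompts against the vendor's agent on a schedule and flag answers that stop containing what they must. The following is a minimal sketch under stated assumptions: the golden set, the phrase-matching rule, and `stub_agent` (a stand-in for a real vendor call) are all hypothetical.

```python
from typing import Callable

# Hypothetical golden set: prompts paired with phrases the answer must contain.
GOLDEN_SET = [
    ("What is your refund window?", ["30 days"]),
    ("Can I change my delivery address?", ["before dispatch"]),
]


def regression_failures(agent: Callable[[str], str], golden_set) -> list[str]:
    """Return the prompts whose answers no longer contain every required phrase."""
    failures = []
    for prompt, required in golden_set:
        answer = agent(prompt).lower()
        if not all(phrase.lower() in answer for phrase in required):
            failures.append(prompt)
    return failures


def stub_agent(prompt: str) -> str:
    # Stand-in for a call to the vendor-hosted agent (assumption for this sketch).
    canned = {
        "What is your refund window?": "Refunds are accepted within 30 days.",
        "Can I change my delivery address?": "Yes, contact support.",
    }
    return canned.get(prompt, "")


failures = regression_failures(stub_agent, GOLDEN_SET)
print(failures)
```

Substring checks are crude; they only catch answers that drop a required fact. A production eval suite would need richer scoring, but even a check this simple gives an early signal after a vendor-side model update.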
Hybrid Architectures
For most organisations, the answer is not a binary choice but a layered architecture that buys commodity capability and builds differentiated capability.
In this model, you buy access to foundation models and orchestration infrastructure (the commodity layer), while building the components that encode your specific business logic: the tools your agents use, the data they access, the guardrails that define acceptable behaviour, and the evaluation systems that tell you when something has gone wrong.
This hybrid approach balances speed (vendor commodity capability is available now) with control (your differentiated layer is yours to own and improve).
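The layered split can be sketched in code as a narrow interface around the bought model, with the tools and guardrails held as owned components. This is an illustrative sketch, not a reference design: `ModelClient`, `Agent`, and `EchoModel` are hypothetical names, and `EchoModel` stands in for whatever vendor SDK is actually used.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class ModelClient(Protocol):
    """Bought layer: any foundation-model provider that can complete a prompt."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class Agent:
    model: ModelClient  # bought and swappable
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)   # built and owned
    guardrails: list[Callable[[str], bool]] = field(default_factory=list)  # built and owned

    def run(self, request: str) -> str:
        # Owned guardrails decide whether the request is in scope at all.
        if not all(check(request) for check in self.guardrails):
            return "Request declined by guardrail."
        # Owned tools gather business context before the model is called.
        context = " ".join(tool(request) for tool in self.tools.values())
        return self.model.complete(f"{context}\n{request}")


class EchoModel:
    """Stand-in for a vendor SDK; a real client would call the provider's API."""

    def complete(self, prompt: str) -> str:
        return f"[model answer to] {prompt}"


agent = Agent(
    model=EchoModel(),
    tools={"policy_lookup": lambda q: "Policy: refunds within 30 days."},
    guardrails=[lambda q: "salary" not in q.lower()],
)
print(agent.run("What is the refund policy?"))
```

Because `Agent` depends only on the `complete` method, replacing the model provider means writing one new adapter class, while the tools and guardrails carry over unchanged.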
The Two Orchestration Tiers: A Critical Distinction
The orchestration framework layer — the "Buy" box in the diagram above — is not a single homogeneous market. It has stratified into two tiers with materially different build-vs-buy implications, and treating them as equivalent is a common source of poor platform decisions.
Tier 1: Code-first developer frameworks (LangGraph, Microsoft Agent Framework, CrewAI, OpenAI Agents SDK) require sustained software engineering investment to operate, but deliver the depth, state management, observability, and control that production-grade agent systems demand. For this tier, "buying" the framework means accepting its abstraction model and its lock-in. Teams should evaluate whether the orchestration logic they are encoding is portable — whether the workflow rather than the framework implementation is the durable asset.
Tier 2: Visual and low-code platforms (n8n, Dify) serve a fundamentally different profile: technically capable teams who need to orchestrate agents across business processes without maintaining a full-stack AI engineering function. For this tier, "buying" the platform means trading depth for speed and organisational reach — non-engineers can participate in building and maintaining workflows, and deployment cycles are measured in days rather than sprints. The ceiling is lower, and the lock-in is different in character: workflow definitions tend to be more portable than compiled code, but vendor-specific node configurations create their own migration friction.
The decision question is therefore not just "build or buy orchestration" but "which tier of the orchestration market fits our engineering profile, and what are we building on top of it?" A team that chooses a low-code platform for operational automation while maintaining a code-first framework for strategically differentiated workflows is not contradicting itself; it is making a rational tier assignment. Both tiers are covered in detail in Chapter 6.
A Framework for the Decision
Market data supports the buy direction for most operational use cases: a 2025 Menlo Ventures survey of 495 enterprise decision-makers found that 76% of AI use cases are now purchased rather than built internally, up from 53% the prior year, a shift driven by the maturing application ecosystem rather than reduced confidence in internal teams [4]. Enterprise budget data complicates this picture, however: IT-function leaders report allocating roughly 30% of their Gen AI technology budgets to internal R&D, a pattern researchers interpret as evidence that firms are simultaneously building proprietary capabilities rather than relying entirely on vendor solutions [5].
Use this framework as a structured starting point, not a definitive formula:
| Dimension | Build Indicators | Buy Indicators |
|---|---|---|
| Strategic value | Core to competitive position | Operational, non-differentiating |
| Data sensitivity | Highly sensitive, regulatory constraints | Standard, low-risk data |
| Customisation depth | Deep domain specificity required | General-purpose use case |
| Engineering capacity | Strong ML/AI team in place | Limited AI engineering capacity — consider Tier 2 low-code platforms (n8n, Dify) as a middle path before full vendor commitment |
| Speed requirements | Time allows for proper build | Need capability now |
| Vendor market maturity | Market immature or fragmented | Strong vendor options exist |
| Switching cost tolerance | Low tolerance for dependency | Acceptable to depend on vendor |
| Orchestration tier fit | Complex, stateful workflows requiring code-first control | Business process automation suited to visual/low-code tier |
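
As a starting point for discussion, the table above can be turned into a rough scorecard. The sketch below is an assumption-laden illustration: the scoring scale, the weights, and the 0.25 thresholds are all hypothetical, chosen only to reflect the chapter's claim that strategic value carries the most weight.

```python
# Each dimension is scored from -2 (strong buy signal) to +2 (strong build signal).
DIMENSIONS = [
    "strategic_value", "data_sensitivity", "customisation_depth",
    "engineering_capacity", "speed_requirements", "vendor_market_maturity",
    "switching_cost_tolerance", "orchestration_tier_fit",
]

# Hypothetical weights: strategic value counts three times as much as the rest.
WEIGHTS = {name: 1.0 for name in DIMENSIONS}
WEIGHTS["strategic_value"] = 3.0


def lean(scores: dict[str, int]) -> str:
    """Summarise weighted scores as a build, buy, or hybrid leaning."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"unscored dimensions: {sorted(missing)}")
    total = sum(WEIGHTS[name] * scores[name] for name in DIMENSIONS)
    ratio = total / (2 * sum(WEIGHTS.values()))  # normalised to -1.0 .. 1.0
    if ratio > 0.25:
        return "lean build"
    if ratio < -0.25:
        return "lean buy"
    return "hybrid or undecided"


# Illustrative scoring for an operational use case such as expense report automation.
expense_automation = {
    "strategic_value": -2, "data_sensitivity": 0, "customisation_depth": -1,
    "engineering_capacity": -1, "speed_requirements": -1,
    "vendor_market_maturity": -2, "switching_cost_tolerance": -1,
    "orchestration_tier_fit": -1,
}
print(lean(expense_automation))
```

The output is a conversation aid rather than a verdict; the value lies in forcing each dimension to be scored explicitly and in making the weights visible enough to argue about.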
Revisit this decision at least annually. The agentic infrastructure market is evolving fast enough that the right answer in 2024 may be the wrong answer in 2026 — in either direction.
When to Revisit the Decision
Three triggers should prompt a reassessment:
- A vendor releases a capability that matches 90%+ of your build — the economics of maintaining a build shift significantly.
- Your custom build becomes a maintenance burden: if more than 20% of your AI engineering time is spent maintaining existing agentic systems rather than building new ones, the cost calculus has changed.
- A major model or platform shift changes the foundation — transitions like the GPT-3 to GPT-4 generation, the arrival of open-weight frontier models, or governance shifts such as MCP and A2A moving to Linux Foundation stewardship have historically been moments where many build decisions should have been reconsidered.
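
The three triggers above can be written down as an explicit check that a team runs at each review. This is a minimal sketch: `BuildStatus` and its fields are hypothetical, and how vendor coverage or maintenance share is actually measured is left to the team.

```python
from dataclasses import dataclass


@dataclass
class BuildStatus:
    vendor_coverage: float    # share of the build's capability a vendor now matches (0..1)
    maintenance_share: float  # share of AI engineering time spent on maintenance (0..1)
    foundation_shift: bool    # a major model or platform shift has occurred


def revisit_reasons(status: BuildStatus) -> list[str]:
    """Return the triggers that currently apply; an empty list means no reassessment is prompted."""
    reasons = []
    if status.vendor_coverage >= 0.90:
        reasons.append("vendor matches 90%+ of the build")
    if status.maintenance_share > 0.20:
        reasons.append("maintenance exceeds 20% of AI engineering time")
    if status.foundation_shift:
        reasons.append("major model or platform shift")
    return reasons


print(revisit_reasons(BuildStatus(vendor_coverage=0.92, maintenance_share=0.15, foundation_shift=False)))
```

Any non-empty result is a prompt to reopen the decision, not an instruction to reverse it.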
References
1. Andreessen Horowitz (2025). How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025. Andreessen Horowitz. June 2025.
2. Gartner (2025). Hype Cycle for Artificial Intelligence, 2025 (ID: G00828523). Gartner, Inc. June 2025.
3. McKinsey & Company (2025). The State of AI in 2025: Agents, Innovation, and Transformation. QuantumBlack, AI by McKinsey. November 2025.
4. Menlo Ventures (2025). 2025: The State of Generative AI in the Enterprise. Menlo Ventures. December 2025.
5. Wharton Human-AI Research & GBK Collective (2025). Accountable Acceleration: Gen AI Fast-Tracks Into the Enterprise. Wharton Human-AI Research & GBK Collective, University of Pennsylvania. October 2025.