Chapter 2 · The Economics of AI: Capability, Speed, and Cost
Why the economics of running AI agents matter as much as the AI itself.
The Hidden Cost of Intelligence
Building a chatbot that answers a customer's question is a single transaction. Building an AI agent that autonomously researches a market, drafts a strategy, validates its own assumptions, and generates a final report is an entirely different proposition — and so is its cost.
As organisations move from experimenting with conversational AI to deploying agents that act, decide, and iterate across complex workflows, the question of cost stops being a technical footnote and becomes a boardroom concern.
The Agent Trilemma
Consider a scenario that is becoming familiar in enterprise AI teams. A company builds an agent to automate competitive research — it browses the web, synthesises findings, drafts a report, and checks its own conclusions before delivering a final output. In testing, it performs brilliantly. Then the monthly API bill arrives. What felt like a productivity breakthrough turns out to cost more per report than the analyst it was meant to replace. The technology worked. The economics did not.
This tension has a name. Researchers increasingly call it the agent trilemma: the difficulty of achieving high performance, low cost, and fast execution at the same time. Optimising for any two of these tends to come at the expense of the third.
What makes this trilemma particularly sharp in agentic contexts is the compounding nature of model calls. Unlike a standalone chatbot interaction, an AI agent completing a sophisticated task may invoke a language model dozens of times — to plan, to research, to verify, to revise, and to summarise.
Empirical studies have found that some leading agentic systems incur average costs of up to $3 per task, and that even relatively simple queries can take up to 40 minutes to execute. At enterprise scale, those figures multiply rapidly across thousands of daily tasks.
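To make the compounding effect concrete, here is a minimal sketch of the arithmetic. Every number in it is an assumption chosen for illustration: the per-token prices, the token counts, and the five-step run are hypothetical, not figures from the studies cited above. The point is only that an agent re-sends a growing context on each call, so cost scales with the number of calls and with how much each call carries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCall:
    """One language-model invocation inside a task (values are hypothetical)."""
    step: str
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per 1,000 tokens; real prices vary by provider and model.
USD_PER_1K_INPUT = 0.01
USD_PER_1K_OUTPUT = 0.03


def call_cost(call: ModelCall) -> float:
    """Cost of a single call: input and output tokens are priced separately."""
    return (call.input_tokens / 1000 * USD_PER_1K_INPUT
            + call.output_tokens / 1000 * USD_PER_1K_OUTPUT)


def task_cost(calls: list[ModelCall]) -> float:
    """Total cost of a task is the sum over every model call it makes."""
    return sum(call_cost(c) for c in calls)


# A chatbot answer is one call.
chatbot_turn = [ModelCall("answer", 1_000, 500)]

# An agent run makes several calls, and the context grows as notes accumulate.
agent_run = [
    ModelCall("plan", 2_000, 500),
    ModelCall("research", 6_000, 1_000),
    ModelCall("verify", 8_000, 500),
    ModelCall("revise", 10_000, 1_000),
    ModelCall("summarise", 12_000, 1_000),
]

if __name__ == "__main__":
    tasks_per_day = 5_000  # hypothetical enterprise volume
    for label, calls in (("chatbot turn", chatbot_turn), ("agent run", agent_run)):
        per_task = task_cost(calls)
        print(f"{label}: ${per_task:.3f} per task, "
              f"${per_task * tasks_per_day:,.2f} per day")
```

Under these made-up numbers the five-call run costs about $0.50 against about $0.025 for the single turn, a twentyfold difference before any retries. A real agent that makes dozens of calls would widen the gap further.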
Large vs. Small: Choosing the Right Model
The traditional instinct in AI deployment has been straightforward: use the most powerful model available. But frontier models come with a cost premium that makes them impractical as the default engine for every step of every agentic workflow.
This has driven significant research interest in Small Language Models (SLMs), typically models of 1 to 20 billion parameters that are optimised for constrained deployment environments.
| Model Size (Parameters) | Relative Cost | Best For | Limitation |
|---|---|---|---|
| Frontier (100B+) | High | Complex reasoning, ambiguous tasks | Cost at scale |
| Mid-size (20–70B) | Medium | General enterprise tasks | Neither the cheapest nor the most capable option |
| Small (1–20B) | Low | Specific, well-defined tasks | Narrow capability |
| Edge (< 1B) | Negligible | On-device, offline | Very limited |
The instinct to equate model size with capability is understandable — for most of AI's recent history, it was largely correct. But it breaks down when tasks are narrow and well-defined. One particularly striking finding from recent benchmarking challenges this assumption directly: a fine-tuned small language model achieved a 77.55% pass rate on a standard tool-use evaluation (ToolBench), significantly outperforming ChatGPT configurations that scored as low as 16–26% on the same benchmark. The result stemmed not from raw scale but from precise task alignment — the small model was trained exclusively on structured tool-calling patterns, while larger generalist models struggled with the format requirements and generated verbose responses where concise API calls were needed.
Tiered Model Routing
The most sophisticated response to the trilemma is not to choose one model and apply it universally, but to architect agent systems that intelligently route different tasks to appropriately sized models. The logic is not unlike how professional services firms have always worked: a senior partner handles the ambiguous strategic question that requires judgement and experience; a junior associate handles the structured research that requires thoroughness and time. Nobody considers this a compromise — it is simply good resource allocation. Applying the same principle to AI models is what tiered routing makes possible.
Recent research demonstrated that by calibrating model complexity to task requirements, it was possible to retain 96.7% of the performance of a leading open-source agent while reducing per-task operational costs by over 28%. Crucially, these gains came not only from model selection but from right-sizing the agent framework itself — planning depth, tool configuration, and memory design all proved significant levers, and adding complexity beyond a threshold increased costs without improving outcomes.
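As a rough illustration of the routing idea, the sketch below assigns each workflow step to a model tier. It is not the method used in the research described above. The `Tier` names, the 1-to-5 complexity score, the `high_stakes` flag, and the thresholds are all hypothetical choices made for this example; in practice the scoring would come from task mapping and evaluation data.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Hypothetical model tiers, cheapest first."""
    SMALL = "small"
    MID = "mid"
    FRONTIER = "frontier"


@dataclass(frozen=True)
class Step:
    """One step in an agent workflow, scored ahead of time (scores are assumed)."""
    name: str
    complexity: int    # 1 = structured and well-defined, 5 = ambiguous and open-ended
    high_stakes: bool  # True when an error would carry serious real-world cost


def route(step: Step) -> Tier:
    """Pick the cheapest tier that the step's complexity and risk allow."""
    # High-stakes or highly ambiguous steps go to the most capable model.
    if step.high_stakes or step.complexity >= 4:
        return Tier.FRONTIER
    # Moderately complex steps go to the middle tier.
    if step.complexity >= 2:
        return Tier.MID
    # Everything else is narrow enough for a small model.
    return Tier.SMALL


if __name__ == "__main__":
    workflow = [
        Step("format a tool call", complexity=1, high_stakes=False),
        Step("summarise search results", complexity=3, high_stakes=False),
        Step("reconcile conflicting sources", complexity=5, high_stakes=False),
        Step("draft legal guidance", complexity=2, high_stakes=True),
    ]
    for step in workflow:
        print(f"{step.name} -> {route(step).value}")
```

The design choice worth noting is that risk overrides complexity: a simple but consequential step still goes to the frontier tier, which keeps the router from optimising cost at the expense of the steps where mistakes are expensive.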
When Cheaper Is Not Wiser
Amid the drive toward cost optimisation, an important counterargument deserves equal weight. Economic analyses of LLM deployment have found that for tasks where errors carry meaningful real-world consequences (financial decisions, medical information, legal interpretation), the calculus shifts decisively toward using the most capable model available, regardless of cost.
The reasoning is straightforward: deployment costs, even for frontier models, are typically small relative to the economic impact of a consequential mistake. A medical AI system that misinterprets a diagnostic query does not save money by running on a cheaper model — it transfers cost from the API bill to somewhere far more serious. The same logic applies to financial advice that moves capital in the wrong direction, or legal guidance that leads a business into liability it could have avoided. In these contexts, the frontier model is not an extravagance. It is insurance.
Cost optimisation is not a universal goal to be pursued in isolation. It is a design variable to be calibrated against the specific risk profile of each task.
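One simple way to express this calibration is to add the probability-weighted cost of a mistake to the model bill. The sketch below does that under stated assumptions: the model prices, error rates, and costs of error are invented for illustration, and the function names are hypothetical. It is a back-of-envelope model, not the evaluation framework used in the economic analyses mentioned above.

```python
def expected_cost(model_cost_usd: float, error_rate: float,
                  cost_of_error_usd: float) -> float:
    """Per-task model spend plus the probability-weighted cost of a mistake."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return model_cost_usd + error_rate * cost_of_error_usd


def break_even_error_cost(cheap_cost: float, cheap_error_rate: float,
                          frontier_cost: float, frontier_error_rate: float) -> float:
    """Cost of error above which the frontier model is cheaper overall.

    Solves cheap_cost + cheap_error_rate * L == frontier_cost + frontier_error_rate * L.
    """
    if cheap_error_rate <= frontier_error_rate:
        raise ValueError("assumes the cheaper model makes more errors")
    return (frontier_cost - cheap_cost) / (cheap_error_rate - frontier_error_rate)


# Hypothetical models: (cost per task in USD, probability of a consequential error).
CHEAP = (0.05, 0.05)
FRONTIER = (1.00, 0.01)

if __name__ == "__main__":
    # Hypothetical costs of a single error, from trivial to severe.
    for label, loss in (("low-stakes", 2.0), ("high-stakes", 10_000.0)):
        cheap_total = expected_cost(*CHEAP, loss)
        frontier_total = expected_cost(*FRONTIER, loss)
        winner = "cheap" if cheap_total < frontier_total else "frontier"
        print(f"{label}: cheap ${cheap_total:.2f}, "
              f"frontier ${frontier_total:.2f} -> {winner} model")
    print(f"break-even cost of error: ${break_even_error_cost(*CHEAP, *FRONTIER):.2f}")
```

With these assumed figures the cheaper model wins when an error costs a couple of dollars, and the frontier model wins comfortably when an error costs thousands. The break-even point shows where the decision flips for a given pair of models.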
Practical Design Principles
| Principle | What It Means in Practice |
|---|---|
| Map tasks before selecting models | Categorise each workflow step by complexity and cost of error |
| Build for routing, not uniformity | Design with dynamic model routing as a first-class feature |
| Measure cost-of-pass | Quantify the expected cost of one successful outcome, including failed attempts and retries |
| Fine-tune for specificity | Targeted fine-tuning often beats general frontier models at lower cost |
| Preserve capability headroom | Retain frontier access for genuinely complex reasoning |
| Prioritise the orchestrator role | In multi-agent systems, the manager/orchestrator model is the single most influential factor on overall team performance — allocate capability budget here first |
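The cost-of-pass principle in the table above can be sketched in a few lines. This version assumes that attempts are independent and that a failed task is simply retried until it passes, so the expected number of attempts is one divided by the pass rate. The prices and pass rates are hypothetical, and the function name is illustrative.

```python
def cost_of_pass(cost_per_attempt_usd: float, pass_rate: float) -> float:
    """Expected spend to obtain one successful outcome.

    Assumes independent attempts retried until success, so the expected
    number of attempts is 1 / pass_rate.
    """
    if not 0.0 < pass_rate <= 1.0:
        raise ValueError("pass_rate must be greater than 0 and at most 1")
    return cost_per_attempt_usd / pass_rate


if __name__ == "__main__":
    # Hypothetical candidates: (label, cost per attempt in USD, pass rate).
    candidates = [
        ("general small model", 0.02, 0.05),
        ("fine-tuned small model", 0.02, 0.80),
        ("frontier model", 0.30, 0.95),
    ]
    for label, cost, rate in candidates:
        print(f"{label}: ${cost:.2f} per attempt, "
              f"${cost_of_pass(cost, rate):.3f} per successful outcome")
```

In this made-up comparison the general small model is the cheapest per attempt but the most expensive per success, because most of its attempts fail. That is the gap the per-call price hides and the cost-of-pass figure exposes.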
The organisations that navigate this well will not necessarily be those with the largest AI budgets. They will be those that learn to treat cost as a design constraint from the beginning — not an afterthought to be optimised once something is already built and running. Getting model selection, routing logic, and framework design right before scaling is far cheaper than unpicking them after the fact.
The economics of AI agents, in other words, reward the same discipline that good engineering has always rewarded: thinking carefully before building, and building only what the problem actually requires.
References
- Jhandi, P., Kazi, O., Subramanian, S., & Sendas, N. (2024). Small Language Models for Efficient Agentic Tool Calling. Amazon Web Services.
- OPPO AI Agent Team (2025). Efficient Agents: Building Effective Agents While Reducing Cost. OPPO Research Institute.
- Zellinger, M.J., & Thomson, M. (2025). Economic Evaluation of Large Language Models. California Institute of Technology.
- Sabbatella, A. (2025). MALBO: Optimizing LLM-Based Multi-Agent Teams. University of Milano-Bicocca.
- Sharma, R., & Mehta, M. (2025). Small Language Models for Agentic Systems: A Survey of Architectures, Capabilities, and Deployment Trade-offs. Northeastern University / University of Southern California.