Chapter 17 · AI Colleagues: Deploying Agents Across HR, Finance, and IT
The most revealing test of an agentic deployment is not what it does when everything works. It is what it does when something doesn't.
The Internal Enterprise as the First Frontier
The public discourse around agentic AI concentrates on customer-facing applications — agents that serve, sell, or support external users. The internal enterprise is a quieter story, but in many organisations it is where the most consequential deployments are happening first.
The logic is straightforward. Internal deployments operate in a more controlled environment: the user population is known, the data is owned by the organisation, the risk of reputational damage from a visible failure is lower, and the feedback loops from internal users are faster and more direct than those from external customers. HR, Finance, and IT are three functions where this calculus plays out particularly clearly — and where the failure modes are distinct enough that they warrant separate treatment.
This chapter examines each function in turn: what agents are actually doing in production, what the deployment patterns look like, what the characteristic risks are, and which organisational conditions determine whether the deployment succeeds. It concludes with the design principles that cut across all three, and a note on the change that matters most, which is rarely the one organisations spend the most time planning.
Human Resources: Where Sensitivity Meets Scale
HR is simultaneously one of the most promising and most sensitive domains for agentic AI. The promise is real: the function handles enormous volumes of repetitive administrative work — job postings, screening, scheduling, onboarding documentation, policy queries, benefits administration — that consume significant headcount without generating strategic value. The sensitivity is equally real: HR processes touch employment decisions, compensation data, health information, and the legal protections that attach to protected characteristics. A model that produces subtly biased screening outputs, surfaces compensation data it should not, or handles a disciplinary record incorrectly is not merely a technical failure — it is a compliance and legal exposure.
This exposure is no longer abstract. In the United States, the EEOC has issued technical assistance warning that employers may be responsible under Title VII when algorithmic or AI-based selection tools produce adverse impact in employment decisions [10]. New York City's Local Law 144 requires bias audits and candidate or employee notices before covered automated employment decision tools are used for hiring or promotion [11]. In the EU, the AI Act classifies several employment-related AI systems (including recruitment, selection, performance evaluation, and decisions affecting work relationships) as high-risk categories subject to additional obligations [12].
The organisations that have deployed agents in HR most successfully have done so by treating these two dimensions as complementary rather than contradictory: they deploy aggressively in the administrative and information-retrieval tier, and apply strict human oversight at every decision point that touches an individual's employment status.
Recruitment and Screening
Recruitment is where the productivity case for agentic HR is strongest and the bias risk is most acute. An agent can process hundreds of applications, summarise candidate profiles against a structured rubric, flag missing requirements, and produce ranked shortlists in a fraction of the time a recruiter would require. The same agent, if its evaluation logic — or the training data underlying its judgements — reflects historical hiring patterns, can systematically disadvantage candidates from underrepresented groups in ways that are difficult to detect because the output looks like a neutral ranked list.
The failure mode here is not dramatic. There is no obvious error to catch. The bias expresses itself statistically, across thousands of decisions, and becomes visible only through deliberate demographic analysis of outcomes. The Amazon recruiting tool case shows the pattern: an internal AI screening system trained on a decade of hiring data learned to penalise CVs containing the word "women's" and to favour verbs more common on male engineers' résumés, such as "executed" and "captured". The case came to light not through any corporate disclosure but through investigative reporting by Reuters, with sources speaking anonymously; Amazon denied the tool was ever used to evaluate candidates [1]. Because the model had learned to associate the linguistic patterns of historically successful hires with fitness for the role, and those patterns reflected a male-dominated workforce, the bias was encoded in statistical correlation rather than explicit rules. It was invisible to any individual reviewer and detectable only in aggregate outcomes.
The practitioner response is not to avoid agentic recruitment tools. It is to treat demographic outcome auditing as a mandatory component of the evaluation infrastructure, not an optional review. The screening rubric must be explicit and inspectable. The agent must explain its ranking in terms that a recruiter can interrogate. And the human decision must remain genuinely human: not a rubber stamp on an AI shortlist, but a review that can and does override the agent's output. Where local law requires bias audits, public notices, or candidate disclosures, these controls should be treated as deployment prerequisites rather than compliance tasks completed after the tool is in use [10][11][12].
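One way to make outcome auditing concrete is to compute selection rates per group and compare each to the highest rate. The sketch below is a minimal illustration, not a compliance tool: the function names, the group labels, and the sample data are hypothetical, and the 0.8 threshold reflects the widely used "four-fifths" rule of thumb, which is a screening heuristic rather than a legal test in itself.

```python
from collections import Counter

def impact_ratios(outcomes):
    """outcomes: iterable of (group, selected) pairs, where selected is a bool.

    Returns {group: selection_rate / highest_selection_rate}.
    """
    applied = Counter()
    selected = Counter()
    for group, was_selected in outcomes:
        applied[group] += 1
        if was_selected:
            selected[group] += 1
    rates = {g: selected[g] / applied[g] for g in applied}
    best = max(rates.values())
    if best == 0:
        # Nobody was selected, so no disparity can be measured.
        return {g: 1.0 for g in rates}
    return {g: rate / best for g, rate in rates.items()}

def flag_groups(outcomes, threshold=0.8):
    """Groups whose impact ratio falls below the threshold (assumed 0.8)."""
    return sorted(g for g, r in impact_ratios(outcomes).items() if r < threshold)

# Hypothetical shortlist outcomes: group A 40 of 100 selected, group B 25 of 100.
sample = ([("A", True)] * 40 + [("A", False)] * 60
          + [("B", True)] * 25 + [("B", False)] * 75)

print(impact_ratios(sample))
print(flag_groups(sample))
```

In this sample, group B's ratio is 0.25 / 0.40 = 0.625, which falls below the threshold and would be flagged for investigation. A flag of this kind is a prompt for human review of the rubric and the data, not a verdict.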
Key takeaway: In recruitment, the bias risk is not in the most visible outputs. It is in the aggregate pattern across many decisions — and it is only detectable if outcome auditing is built into the evaluation infrastructure from the start.
HR Policy and Employee Q&A
Policy Q&A is the cleanest HR use case for agentic AI: high volume, low consequence per query, well-defined information source. An agent that can answer questions about parental leave entitlements, pension contributions, expense claim procedures, and IT access request processes from a current policy document reduces the per-query cost dramatically and frees HR business partners for more complex advisory work.
The failure mode is subtler than it appears. Policy documents change. Jurisdictional variations exist. The question "am I entitled to bereavement leave for the death of a grandparent?" sounds simple but may depend on the employee's country of employment, their employment contract type, and a policy that was updated six months ago. An agent answering from a cached or outdated policy document produces wrong answers confidently. An agent that does not know the employee's jurisdiction conflates policies that should not be conflated.
The architectural requirement is strict source grounding: the agent must answer only from current, authorised policy documents, must surface the version date of the source it is drawing from, and must escalate queries that involve multiple jurisdictions or ambiguous contractual terms. The temptation to give the agent general HR knowledge alongside the policy corpus should be resisted — it creates the conditions for the agent to produce plausible-sounding answers that blend policy with general-knowledge inference in ways that cannot be reliably distinguished.
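The grounding rule can be sketched as a lookup that either answers from a single current, authorised document or escalates. Everything named here (`PolicyDoc`, `answer_policy_query`, the one-year staleness limit) is an illustrative assumption, and retrieval is reduced to an exact topic match to keep the example short.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class PolicyDoc:
    topic: str
    jurisdiction: str
    version_date: date
    text: str
    authorised: bool = True

def answer_policy_query(topic, jurisdiction, corpus, today, max_age_days=365):
    """Return a grounded answer or an escalation; never fall back to general knowledge."""
    if jurisdiction is None:
        return {"action": "escalate", "reason": "jurisdiction unknown"}
    matches = [d for d in corpus
               if d.authorised and d.topic == topic and d.jurisdiction == jurisdiction]
    if not matches:
        return {"action": "escalate", "reason": "no authorised source"}
    doc = max(matches, key=lambda d: d.version_date)  # newest version wins
    if (today - doc.version_date).days > max_age_days:
        return {"action": "escalate", "reason": "source may be stale"}
    return {
        "action": "answer",
        "text": doc.text,
        "source_version": doc.version_date.isoformat(),  # surfaced to the employee
    }
```

The design choice worth noting is that every failure path returns an escalation with a reason, so there is no branch in which the agent answers without a dated source.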
IBM's own AskHR deployment illustrates what is achievable at scale with this architecture in place: the system resolves 10.1 million employee interactions annually, saving 50,000 hours and approximately $5 million per year, with IBM Consulting's internal research estimating that self-service adoption of this kind can reduce HR service delivery costs by 50–60% [6].
Onboarding
Onboarding is an application where agents provide clear, measurable value with manageable risk. New employees need access to large volumes of procedural information on a compressed timeline: systems access, compliance training, benefits enrolment, team introductions, equipment provisioning. Coordinating this across HR, IT, and line management is administratively complex and frequently inconsistent.
An agent that manages the onboarding workflow — triggering tasks in the correct sequence, sending personalised checklists, answering questions from the new employee, tracking completion, and alerting managers to blockers — reduces the coordination overhead significantly and produces a more consistent experience. The risk is low relative to recruitment or compensation: the decisions are procedural rather than consequential, the data sensitivity is moderate, and the failure modes (a task triggered late, a question answered from an outdated document) are recoverable.
Onboarding is therefore a strong Stage 2 or Stage 3 entry point for organisations building agentic HR capability — high enough value to justify the infrastructure investment, low enough risk to make the learning curve affordable.
Finance: Precision, Audit Trails, and the Limits of Automation
Finance presents a different configuration of promise and risk. The promise is in the volume and repetitiveness of financial processing work: invoice processing, expense categorisation, reconciliation, financial close support, variance commentary, and regulatory reporting all consume significant human time at relatively low value per hour. Research by the World Economic Forum and Accenture estimates that 39% of banking work time carries high potential for full AI automation, the highest of any industry, with a further 34% suited to AI augmentation, both driven by the language-heavy nature of financial operations [5]. The risk is in the precision requirements and the audit dimension: financial records must be accurate, traceable, and defensible to auditors and regulators. An error in a customer service conversation is an incident. An error in a financial record is potentially a material misstatement.
The organisations that have gone furthest in agentic finance have done so by treating the audit trail as a first-class design requirement, not an afterthought. Every agent action that affects a financial record must be logged with sufficient fidelity that a human auditor can reconstruct what happened, why, and who or what authorised it.
Accounts Payable and Invoice Processing
Invoice processing is among the highest-volume, most automation-ready tasks in finance. Agents can extract structured data from unstructured invoice documents, match against purchase orders, flag discrepancies, route exceptions for human review, and prepare payment runs within defined approval thresholds. The technology for this has existed in RPA form for years; agentic AI improves it by handling the variability that trips up rule-based systems — invoices in unusual formats, line-item descriptions that do not map cleanly to a purchase order's taxonomy, supplier names that appear under multiple aliases in different systems.
The characteristic failure mode is the one described in Chapter 15's e-commerce case: the agent processes what it can see and does not check what it cannot. An invoice processing agent that lacks a price validation tool will approve invoices at changed prices. An agent that cannot access the original purchase order will approve invoices for goods that were never ordered. Scope design must follow the logic of the decision: identify every variable that a human approver would check, and ensure the agent has access to every corresponding data source.
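The scope rule above can be illustrated with a check that refuses to approve whenever a variable the human approver would inspect cannot be verified. This is a simplified sketch: the dictionary shapes, the `review_invoice` name, and the zero price tolerance are assumptions for illustration, not a description of any particular AP system.

```python
from decimal import Decimal

def review_invoice(invoice, purchase_orders, tolerance=Decimal("0.00")):
    """Return ("approve" | "escalate", reasons).

    A missing purchase order is treated as a reason to escalate,
    never as a reason to skip the check.
    """
    po = purchase_orders.get(invoice.get("po_number"))
    if po is None:
        return "escalate", ["purchase order not found or not accessible"]

    reasons = []
    for line in invoice["lines"]:
        ordered = po["lines"].get(line["sku"])
        if ordered is None:
            reasons.append(f"{line['sku']}: not on purchase order")
            continue
        if line["qty"] > ordered["qty"]:
            reasons.append(f"{line['sku']}: quantity exceeds order")
        if abs(line["unit_price"] - ordered["unit_price"]) > tolerance:
            reasons.append(f"{line['sku']}: unit price differs from order")
    return ("escalate" if reasons else "approve"), reasons
```

The returned reasons double as the explanation routed to the human reviewer, so an exception arrives with the specific mismatch rather than a bare flag.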
Financial Close and Reconciliation
Monthly financial close is one of the most time-compressed, error-prone processes in finance. It requires reconciling data across multiple systems, explaining variances, preparing journal entries, and producing draft commentary for management accounts — all under a hard deadline. Agents that can automate the routine reconciliation steps, flag variances above a defined threshold for human investigation, and draft variance commentary from structured data materially reduce the close timeline and the cognitive load on finance teams.
The audit requirement here is particularly demanding. Every journal entry must be traceable to a source and an authorisation. Every automated reconciliation match must be logged with the matching logic used. Draft commentary produced by an agent must be clearly identified as AI-generated and reviewed before it becomes part of the record. These requirements are not burdensome if they are designed in from the start; they become very expensive to retrofit.
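As a rough illustration of what "designed in from the start" can mean, the sketch below records each agent action with its source, rationale, and authorisation, and chains entries by hash so that later edits are detectable. The hash chain is an added assumption, one common tamper-evidence technique rather than anything the chapter prescribes, and the `AuditLog` class and its field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

def _digest(body):
    # Stable serialisation so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class AuditLog:
    def __init__(self):
        self.entries = []

    def record(self, actor, action, source_ref, rationale, authorised_by, timestamp=None):
        body = {
            "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
            "actor": actor,                  # which agent or person acted
            "action": action,                # what happened
            "source_ref": source_ref,        # which records it is traceable to
            "rationale": rationale,          # why, e.g. the matching logic used
            "authorised_by": authorised_by,  # who or what authorised it
            "prev_hash": self.entries[-1]["hash"] if self.entries else "",
        }
        body["hash"] = _digest(body)
        self.entries.append(body)
        return body

    def verify(self):
        """True if no entry has been altered or reordered since it was written."""
        prev = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

A production system would write to append-only storage rather than an in-memory list; the point of the sketch is the shape of the record, which answers the auditor's questions of what, why, and on whose authority.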
IBM's own experience illustrates the scale of potential gains: its Jobotx initiative, deployed in 2024, combined AI, RPA, and orchestration tooling to standardise and automate journal entry processing across regions, with end-to-end process automation projected to cut financial close and reconciliation cycle times by more than 90% and estimated to generate annual cost savings of approximately $600,000 [8].
Regulatory Reporting and Compliance Monitoring
Regulatory reporting — the production of structured returns for tax authorities, financial regulators, and statutory bodies — is high-stakes, deadline-driven, and increasingly complex as reporting requirements proliferate across jurisdictions. Agents can assist by aggregating the source data, applying the relevant reporting rules, generating draft returns, and flagging discrepancies for human review.
The risk profile is asymmetric in a way that finance teams understand intuitively: a late or incorrect regulatory filing carries legal and financial consequences that are not proportionate to the administrative error that caused them. The consequence of this asymmetry is not that agents should not be used in regulatory reporting — they clearly can add value — but that the human sign-off process must be genuinely independent of the agent's output, not a cursory review of a document the reviewer did not generate and does not fully understand.
Key takeaway: In finance, the audit trail is not just a compliance requirement — it is the mechanism through which the organisation retains the ability to reconstruct and correct agent decisions. Design it first, not last.
Expense Management
Expense management is a high-frequency, low-stakes process where agentic automation delivers consistent, measurable value. Agents can classify submitted expenses against policy, flag likely violations, request missing receipts, process compliant claims automatically, and produce summary reports for budget holders. The process involves significant judgement calls (is this business meal within policy? is this taxi fare reasonable for the stated journey?), but the consequences of individual errors are low, making it a strong candidate for Stage 3 automation with sampled human review rather than comprehensive oversight.
The nuance is in the policy interpretation. Expense policies written for human readers contain implicit assumptions that agents cannot reliably infer. "Reasonable" accommodation costs mean different things in different cities and contexts. The agent should apply explicit thresholds, not interpret qualitative policy language, and the thresholds should be set conservatively and reviewed against output quality periodically.
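Replacing qualitative language with explicit thresholds might look like the following sketch. The limits table, the city names, and the amounts are invented for illustration; the point is only that "reasonable" becomes a number someone has chosen and can review.

```python
from decimal import Decimal

# Hypothetical, deliberately conservative limits per (category, city).
LIMITS = {
    ("hotel", "London"): Decimal("220"),
    ("hotel", "default"): Decimal("150"),
    ("meal", "default"): Decimal("40"),
}

def classify_expense(category, city, amount, has_receipt, limits=LIMITS):
    """Apply explicit thresholds; anything not covered goes to a human."""
    if not has_receipt:
        return "request_receipt"
    limit = limits.get((category, city), limits.get((category, "default")))
    if limit is None:
        return "human_review"  # no threshold defined, so do not guess
    return "auto_approve" if amount <= limit else "human_review"
```

Because an unlisted category falls through to human review, gaps in the table surface as review volume rather than as silent approvals, which gives the periodic threshold review something concrete to work from.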
IT: The Function That Builds Its Own Agent Infrastructure
IT occupies a unique position in the agentic landscape. It is simultaneously a deployment target — agents that automate IT operations and service management — and the function responsible for deploying and maintaining agents across the rest of the organisation. The IT team building an HR onboarding agent is the same IT team operating the service desk agent that answers its own employees' IT questions. This dual role creates both advantages (deeper understanding of the technology) and risks (a tendency to underweight the organisational complexity of deployment in other functions).
IT Service Management
The IT service desk is the classic high-volume, low-complexity processing problem that agentic AI handles well. Password resets, access provisioning, software installation requests, hardware fault reporting, VPN troubleshooting — the majority of service desk ticket volume is highly repetitive, and the resolution paths for the most common issues are well-documented. An agent that can handle the first-line response to these tickets — classifying, providing self-service resolution guidance, and escalating to human agents when the issue exceeds its scope — reduces cost per ticket significantly and, when designed well, improves response time.
The deployment pattern that works is narrow scope with aggressive escalation. The agent handles what it knows how to handle and routes everything else to a human immediately, without attempting to resolve issues it is uncertain about. The pattern that does not work is a broad-scope agent instructed to "try to resolve" before escalating — the result is a sequence of confident wrong answers before the human handoff, which degrades the user experience relative to direct human contact.
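The contrast between the two patterns can be made concrete in a few lines. In this sketch the handled categories, the guidance strings, and the 0.9 confidence floor are all hypothetical, and the ticket classifier that produces `category` and `confidence` is assumed to exist upstream.

```python
# Hypothetical allowlist: the only ticket types the agent may resolve itself.
HANDLED = {
    "password_reset": "Use the self-service reset portal.",
    "vpn_setup": "Follow the VPN setup guide in the knowledge base.",
}

def first_line_response(category, confidence, min_confidence=0.9):
    """Resolve only known categories at high confidence; otherwise hand off at once."""
    if category not in HANDLED:
        return {"action": "escalate", "reason": "out of scope"}
    if confidence < min_confidence:
        return {"action": "escalate", "reason": "low confidence"}
    return {"action": "self_service", "guidance": HANDLED[category]}
```

There is deliberately no "try anyway" branch: an uncertain ticket reaches a human on the first turn instead of after a run of wrong answers.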
ITSM agent deployments have produced consistent and repeatable productivity gains, but the gains are concentrated in the narrow scope the agent was explicitly designed for and do not generalise to the rest of the ticket queue. Bank of America's internal deployment illustrates the achievable scale: with more than 90% of its 213,000 global employees using its AI assistant for internal support queries, the bank reduced calls to the IT service desk by more than 50%. That gain is attributable precisely to keeping the tool scoped to the high-volume routine queries it could handle reliably rather than deploying it as a broad-scope resolver [9].
Infrastructure Monitoring and Incident Response
Operations agents that monitor infrastructure — watching log streams, correlating alerts, identifying anomaly patterns, and generating diagnostic summaries — represent one of the highest-value ambient AI applications in the enterprise. The volume of signals generated by modern infrastructure exceeds human monitoring capacity; the correlation of signals across systems to identify root causes is exactly the kind of multi-source pattern recognition that language models do well.
The architecture for this application is almost always event-driven and ambient: agents subscribe to log and alert streams, run continuously in the background, and surface diagnostics when thresholds are crossed or anomalous patterns detected. The agent does not resolve incidents autonomously — the consequence surface is too large, and the blast radius of an incorrect automated remediation is potentially significant — but it dramatically reduces the time-to-diagnosis by presenting an engineer with a structured hypothesis rather than a raw alert.
The governance requirement is the override mechanism discussed in Chapter 7: any agent operating on production infrastructure must be immediately pausable by an on-call engineer, and the pause must take effect without disrupting the monitoring layer itself.
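A minimal way to picture that separation is an agent whose pause flag gates only the diagnostic step, while alert intake carries on regardless. The `PausableAgent` class below is a hypothetical single-process sketch using the standard library's `threading.Event`; a real deployment would need the flag to live somewhere an on-call engineer can reach it across processes.

```python
import threading

class PausableAgent:
    def __init__(self):
        self._paused = threading.Event()  # thread-safe flag, initially clear
        self.observed = []                # monitoring layer: always records
        self.diagnostics = []             # agent layer: only while not paused

    def pause(self):
        self._paused.set()

    def resume(self):
        self._paused.clear()

    def on_alert(self, alert):
        self.observed.append(alert)       # never skipped, even when paused
        if self._paused.is_set():
            return None
        diagnosis = f"hypothesis for {alert}"  # placeholder for the model call
        self.diagnostics.append(diagnosis)
        return diagnosis
```

Pausing takes effect on the next alert, and nothing is lost from the observed stream while the agent is paused.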
Security Operations
Security operations is an area of growing agentic deployment that warrants particular care. The case for it is strong: security analysts face alert volumes that far exceed human processing capacity, and the consequence of a missed alert can be severe. An agent that triages alerts, correlates indicators of compromise, queries threat intelligence feeds, and escalates genuinely suspicious activity allows analysts to focus on investigation rather than triage.
IBM's work with Pakistan's Askari Bank illustrates the achievable scope: a purpose-built, continuously running security operations centre reduced daily security incidents from roughly 700 to fewer than 20 and cut average remediation times from 30 minutes to 5 minutes — results that required both the agentic triage capability and a carefully bounded set of escalation thresholds governing what the agent would act on autonomously.
The risk is twofold. First, an agent that misclassifies a genuine threat as a false positive creates a blind spot in the organisation's defences. The evaluation infrastructure must specifically test for false-negative rates, not just overall classification accuracy. Second, a security operations agent necessarily has broad read access to sensitive systems and data. An agent that is itself compromised — through a prompt injection attack or a malicious document in the alert stream — can become a high-value reconnaissance tool for an attacker. Chapter 13 covers these attack surfaces in detail; security operations is one of the domains where that chapter's guidance is most directly applicable.
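A short worked example shows why overall accuracy is the wrong headline metric for triage. The counts below are invented: 10,000 alerts, of which 10 are genuine threats and the agent misses 4.

```python
def triage_metrics(tp, fp, tn, fn):
    """Accuracy and false-negative rate from a confusion matrix.

    tp: threats escalated, fn: threats dismissed,
    tn: benign alerts dismissed, fp: benign alerts escalated.
    """
    total = tp + fp + tn + fn
    threats = tp + fn
    return {
        "accuracy": (tp + tn) / total,
        "false_negative_rate": fn / threats if threats else 0.0,
    }

# Hypothetical evaluation run: 10 real threats among 10,000 alerts.
metrics = triage_metrics(tp=6, fp=20, tn=9970, fn=4)
print(metrics)
```

Accuracy here is 9,976 / 10,000, or 99.76%, yet 4 of the 10 genuine threats were dismissed, a false-negative rate of 40%. Because real threats are rare, accuracy is dominated by the benign majority and says almost nothing about the blind spot.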
Software Development and Engineering Support
Software development was identified in Chapter 4 as the largest single category of enterprise AI spend, and in IT the deployment pattern is both the most mature and the most varied. Agents support the full development lifecycle: code completion, test generation, documentation, code review, dependency vulnerability scanning, and increasingly the autonomous resolution of well-scoped bugs and feature requests.
The productivity gains are real and well-documented. One CTO cited in Chapter 4 reported 90% of new code being AI-assisted within twelve months of deployment. The risk that has emerged more slowly in the literature is the one flagged by Wharton's longitudinal study: 43% of enterprise decision-makers worry that sustained AI use is eroding foundational skill proficiency, particularly among junior engineers who are building their capabilities through the practice of writing code, not reviewing code the agent wrote [2]. This is not a reason to avoid agentic coding tools. It is a reason to design the human role in the coding workflow deliberately, ensuring that junior engineers retain sufficient generative practice to develop the judgement that code review requires.
Consequence Boundaries by Function
The three functions discussed in this chapter should not be governed by one generic automation threshold. The correct boundary is determined by the consequence of error in each function.
| Function | Lower-risk agent actions | High-risk boundary | Required control |
|---|---|---|---|
| HR | Policy Q&A, onboarding checklists, interview scheduling | Any decision affecting hiring, pay, discipline, performance evaluation, or termination | Human decision, bias audit where applicable, documented rationale |
| Finance | Expense classification, invoice extraction, variance drafting | Any action changing financial records, authorising payment, or affecting statutory reporting | Audit trail, source reconciliation, approval threshold |
| IT | Ticket triage, knowledge-base answers, diagnostic summaries | Any action changing permissions, production systems, or security posture | Least privilege, rollback, change approval, incident logging |
Key takeaway: Internal agents are not low-risk simply because they are not customer-facing. They touch employment rights, financial records, production infrastructure, and security systems — so the governance boundary must be drawn by consequence, not by whether the user is internal or external.
Design Principles Across All Three Functions
The three functions above are structurally different — their data sensitivity, consequence surfaces, regulatory environments, and user populations vary considerably. But the deployments that succeed across all three share a set of design principles that are worth stating explicitly.
Scope before capability. The first question is not "what can the agent do?" but "what should the agent be allowed to do?" The scope boundary should be defined by the consequence surface of the decisions the agent will make, not by the model's capabilities. A more capable model does not justify a wider scope if the wider scope creates unacceptable risk.
Route by consequence, not by category. The escalation logic should be designed around the consequences of an error in each decision type, not around the category of the request. A salary enquiry that involves a standard policy lookup is low consequence. A salary enquiry that requires interpreting a contractual ambiguity is high consequence. The routing logic needs to distinguish between them even if both arrive as "compensation question."
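The salary example can be sketched as routing on consequence flags rather than on the request's category label. The flag names below are hypothetical, and how they get set (by an upstream classifier, by rules, or by the requester) is outside the sketch.

```python
# Hypothetical markers of a high-consequence request.
HIGH_CONSEQUENCE_FLAGS = ("contract_ambiguity", "multi_jurisdiction", "changes_record")

def route(request):
    """Send a request to a human if any high-consequence flag is set."""
    triggered = [flag for flag in HIGH_CONSEQUENCE_FLAGS if request.get(flag)]
    if triggered:
        return {"handler": "human", "because": triggered}
    return {"handler": "agent", "because": []}

# Same category, different consequence.
standard = {"category": "compensation question"}
ambiguous = {"category": "compensation question", "contract_ambiguity": True}

print(route(standard)["handler"])
print(route(ambiguous)["handler"])
```

Both requests carry the same category, but only the second is routed to a human, and the `because` field records which consequence triggered the handoff.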
Make the human override obvious and fast. In every function, there are situations where a human needs to take control of what the agent is doing immediately — not after three menu levels, not by filing a support ticket. The mechanism for doing this should be designed with the same attention as the agent's primary workflow, because the situations in which it is needed are precisely the situations in which cognitive load is highest.
Match data access to task scope. Every data source the agent can access is a data source it can get wrong, leak accidentally, or be manipulated into misusing. The principle is minimal necessary access: the agent sees what it needs to complete its defined task and nothing more. This applies across all three functions, but the consequences differ — unintended access to compensation data is an HR and legal issue; unintended access to financial records is an audit issue; unintended access to security logs is a security issue.
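Minimal necessary access is easiest to reason about as an explicit allowlist per task with default deny. The task names and resource identifiers in this sketch are made up for illustration.

```python
# Hypothetical task scopes: each task lists the only resources it may read.
TASK_SCOPES = {
    "onboarding": frozenset({"hr.checklists", "it.provisioning_status"}),
    "expense_review": frozenset({"finance.expense_claims", "finance.expense_policy"}),
    "ticket_triage": frozenset({"it.tickets", "it.knowledge_base"}),
}

def authorise(task, resource):
    """Default deny: unknown tasks and unlisted resources are refused."""
    return resource in TASK_SCOPES.get(task, frozenset())
```

An onboarding agent asking for `hr.compensation` is refused because the resource is absent from its scope, not because a rule names it as forbidden, so a newly added data source stays out of reach until someone deliberately grants it.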
Design the audit trail as a first-class deliverable. In regulated functions, the audit trail is not a logging feature added before deployment — it is a primary output of the system. It should be specified with the same rigour as the agent's functional requirements, reviewed in testing, and explicitly assessed in the pre-deployment governance checklist.
Key takeaway: The design principles that make agentic deployments succeed in HR, Finance, and IT are not function-specific. They reflect a consistent architecture of bounded scope, consequence-driven routing, accessible override, minimal access, and first-class audit — applied within the particular regulatory and sensitivity context of each function.
The Change That Actually Matters
Organisations that have deployed agents across internal functions report a consistent observation that does not appear in the deployment plans: the most significant outcome is not the time saved on the automated tasks. It is what the people doing those tasks choose to do with the time.
An HR business partner who no longer processes 200 policy queries a week does not automatically shift to higher-value advisory work. Whether they do depends on whether the role has been redesigned to create space for that work, whether they have the capability to do it, and whether the organisation's performance framework rewards it. An agent that automates accounts payable processing frees finance analysts from processing work, but the value of that freedom is realised only if the analysts have access to the analytical work that the processing time was crowding out. McKinsey's 2025 survey of nearly 2,000 enterprises identifies fundamental workflow redesign as the single factor with the strongest measured contribution to realising value from AI: organisations that have gone furthest in redesigning workflows are 2.8 times more likely than their peers to report significant enterprise-level impact [7]. This is consistent with field evidence from customer support and professional-services settings: the measurable gains from generative AI are strongest when the technology is inserted into tasks where assistance fits the work and where humans understand how to use, question, and improve the output rather than merely receive it [3][4].
This is the change management dimension of agentic deployment, covered in depth in Chapter 22. It is mentioned here because it is the dimension most consistently underinvested in HR, Finance, and IT deployments — functions that tend to be more comfortable with process design than with the organisational redesign that makes process improvement durable. The agent is the easy part. The role redesign is where the value is.
References
1. Dastin, J. (2018). Amazon scraps secret AI recruiting tool that showed bias against women. Reuters. October 10, 2018.
2. Wharton Human-AI Research & GBK Collective (2025). Accountable Acceleration: Gen AI Fast-Tracks Into the Enterprise. Wharton Human-AI Research & GBK Collective, University of Pennsylvania. October 2025.
3. Brynjolfsson, E., Li, D., & Raymond, L.R. (2023). Generative AI at Work. National Bureau of Economic Research. NBER Working Paper No. 31161.
4. Dell'Acqua, F., McFowland, E., Mollick, E.R., Lifshitz-Assaf, H., Kellogg, K., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K.R. (2023). Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality. Harvard Business School Working Paper 24-013.
5. World Economic Forum & Accenture (2025). Artificial Intelligence in Financial Services. AI Governance Alliance / Transformation of Industries in the Age of AI White Paper. World Economic Forum, January 2025.
6. Hayes, M. & Downie, A. (2025). AI agents in human resources. IBM Think. IBM Corporation. https://www.ibm.com/think/topics/ai-agents-in-human-resources
7. McKinsey & Company (2025). The State of AI in 2025: Agents, Innovation, and Transformation. QuantumBlack, AI by McKinsey. November 2025.
8. Finio, M. & Downie, A. (2025). AI agents in finance. IBM Think. IBM Corporation. https://www.ibm.com/think/topics/ai-agents-in-finance
9. Bank of America (2025). AI Adoption by BofA's Global Workforce Improves Productivity, Client Service. Bank of America Newsroom. April 8, 2025. https://newsroom.bankofamerica.com/content/newsroom/press-releases/2025/04/ai-adoption-by-bofa-s-global-workforce-improves-productivity--cl.html (Previously cited as Ch.9, ref. 4; Ch.19, ref. 9.)
10. U.S. Equal Employment Opportunity Commission (2023). Select Issues: Assessing Adverse Impact in Software, Algorithms, and Artificial Intelligence Used in Employment Selection Procedures Under Title VII of the Civil Rights Act of 1964. https://www.eeoc.gov/select-issues-assessing-adverse-impact-software-algorithms-and-artificial-intelligence-used
11. New York City Department of Consumer and Worker Protection (2023). Automated Employment Decision Tools. https://www.nyc.gov/site/dca/about/automated-employment-decision-tools.page
12. European Parliament and Council (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council (Artificial Intelligence Act). Official Journal of the European Union, L 2024/1689. https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng