Chapter 14 · Disinformation at Machine Speed: How Agents Can Mislead
The problem is not that machines can now lie. It is that they can lie at a cost approaching zero, at a volume approaching unlimited, and with a fluency that makes the lie indistinguishable from the truth.
1. The Cost Collapse
Disinformation — the deliberate creation and distribution of false or misleading content — is not a new problem. Political propaganda, forged documents, and manufactured rumours predate the internet by centuries. What changes in each technological era is not the intention but the economics: the cost per unit of convincing false content, the speed of distribution, and the scale at which influence operations can be run.
The printing press reduced the cost of distributing ideas by orders of magnitude. Broadcasting reduced it further and added immediacy. The social web added algorithmic amplification. Each transition expanded who could run an influence operation, how quickly, and at what reach. Generative AI and agentic systems represent another such transition, and the shift in economics is arguably larger than in any previous step, because this time it is the cost of production rather than distribution that collapses.
Before large language models, producing convincing text at scale required either human labour (writers, translators, persona operators) or visible automation that was easy to detect. Coordinated inauthentic behaviour — the industry term for networks of fake accounts operating in concert — was characterised by low-quality, repetitive content that trained observers could identify. The bottleneck was not distribution; it was production. Convincing, varied, contextually appropriate content took human time.
That bottleneck is gone. A single agent can produce thousands of distinct, contextually appropriate, fluent pieces of content per hour. It can maintain dozens of synthetic personas simultaneously, each with a coherent history and a consistent voice. It can translate content into thirty languages with little loss of fluency. It can adapt messaging to specific communities, news cycles, and emotional registers. And it can do all of this at a marginal cost that is close to zero: little more than the price of the model API calls.
This chapter examines what this cost collapse means for the disinformation landscape — not as an abstract societal concern, but as a practical risk category for organisations deploying agentic systems, and as a responsibility that deployers of these systems carry by virtue of what they are releasing into a shared information environment.
2. Three Production Mechanisms
Agentic disinformation is not a single phenomenon. It operates through three distinct mechanisms, each with different characteristics and different mitigation approaches.
2a. Synthetic Content Generation
The most direct mechanism: agents are used to produce false or misleading content at scale — fabricated news articles, fake academic citations, invented statistics, synthetic expert quotes, and manufactured eyewitness accounts. The content is designed to be persuasive to a target audience and is distributed through channels that make its synthetic origin difficult to detect.
Research on generative models and influence operations found that language models can already produce persuasive content across a range of influence operation tasks: generating political messaging, producing persona-consistent social media posts, and tailoring content to specific demographic and ideological profiles. They do so with substantially less human oversight than fully manual operations require, although effective deployment typically still involves human review of outputs as part of the production pipeline. The same research found that the most significant barrier to this capability was not technical sophistication but access to capable models, a barrier that has declined substantially as model capabilities have improved and access costs have fallen.1
The content quality problem is acute. Early generative text was detectable through statistical patterns — repetitive phrasing, inconsistent style, implausible specificity. Current frontier models produce text whose statistical properties are largely indistinguishable from human writing by automated classifiers and, in many cases, by careful human readers. The shift from "detectable at scale" to "undetectable at scale" is not a marginal improvement; it is a phase transition in what influence operations can accomplish.
2b. Persona Networks and Coordinated Amplification
Synthetic content becomes far more effective when it appears to come from multiple independent sources. A single article claiming a fabricated statistic is a claim; the same claim appearing across twenty apparently independent social media accounts, regional news outlets, and forum discussions becomes a consensus. Agents make it operationally tractable to maintain networks of synthetic personas at a scale that was previously limited by human staffing.
A synthetic persona network operated by agents can maintain posting histories that span months, engage convincingly with real users, respond to developing news cycles in real time, and coordinate to amplify specific messages across platforms, all without the repetition and coordination signatures that characterised earlier networks. The detection methods developed to identify coordinated inauthentic behaviour (timing correlations, shared content fingerprints, network topology analysis) were calibrated against networks run by human operators and simple bots. Agent-operated networks can be specifically designed to avoid these signatures. A key reason is the elimination of what disinformation researchers term "copypasta": the repeated or near-identical text that earlier bot networks relied on, and that platform detection systems had been explicitly calibrated to catch. Language models generate semantically distinct content for every output while maintaining narrative coherence, rendering shared-content fingerprinting largely ineffective.1
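To make the fingerprinting point concrete, the following is a minimal sketch of shared-content detection using word shingles and Jaccard similarity. It is an illustrative assumption about how such a detector might work, not a description of any platform's actual system; the function names, the three-word shingle size, the 0.5 threshold, and the sample posts are all hypothetical.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles in a lowercased, punctuation-free text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_near_duplicates(posts: dict, threshold: float = 0.5) -> list:
    """Return (id_1, id_2, similarity) for every pair at or above the threshold."""
    sets = {post_id: shingles(text) for post_id, text in posts.items()}
    flagged = []
    for p, q in combinations(sets, 2):
        similarity = jaccard(sets[p], sets[q])
        if similarity >= threshold:
            flagged.append((p, q, similarity))
    return flagged


# Hypothetical posts: acct_b is a light edit of acct_a ("copypasta"),
# while acct_c makes the same claim in entirely different words.
posts = {
    "acct_a": "Officials confirmed the bridge was closed for repairs last week.",
    "acct_b": "BREAKING: officials confirmed the bridge was closed for repairs last week!",
    "acct_c": "A neighbour told me the crossing has been shut since a few days ago "
              "and the council knows why.",
}

flagged = flag_near_duplicates(posts)
for first, second, similarity in flagged:
    print(f"{first} ~ {second}: Jaccard similarity {similarity:.2f}")
```

In this toy data the lightly edited copy is flagged, while the reworded post shares no three-word shingle with the others and passes unnoticed. That is the gap the paragraph above describes: a detector keyed to shared text has nothing to match when every output is freshly worded.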
2c. Narrative Laundering
The third mechanism is the most structurally novel and the one most specific to the agentic era. Narrative laundering is the process by which fabricated content gains false authority through the summarisation and citation practices of AI systems.
The mechanism works as follows: a synthetic claim is seeded — in a forum post, a fake preprint, a manufactured quote — and is then retrieved by an AI summarisation or research agent working on a related topic. That agent, which typically retrieves content based on relevance rather than verifying provenance, includes the claim in its output. The output may then be cited by another system or used as context for a third. Each step in this chain adds a layer of apparent legitimacy to a claim that was fabricated at the source.
This is not a hypothetical risk. The training corpora for language models already contain content from across the web, including synthetic content produced by earlier-generation models. Research on the training-side version of this effect, sometimes called "model collapse", finds that models trained recursively on increasing proportions of synthetic data degrade in ways that are difficult to detect and that compound across generations.2 The paper's theoretical analysis argues that, under such recursive training, the process is not merely probable but inevitable: even with no functional approximation error, finite sampling means that each successive model generation loses information about low-probability events, and the learned distribution progressively narrows toward a near-zero-variance point estimate.2 What is emerging in the wild is a dynamic in which synthetic content, designed to be retrieved and cited, is progressively incorporated into the training and inference inputs of AI systems that then propagate it further.
Key takeaway: Narrative laundering does not require the disinformation to be deliberately planted into AI training data. It requires only that AI retrieval systems fail to verify provenance — which is their default operating mode.
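The difference between relevance-only retrieval and provenance-aware retrieval can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a recommended implementation: the `Doc` fields, the two-value `Provenance` label, the 0.3 penalty, and the example.org URLs are all hypothetical, and the sketch takes for granted that some upstream check has already assigned each document a provenance label.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    VERIFIED = "verified"  # origin confirmed by some upstream check (assumed to exist)
    UNKNOWN = "unknown"    # no provenance information available


@dataclass(frozen=True)
class Doc:
    url: str
    relevance: float       # semantic similarity score from the retriever, 0..1
    provenance: Provenance


def rank_by_relevance(docs, k=2):
    """Default behaviour: return the k most relevant documents, ignoring origin."""
    return sorted(docs, key=lambda d: d.relevance, reverse=True)[:k]


def rank_with_provenance(docs, k=2, unknown_penalty=0.3):
    """Down-weight documents whose origin is unknown before taking the top k."""
    def score(d):
        penalty = unknown_penalty if d.provenance is Provenance.UNKNOWN else 0.0
        return d.relevance - penalty
    return sorted(docs, key=score, reverse=True)[:k]


# Hypothetical candidate set: the seeded claim is the most "relevant" result.
candidates = [
    Doc("https://example.org/forum/thread-1", 0.92, Provenance.UNKNOWN),
    Doc("https://example.org/preprint-x", 0.88, Provenance.UNKNOWN),
    Doc("https://example.org/agency-report", 0.80, Provenance.VERIFIED),
    Doc("https://example.org/journal-article", 0.75, Provenance.VERIFIED),
]

print("relevance only: ", [d.url for d in rank_by_relevance(candidates)])
print("with provenance:", [d.url for d in rank_with_provenance(candidates)])
```

A score penalty is only one possible design. A stricter pipeline might exclude unverified sources entirely, and a more permissive one might keep them but pass the provenance label through to the summary so that the reader can see which claims rest on unverified material.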
Key takeaway: Model collapse is not merely a risk to be managed at the margins. When models are trained recursively on data dominated by earlier model outputs, it is a statistical inevitability, and it preferentially erases the low-probability events that represent marginalised populations and rare phenomena.
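The narrowing effect can be seen in a toy simulation. The sketch below repeatedly fits a single Gaussian to a small sample and then draws the next generation's "training data" from that fit. It is a simplified illustration of the sampling argument, not a reproduction of the cited experiments; the function name, the sample size of 10, the 500 generations, and the seed are all assumptions chosen for the example.

```python
import random
import statistics


def simulate_recursive_fitting(generations=500, sample_size=10, seed=0):
    """Fit a Gaussian to samples drawn from the previous fit, generation after generation.

    Returns the fitted variance at each generation, starting from the true value 1.0.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0              # generation 0: the "real" data distribution
    variances = [sigma ** 2]
    for _ in range(generations):
        # Each generation sees only data produced by the previous generation's model.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        sigma = statistics.stdev(samples)  # square root of the unbiased sample variance
        variances.append(sigma ** 2)
    return variances


variances = simulate_recursive_fitting()
for generation in (0, 10, 50, 100, 500):
    print(f"generation {generation:3d}: fitted variance = {variances[generation]:.3e}")
```

Each individual fit is unbiased, yet the fitted variance behaves like a random walk that drifts downward on a log scale, so over many generations the distribution tends to shrink toward a point. The tails, which hold the rare events, are the first thing to go.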
3. Personalisation: From Broadcast to Targeted Manipulation
Traditional disinformation was broadcast by nature: the same message distributed to the widest possible audience. Effectiveness was a product of reach and repetition. The model was industrialised but undifferentiated — the message was the same whether delivered to a retired schoolteacher or a first-time voter.
Agentic systems change this in a way that is qualitatively, not merely quantitatively, different. An agent with access to an individual's communication history, social media activity, browsing patterns, or purchasing behaviour can craft messages that are specifically calibrated to that individual's documented beliefs, anxieties, and social connections. The same underlying false claim can be packaged differently for each recipient — invoking the sources they trust, the communities they belong to, the issues they care about — at scale and without the per-message human effort that previously made targeted disinformation operationally intensive.
Micro-targeted disinformation of this kind is substantially more effective than broadcast disinformation along several dimensions. It is harder to detect because there is no single shared message to analyse. It is harder to correct because corrections need to reach specific individuals who have received specific framing. And it is harder to attribute because the content varies enough that correlating it across recipients requires the kind of signals — shared API calls, shared model outputs — that may not be visible to outside observers.
The same capability that makes a personalised marketing agent effective makes a personalised disinformation agent effective. The mechanism is identical; the intent differs. This is not a reason to avoid personalisation in legitimate applications — it is a reason to be precise about what data agents are allowed to access and what they are allowed to do with it.
4. The Detection Problem
Against the production mechanisms described above, the detection landscape is asymmetric in ways that favour the attacker.
AI-generated content classifiers, tools designed to identify whether a piece of text was produced by a language model, are in a continuous arms race with the models they detect. Early classifiers exploited statistical regularities in model outputs: the models tended to produce certain word choices and structures at higher frequencies than humans. As models have improved, those regularities have diminished. Detection accuracy for current frontier model outputs is significantly lower than for outputs from earlier generations, and the trend line continues in the wrong direction.3 Empirical stress-testing makes this concrete. Recursive paraphrasing, which passes AI-generated text through a paraphrasing model multiple times, drops watermark detection rates from above 99% to single digits in experiments, while 89% of paraphrased outputs are still rated high quality by human evaluators.3 Evasion is therefore not only a future problem arising from improving models but an attack available to any adversary today. A theoretical analysis further establishes that as AI text distributions converge toward human text distributions, the maximum achievable detection performance is provably bounded and falls toward chance regardless of the detection method used, making the arms race unwinnable at the limit.3
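The theoretical limit can be made tangible with a short calculation. The sketch below assumes the bound takes the form AUROC ≤ 1/2 + TV − TV²/2, where TV is the total variation distance between the machine and human text distributions; this form is recalled from reference 3 and should be checked against the paper before being relied on. The function name and the sample TV values are illustrative.

```python
def auroc_upper_bound(tv: float) -> float:
    """Upper bound on any detector's AUROC given total variation distance tv in [0, 1].

    Assumed form of the bound: 1/2 + tv - tv**2 / 2.
    """
    if not 0.0 <= tv <= 1.0:
        raise ValueError("total variation distance must lie in [0, 1]")
    return 0.5 + tv - tv * tv / 2.0


# As the two distributions converge (tv -> 0), the best possible detector
# approaches an AUROC of 0.5, which is no better than a coin flip.
for tv in (1.0, 0.5, 0.2, 0.1, 0.05, 0.0):
    print(f"TV = {tv:4.2f}  ->  best achievable AUROC <= {auroc_upper_bound(tv):.4f}")
```

Under this assumed bound, a total variation distance of 0.1 caps every possible detector at an AUROC of 0.595, whatever features or model it uses.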
Watermarking — embedding detectable signals in model outputs at the generation stage — is a promising technical approach but faces structural challenges in the agentic context. A watermark embedded by a model provider is only detectable if the content has not been paraphrased, translated, or substantially edited — all of which are trivial operations for an agent. Content that passes through a generation agent and then a paraphrasing agent loses its watermark. Watermarking also provides no protection against content generated by models that have not implemented it, which includes any open-weight model downloaded and run without the provider's watermarking layer.
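The fragility described above can be illustrated with a toy version of one family of watermarking schemes, in which a hash of the previous token marks part of the vocabulary as "green" and the generator favours green tokens. Everything here is an assumption made for illustration: the 64-word synthetic vocabulary, the SHA-256 green test, the 0.5 green fraction, and in particular `substitute_all`, which replaces every token at random as a crude stand-in for a paraphrasing agent. It is not any provider's actual scheme.

```python
import hashlib
import math
import random

GAMMA = 0.5  # assumed fraction of the vocabulary that counts as "green" at each step


def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign `token` to the green list, keyed on the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode("utf-8")).digest()
    return digest[0] < 256 * GAMMA


def z_score(tokens: list) -> float:
    """One-proportion z-test: how far the green count sits above the chance rate."""
    transitions = len(tokens) - 1
    if transitions < 1:
        raise ValueError("need at least two tokens to score")
    green = sum(is_green(p, c) for p, c in zip(tokens, tokens[1:]))
    return (green - GAMMA * transitions) / math.sqrt(transitions * GAMMA * (1 - GAMMA))


def generate_watermarked(vocab: list, length: int, rng: random.Random) -> list:
    """Toy 'hard' watermark: always pick the next token from the current green list."""
    tokens = [rng.choice(vocab)]
    while len(tokens) < length:
        green_candidates = [w for w in vocab if is_green(tokens[-1], w)]
        tokens.append(rng.choice(green_candidates or vocab))
    return tokens


def substitute_all(tokens: list, vocab: list, rng: random.Random) -> list:
    """Crude stand-in for paraphrasing: replace every token without regard to the key."""
    return [rng.choice(vocab) for _ in tokens]


vocab = [f"w{i}" for i in range(64)]   # synthetic vocabulary, not real words
rng = random.Random(0)
marked = generate_watermarked(vocab, 201, rng)
rewritten = substitute_all(marked, vocab, rng)

print(f"watermarked text z-score: {z_score(marked):.2f}")
print(f"rewritten text z-score:   {z_score(rewritten):.2f}")
```

The watermarked sequence scores far above any plausible detection threshold, while the rewritten sequence scores near zero, as unmarked text would. A real paraphraser preserves meaning rather than choosing tokens at random, but the effect on the detector is the same in kind: once token choices no longer follow the keyed green list, the statistical signal is gone.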
Provenance and content authentication, meaning standards that attach cryptographic proof of origin to digital content and track where it was created, by whom, and whether it has been modified, represent a more durable approach. The Coalition for Content Provenance and Authenticity (C2PA) maintains a cross-industry standard for content credentials that has gained adoption among camera manufacturers, media organisations, and some model providers. Adoption remains incomplete, however, and the standard has a structural limitation: credentials can be stripped from content before redistribution, so their absence cannot be treated as a signal of inauthenticity. The system proves origin when credentials are present but cannot prevent their removal.6 Provenance systems are therefore only as reliable as their enforcement: a provenance-free piece of content cannot be assumed to be synthetic, but it also cannot be assumed to be authentic.
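The three possible outcomes of a credential check (valid, invalid, and absent) can be sketched as follows. This is a deliberate simplification and not the C2PA mechanism: real content credentials rely on certificate-based signatures over a manifest, whereas this example uses a shared-key HMAC from the Python standard library purely to show the logic. The key, the function names, and the sample content are hypothetical.

```python
import hashlib
import hmac
from typing import Optional

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical key, for illustration only


def sign(content: bytes) -> str:
    """Produce a credential that binds the signer's key to this exact content."""
    return hmac.new(SIGNING_KEY, content, hashlib.sha256).hexdigest()


def check(content: bytes, credential: Optional[str]) -> str:
    """Return 'valid', 'invalid', or 'absent' for a piece of content."""
    if credential is None:
        # Stripped or never attached: this says nothing about authenticity.
        return "absent"
    if hmac.compare_digest(sign(content), credential):
        return "valid"
    return "invalid"


original = b"Statement issued by the press office."
credential = sign(original)

print(check(original, credential))                 # content and credential intact
print(check(original + b" (edited)", credential))  # content modified after signing
print(check(original, None))                       # credential stripped before redistribution
```

The design point is that `absent` is a third state rather than a failure. A pipeline that collapses it into either "trusted" or "fake" will misclassify a large share of ordinary content, which is why absence of credentials has to be handled as missing evidence rather than as evidence.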
Behavioural detection — identifying coordinated networks through the patterns of how content is distributed, rather than the content itself — remains one of the more reliable detection mechanisms, but requires access to platform-level data that is not generally available to outside researchers or affected organisations.
Key takeaway: No single detection mechanism is reliable against current-generation model outputs deployed through agentic systems. Detection is most effective as a layered approach — provenance, behavioural analysis, and classifier signals combined — rather than as any single tool.
5. The Liar's Dividend
One of the most consequential effects of the disinformation landscape produced by generative AI is not the disinformation itself but what it does to genuine content. The legal scholars Robert Chesney and Danielle Citron identified this dynamic before large-scale deployment of generative models: the mere existence of a credible capability to fabricate convincing content allows bad actors to dismiss authentic content as synthetic.4
A video of a public figure making a damaging statement can be dismissed as a deepfake. An authentic document can be claimed to have been generated by an AI. A genuine eyewitness account can be attributed to a synthetic persona. In each case, the attacker does not need to produce disinformation — they need only to invoke the plausible existence of disinformation to neutralise authentic evidence.
The paper also identifies a structural timing problem: certain decisions — elections, financial transactions, reputational judgements — are made in narrow, irreversible windows, so a well-timed fabrication can determine outcomes before any correction can take effect, regardless of whether the fake is eventually exposed.4
This dividend is already being claimed in real contexts: in legal proceedings, in political disputes, in corporate reputation management. Its long-term effect is a degradation of the epistemic commons — the shared infrastructure of trusted information sources, institutional verification processes, and evidentiary standards through which societies make collective judgements about what is true.
The liar's dividend is not primarily a technical problem. It is an institutional one: the solution is not better deepfake detection but stronger provenance infrastructure, more resilient institutional verification processes, and greater public awareness of how synthetic content works and how to assess it. Technology can support these goals but cannot substitute for them.
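Key takeaway: The liar's dividend grows with public awareness of synthetic media: the more people understand that fakes exist, the more credibly authentic evidence can be dismissed as fabricated. Awareness therefore has to be paired with provenance and verification processes that allow authentic content to be confirmed.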
6. Systemic Effects on Information Infrastructure
The disinformation risks from agentic AI are not only about individual instances of false content. At sufficient scale and persistence, they create systemic effects on the information infrastructure that individuals, organisations, and institutions depend on.
Three systemic effects merit particular attention:
Training corpus contamination. As synthetic content proliferates across the web and is indexed, scraped, and included in training datasets, future model generations are increasingly trained on outputs of prior model generations. The effect — sometimes called model collapse — is a progressive degradation in the diversity and grounding of model outputs, as the statistical properties of model-generated text displace those of human-generated text in training distributions.2 The practical consequence is models that are progressively less connected to empirical reality and more connected to the statistical regularities of prior model outputs. Research further establishes that the low-probability events that collapse first are disproportionately relevant to marginalised groups and rare phenomena, making training corpus contamination not only a quality problem but an equity one.2
Authority substitution. As AI summarisation becomes the primary interface through which many users access information — rather than reading primary sources directly — the agent's choices about what to retrieve, how to weight sources, and how to frame summaries become de facto editorial decisions with significant influence over what information reaches users. An agent that retrieves and amplifies synthetic content is not making a deliberate choice; it is following its retrieval logic. But the effect on the information environment is the same.
Institutional credibility degradation. Trust in media, scientific publication, and institutional communication depends on the assumption that the content associated with trusted institutions comes from the people and processes those institutions represent. Synthetic content that mimics institutional voices — fabricated press releases, fake journal articles attributed to real researchers, manufactured regulatory filings — attacks this assumption. Restoring it after it has been undermined is far harder than maintaining it in the first place.
7. Governance Responses
The regulatory response to AI-generated disinformation is active and developing across multiple jurisdictions, though no framework has yet achieved comprehensive coverage.
The EU AI Act includes specific provisions for synthetic content: systems generating synthetic audio, video, image, or text content are subject to transparency obligations, including disclosure that the content is AI-generated, with carve-outs for clearly artistic or satirical purposes.5 The Act further introduces a layered compliance structure for customer-facing agents built on general-purpose AI models: the foundation model provider carries obligations around technical documentation and training data transparency, while the deployer retains responsibility for ensuring appropriate human oversight and correct risk classification in their specific deployment context.5
Platform-level policies at the major social media companies have evolved to require disclosure of AI-generated content in political advertising and, in some cases, across all synthetic content meeting a materiality threshold. Enforcement is uneven, and the platforms' capacity to detect synthetic content at the volume being produced is limited.
Provenance standards — C2PA and related frameworks — are gaining adoption among camera hardware manufacturers and some media organisations, creating a provenance chain for content that originates from trusted devices. The coverage gap for text content, and for content originating outside the adoption community, remains significant.
Export controls and access restrictions on frontier model APIs have been adopted in some jurisdictions as a way to limit the capability available to actors with known influence operation intent. The effectiveness of these measures is limited by the availability of capable open-weight models that can be downloaded and run without API access.
No single governance measure closes the gap. The most effective responses combine technical standards (provenance, watermarking), regulatory transparency requirements, platform-level enforcement, and institutional media literacy — a combination that requires coordination across private sector, government, and civil society actors that has historically been difficult to achieve quickly enough to keep pace with technology deployment.
8. Organisational Responsibilities
For organisations deploying agentic systems, the disinformation landscape described above creates responsibilities that extend beyond their immediate operational context.
Content provenance. Organisations that use agents to generate content for external publication — marketing material, research summaries, communications — carry a responsibility to disclose the AI origin of that content where material. This is not only a regulatory compliance matter in jurisdictions that require it; it is a contribution to the provenance infrastructure on which the broader information environment depends. Disclosure standards that are applied even when not legally required help establish norms that limit the space for bad actors to hide synthetic content behind the precedent of undisclosed legitimate use.
Retrieval system design. Organisations deploying research or summarisation agents should audit what those agents retrieve and how they assess source credibility. Default retrieval pipelines return what is semantically relevant, not what is verified. Organisations that deploy agents retrieving from the open web without provenance checking are making an implicit choice to treat unverified content as credible — a choice that should be explicit and intentional, not a default.
Synthetic persona prohibition. The use of agents to operate synthetic personas — accounts or identities that present themselves as human in contexts where the audience assumes human authorship — is among the most clearly harmful applications of agentic capability. This applies regardless of the content those personas produce. The deception is in the persona, not only in what the persona says. Organisations should explicitly prohibit this use in their acceptable use policies and design systems that make it difficult to pursue.
Red-teaming for disinformation risk. Organisations that deploy agents capable of content generation should include disinformation scenarios in their red-teaming and evaluation programmes — asking explicitly whether a compromised or misdirected version of the system could be used to produce or amplify false content, and what the blast radius of that failure would be. An agent that can write marketing copy can, if misdirected, write misinformation. The capability is the same; the guardrails are what differ.
The organisations that will navigate the disinformation landscape most responsibly are those that treat the question "could this system be used to mislead?" as a design constraint applied at the beginning, not an audit question asked after deployment.
9. Closing Part 4
The three chapters of Part 4 have moved from the inherent to the adversarial to the systemic. Chapter 12 established that agents fail in ways rooted in their fundamental architecture — hallucinating, drifting, miscalibrating — without anyone's deliberate intent. Chapter 13 showed how those same vulnerabilities become attack surfaces when someone is deliberately trying to exploit them. This chapter has examined what happens when agentic capabilities are deployed at scale in an information environment — not just the risks to individual systems but the risks to the shared infrastructure of trust that individuals and institutions depend on.
The arc is important. The failure modes in Chapter 12 are engineering problems with engineering responses. The attack surfaces in Chapter 13 are security problems with security responses. The systemic risks in this chapter are something different: they are civilisational infrastructure problems that require institutional, regulatory, and normative responses alongside technical ones. The organisations and individuals deploying agentic systems are not merely responsible for their systems' direct outputs; they are participants in shaping the information environment that everyone shares.
Part 5 turns from risks to applications — examining how agentic AI is being deployed across specific business functions, and what the preceding risk analysis implies for how those deployments should be designed and governed.
References
- 1. Goldstein, J.A., Sastry, G., Musser, M., DiResta, R., Gentzel, M., & Sedova, K. (2023). Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations. arXiv:2301.04246. Georgetown University Center for Security and Emerging Technology / OpenAI / Stanford Internet Observatory.
- 2. Shumailov, I., Shumaylov, Z., Zhao, Y., Gal, Y., Papernot, N., & Anderson, R. (2024). AI Models Collapse When Trained on Recursively Generated Data. Nature, 631, 755–759. https://doi.org/10.1038/s41586-024-07566-y
- 3. Sadasivan, V.S., Kumar, A., Balasubramanian, S., Wang, W., & Feizi, S. (2023). Can AI-Generated Text Be Reliably Detected? arXiv:2303.11156. University of Maryland / Harvard University.
- 4. Chesney, R. & Citron, D.K. (2019). Deep Fakes: A Looming Challenge for Privacy, Democracy, and National Security. California Law Review, 107(6), 1753–1820.
- 5. Future of Life Institute (2024). High-level summary of the AI Act. Available at: artificialintelligenceact.eu/high-level-summary (Previously cited as Ch.9, ref. 1.)
- 6. Coalition for Content Provenance and Authenticity (C2PA) (2024). C2PA Technical Specification, Version 2.0. https://c2pa.org/specifications/specifications/2.0/specs/C2PA_Specification.html