[2603.04636] When Agents Persuade: Propaganda Generation and Mitigation in LLMs
Computer Science > Artificial Intelligence

arXiv:2603.04636 (cs)
[Submitted on 4 Mar 2026]

Title: When Agents Persuade: Propaganda Generation and Mitigation in LLMs
Authors: Julia Jose, Ritik Roongta, Rachel Greenstadt

Abstract: Despite their wide-ranging benefits, LLM-based agents deployed in open environments can be exploited to produce manipulative material. In this study, we task LLMs with propaganda objectives and analyze their outputs using two domain-specific models: one that classifies text as propaganda or non-propaganda, and another that detects rhetorical techniques of propaganda (e.g., loaded language, appeals to fear, flag-waving, name-calling). Our findings show that, when prompted, LLMs exhibit propagandistic behaviors and use a variety of rhetorical techniques in doing so. We also explore mitigation via Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Odds Ratio Preference Optimization (ORPO). We find that fine-tuning significantly reduces their tendency to generate such content, with ORPO proving most effective.

Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.04636 [cs.AI] (arXiv:2603.04636v1 for this version)
DOI: https://doi.org/10.48550/arXiv.2603.04636 (arXiv-issued DOI via DataCite; registration pending)
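The abstract describes a two-model evaluation pipeline (a binary propaganda classifier plus a rhetorical-technique detector) applied to LLM outputs. The sketch below illustrates how such a setup could be wired together with the Hugging Face text-classification pipeline; it is not the authors' released code, and the checkpoint names and threshold are placeholders.

```python
# Minimal sketch of the two-model analysis described in the abstract:
# a binary propaganda classifier plus a multi-label technique detector
# applied to LLM-generated text. Checkpoint names are placeholders.
from transformers import pipeline

BINARY_MODEL = "your-org/propaganda-binary-classifier"      # hypothetical checkpoint
TECHNIQUE_MODEL = "your-org/propaganda-technique-detector"  # hypothetical checkpoint

binary_clf = pipeline("text-classification", model=BINARY_MODEL)
# top_k=None returns a score for every technique label instead of only the top one
technique_clf = pipeline("text-classification", model=TECHNIQUE_MODEL, top_k=None)

def analyze(generated_text: str, threshold: float = 0.5) -> dict:
    """Label an LLM output as propaganda/non-propaganda and list detected techniques."""
    overall = binary_clf(generated_text, truncation=True)[0]
    preds = technique_clf(generated_text, truncation=True)
    if preds and isinstance(preds[0], list):  # some transformers versions nest per-input
        preds = preds[0]
    techniques = [p["label"] for p in preds if p["score"] >= threshold]
    return {"label": overall["label"], "score": overall["score"], "techniques": techniques}

print(analyze("Only a true patriot would back this plan; everyone else wants chaos."))
```

On the mitigation side, ORPO (which the abstract reports as most effective) is available off the shelf in the TRL library. A hedged sketch of preference fine-tuning on prompt/chosen/rejected pairs, where the rejected response is propagandistic, might look like the following; the base model, the toy preference pairs, and the hyperparameters are assumptions, not the paper's configuration.

```python
# Hedged sketch of ORPO fine-tuning with TRL on prompt/chosen/rejected pairs,
# where "rejected" responses contain propagandistic rhetoric. Not the authors' setup.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

BASE_MODEL = "Qwen/Qwen2-0.5B-Instruct"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Tiny illustrative preference set; a real run needs many such pairs.
train_dataset = Dataset.from_dict({
    "prompt": ["Write a short post about the new city ordinance."],
    "chosen": ["The ordinance changes downtown parking rules; here is a neutral summary ..."],
    "rejected": ["Only traitors oppose this glorious ordinance; real citizens will crush the naysayers ..."],
})

args = ORPOConfig(
    output_dir="orpo-propaganda-mitigation",
    per_device_train_batch_size=1,
    max_steps=10,   # toy value for the sketch
    beta=0.1,       # weight of the odds-ratio preference term
    logging_steps=1,
)
# Older TRL releases take tokenizer= instead of processing_class=.
trainer = ORPOTrainer(model=model, args=args, train_dataset=train_dataset, processing_class=tokenizer)
trainer.train()
```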