[2408.11871] MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models
Computer Science > Computation and Language
arXiv:2408.11871 (cs)
[Submitted on 19 Aug 2024 (v1), last revised 4 Apr 2026 (this version, v3)]

Title: MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models
Authors: Lionel Z. Wang, Ka Chung Ng, Yiming Ma, Wenqi Fan

Abstract: Fake news significantly influences decision-making processes by misleading individuals, organizations, and even governments. Large language models (LLMs), as part of generative AI, can amplify this problem by generating highly convincing fake news at scale, posing a significant threat to online information integrity. Therefore, understanding the motivations and mechanisms behind fake news generated by LLMs is crucial for effective detection and governance. In this study, we develop the LLM-Fake Theory, a theoretical framework that integrates various social psychology theories to explain machine-generated deception. Guided by this framework, we design an innovative prompt engineering pipeline that automates fake news generation using LLMs, eliminating manual annotation needs. Utilizing this pipeline, we create a theoretically informed Machine-generated Fake news dataset, MegaFake, derived from FakeNewsNet. Through extensive experiments with MegaF...