[2602.18453] LLM-Assisted Replication for Quantitative Social Science
Summary
The paper presents a prototype LLM-based system that replicates statistical analyses from quantitative social science papers and flags potential problems, so that verification can keep pace with LLM-accelerated research production.
Why It Matters
The replication crisis undermines the credibility of empirical research. This study explores how large language models can streamline the replication process, potentially improving research integrity and fostering trust in scientific findings.
Key Takeaways
- LLMs can automate much of the replication of statistical analyses in social science; the prototype reproduces key results from a seminal sociology paper.
- The proposed system identifies discrepancies in results, enhancing verification efforts.
- Quantitative social science is well suited to this automation because it relies on standard statistical models, shared public datasets, and uniform reporting formats such as regression tables.
- The tool can support pre-submission checks and peer-review processes.
- AI-based verification may serve as infrastructure for improving research integrity.
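The discrepancy check mentioned in the takeaways can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `flag_discrepancies`, the `Discrepancy` record, and the matching rule (a replicated coefficient matches if it agrees with the reported one to within half a unit of the last printed decimal) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Discrepancy:
    # Hypothetical record type, not taken from the paper.
    term: str
    reported: float
    replicated: Optional[float]
    reason: str


def flag_discrepancies(
    reported: Dict[str, float],
    replicated: Dict[str, float],
    decimals: int = 3,
) -> List[Discrepancy]:
    """Compare coefficients printed in a paper with replicated estimates.

    Assumed rule: values match when they differ by no more than half a
    unit of the last printed decimal place (i.e. rounding error only).
    """
    tolerance = 0.5 * 10 ** -decimals + 1e-12  # small slack for float noise
    flags: List[Discrepancy] = []
    for term, reported_value in reported.items():
        if term not in replicated:
            flags.append(
                Discrepancy(term, reported_value, None, "missing in replication")
            )
            continue
        new_value = replicated[term]
        if abs(new_value - reported_value) > tolerance:
            flags.append(
                Discrepancy(term, reported_value, new_value, "value differs")
            )
    return flags
```

Tying the tolerance to the printed precision means that ordinary rounding in a regression table is not reported as a problem, while missing terms and larger deviations are.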
Computer Science > Computers and Society
arXiv:2602.18453 (cs) [Submitted on 4 Feb 2026]
Title: LLM-Assisted Replication for Quantitative Social Science
Authors: So Kubota, Hiromu Yakura, Samuel Coavoux, Sho Yamada, Yuki Nakamura
Abstract: The replication crisis, the failure of scientific claims to be validated by further research, is one of the most pressing issues for empirical research. This is partly an incentive problem: replication is costly and less well rewarded than original research. Large language models (LLMs) have accelerated scientific production by streamlining writing, coding, and reviewing, yet this acceleration risks outpacing verification. To address this, we present an LLM-based system that replicates statistical analyses from social science papers and flags potential problems. Quantitative social science is particularly well-suited to automation because it relies on standard statistical models, shared public datasets, and uniform reporting formats such as regression tables and summary statistics. We present a prototype that iterates LLM-based text interpretation, code generation, execution, and discrepancy analysis, demonstrating its capabilities by reproducing key results from a seminal sociology paper. We also outline application scenarios including pre-submission checks, peer-review support, and meta-sci...
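The iteration described in the abstract (text interpretation, code generation, execution, discrepancy analysis) can be pictured as a simple control loop. The sketch below is only a guess at the overall shape, not the authors' prototype: the function `replicate`, its parameters, and the four injected callables are hypothetical, and the LLM calls and code sandbox are left to the caller.

```python
from typing import Callable, Dict, List, Tuple


def replicate(
    paper_text: str,
    interpret: Callable[[str, List[str]], str],
    generate_code: Callable[[str], str],
    execute: Callable[[str], Dict[str, float]],
    compare: Callable[[Dict[str, float]], List[str]],
    max_rounds: int = 3,
) -> Tuple[Dict[str, float], List[str]]:
    """Run interpret -> generate -> execute -> compare until no issues remain.

    All four steps are supplied by the caller (hypothetical interfaces):
      interpret(paper_text, feedback) -> analysis specification
      generate_code(spec)             -> analysis code as text
      execute(code)                   -> estimated quantities by name
      compare(results)                -> list of discrepancy messages
    """
    feedback: List[str] = []
    results: Dict[str, float] = {}
    for _ in range(max_rounds):
        # Earlier discrepancies are fed back into the interpretation step.
        spec = interpret(paper_text, feedback)
        code = generate_code(spec)
        results = execute(code)
        feedback = compare(results)
        if not feedback:
            break  # results agree with the paper; stop iterating
    # Any remaining feedback is what a human reviewer would be shown.
    return results, feedback
```

Passing the steps in as callables keeps the loop independent of any particular model or execution environment, and the `max_rounds` cap ensures that an analysis that cannot be reproduced ends with a list of flagged discrepancies rather than an endless retry.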