[2603.19253] A comprehensive study of LLM-based argument classification: from Llama through DeepSeek to GPT-5.2
Computer Science: Computation and Language. arXiv:2603.19253 (cs), submitted on 25 Feb 2026.
Authors: Marcin Pietroń, Filip Gampel, Jakub Gomułka, Andrzej Tomski, Rafał Olszowski

Abstract: Argument mining (AM) is an interdisciplinary research field focused on the automatic identification and classification of argumentative components, such as claims and premises, and of the relationships between them. Recent advances in large language models (LLMs) have significantly improved argument classification performance compared with traditional machine learning approaches. This study presents a comprehensive evaluation of several state-of-the-art LLMs, including GPT-5.2, Llama 4, and DeepSeek, on large, publicly available argument classification corpora such as this http URL and UKP. The evaluation incorporates advanced prompting strategies, including Chain-of-Thought prompting, prompt rephrasing, voting, and certainty-based classification. Both quantitative performance metrics and a qualitative error analysis are used to assess model behavior. The best-performing model in the study (GPT-5.2) achieves a classification accuracy of 78.0% on UKP and 91.9% on this http URL. The use of prompt rephrasing, ...
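The abstract names prompt rephrasing, voting, and certainty-based classification without detailing them. The following is a minimal sketch of one plausible combination, not the authors' method. It assumes that the same input is sent to a model under several rephrased prompt templates, that the answers are combined by majority vote, and that the vote's agreement fraction serves as a certainty proxy, with a fallback label below a threshold. The function names, the `query_model` callable, the `{text}` template placeholder, the 0.6 threshold, and the label strings are all hypothetical. An accuracy helper is included to show how a figure such as 78.0% is computed: correct predictions divided by total examples.

```python
from collections import Counter
from typing import Callable, Sequence


def vote(predictions: Sequence[str]) -> tuple[str, float]:
    """Majority vote over labels; returns (winning label, agreement fraction).

    Ties go to the label seen first, because Counter.most_common keeps
    insertion order among equal counts.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    label, count = Counter(predictions).most_common(1)[0]
    return label, count / len(predictions)


def classify_with_rephrasing(
    text: str,
    prompts: Sequence[str],
    query_model: Callable[[str], str],  # hypothetical stand-in for an LLM call
    min_agreement: float = 0.6,         # hypothetical certainty threshold
    fallback: str = "none",             # hypothetical label used when uncertain
) -> tuple[str, float]:
    """Query the model once per rephrased prompt, then vote.

    Agreement among the rephrasings is used here as an assumed certainty
    proxy; below min_agreement the fallback label is returned instead.
    """
    answers = [query_model(p.format(text=text)) for p in prompts]
    label, agreement = vote(answers)
    if agreement < min_agreement:
        return fallback, agreement
    return label, agreement


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold labels."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

With three rephrasings where two answers agree, the agreement is 2/3, which clears the 0.6 threshold. Raising `min_agreement` trades coverage for precision, because more inputs fall back to the default label.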