[2511.07989] State of the Art in Text Classification for South Slavic Languages: Fine-Tuning or Prompting?
Summary
This paper evaluates the performance of current language models on text classification tasks for South Slavic languages, comparing fine-tuned BERT-like models with instruction-tuned LLMs across several domains.
Why It Matters
Understanding how well different language models perform on less-resourced languages is crucial for advancing NLP applications in these regions. This research highlights the trade-offs between fine-tuning and prompting, offering insights that can guide future model development and deployment.
Key Takeaways
- LLMs show strong zero-shot performance in text classification for South Slavic languages.
- Fine-tuned BERT-like models remain practical for large-scale text annotation despite LLM advantages.
- LLMs have drawbacks, including unpredictable outputs and higher computational costs.
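The third takeaway notes that LLM outputs can be unpredictable. A common mitigation, sketched below, is to constrain the prompt to a closed label set and map the model's free-text reply back onto that set. This is an illustrative sketch, not code from the paper; `call_llm` stands in for whatever model API is used, and the label set and fallback are hypothetical choices.

```python
# Illustrative sketch (not from the paper): zero-shot classification with a
# closed label set, plus defensive parsing of the model's free-text reply.
LABELS = ["positive", "negative", "neutral"]

def build_prompt(text: str) -> str:
    """Zero-shot prompt: task description, allowed labels, then the input."""
    return (
        "Classify the sentiment of the following parliamentary speech.\n"
        f"Answer with exactly one of: {', '.join(LABELS)}.\n\n"
        f"Speech: {text}\nLabel:"
    )

def parse_label(reply: str, fallback: str = "neutral") -> str:
    """Map a free-form model reply onto the closed label set."""
    reply = reply.strip().lower()
    for label in LABELS:
        if label in reply:
            return label
    return fallback  # LLM replies can drift off-format; fall back safely

# Example with a mocked model reply (no real LLM call here):
prompt = build_prompt("The reform delivered real benefits to citizens.")
print(parse_label("Label: Positive."))  # -> positive
print(parse_label("I cannot decide."))  # -> neutral (fallback)
```

Fine-tuned BERT-like classifiers avoid this parsing step entirely, since they emit a fixed label distribution by construction, which is part of why they remain practical for large-scale annotation.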
Computer Science > Computation and Language
arXiv:2511.07989 (cs)
[Submitted on 11 Nov 2025 (v1), last revised 19 Feb 2026 (this version, v2)]

Title: State of the Art in Text Classification for South Slavic Languages: Fine-Tuning or Prompting?
Authors: Taja Kuzman Pungeršek, Peter Rupnik, Ivan Porupski, Vuk Dinić, Nikola Ljubešić

Abstract: Until recently, fine-tuned BERT-like models provided state-of-the-art performance on text classification tasks. With the rise of instruction-tuned decoder-only models, commonly known as large language models (LLMs), the field has increasingly moved toward zero-shot and few-shot prompting. However, the performance of LLMs on text classification, particularly on less-resourced languages, remains under-explored. In this paper, we evaluate the performance of current language models on text classification tasks across several South Slavic languages. We compare openly available fine-tuned BERT-like models with a selection of open-source and closed-source LLMs across three tasks in three domains: sentiment classification in parliamentary speeches, topic classification in news articles and parliamentary speeches, and genre identification in web texts. Our results show that LLMs demonstrate strong zero-shot performance, often matching or surpassing ...