[2603.03884] CzechTopic: A Benchmark for Zero-Shot Topic Localization in Historical Czech Documents

arXiv - AI · March 05, 2026

arXiv:2603.03884 (cs)
[Submitted on 4 Mar 2026]

Title: CzechTopic: A Benchmark for Zero-Shot Topic Localization in Historical Czech Documents

Authors: Martin Kostelník, Michal Hradiš, Martin Dočekal

Abstract: Topic localization aims to identify spans of text that express a given topic defined by a name and description. To study this task, we introduce a human-annotated benchmark based on Czech historical documents, containing human-defined topics together with manually annotated spans and supporting evaluation at both document and word levels. Evaluation is performed relative to human agreement rather than a single reference annotation. We evaluate a diverse range of large language models alongside BERT-based models fine-tuned on a distilled development dataset. Results reveal substantial variability among LLMs, with performance ranging from near-human topic detection to pronounced failures in span localization. While the strongest models approach human agreement, the distilled token embedding models remain competitive despite their smaller scale. The dataset and evaluation framework are publicly available at: this https URL.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.03884 [cs.CL] (or arXiv:...
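The abstract describes scoring at document and word levels, relative to human agreement. The Python sketch below shows one way such an evaluation could be set up. It is not the paper's evaluation framework. The span format (word-index pairs, end exclusive), the choice of F1, the normalisation by mean pairwise human agreement, and all function names are assumptions made for illustration.

```python
from itertools import combinations


def span_words(spans):
    """Expand (start, end) word-index spans (end exclusive) into a set of indices."""
    return {i for start, end in spans for i in range(start, end)}


def word_f1(pred_spans, ref_spans):
    """Word-level F1 between two span annotations of the same document."""
    pred, ref = span_words(pred_spans), span_words(ref_spans)
    if not pred and not ref:
        return 1.0  # both annotations say the topic is absent
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def document_agreement(pred_spans, ref_spans):
    """Document-level check: do both annotations agree on topic presence?"""
    return float(bool(pred_spans) == bool(ref_spans))


def mean_pairwise(annotations, score):
    """Mean score over all pairs of human annotations (needs at least two)."""
    pairs = list(combinations(annotations, 2))
    return sum(score(a, b) for a, b in pairs) / len(pairs)


def relative_to_humans(model_spans, human_annotations, score):
    """Mean model-vs-human score divided by mean human-vs-human score.

    Assumed normalisation: 1.0 means the model agrees with the annotators
    as well as they agree with each other. Undefined if humans never agree.
    """
    model_score = sum(score(model_spans, h) for h in human_annotations)
    model_score /= len(human_annotations)
    return model_score / mean_pairwise(human_annotations, score)


# Hypothetical data: two annotators and one model output for a single topic.
humans = [[(2, 6)], [(3, 7)]]
model = [(2, 7)]
print(relative_to_humans(model, humans, word_f1))
print(relative_to_humans(model, humans, document_agreement))
```

Normalising by human agreement, as assumed here, keeps a model from being penalised for disagreeing with one annotator on spans where the annotators themselves disagree.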

Originally published on March 05, 2026. Curated by AI News.

