[2603.03884] CzechTopic: A Benchmark for Zero-Shot Topic Localization in Historical Czech Documents

arXiv:2603.03884 (cs) [Submitted on 4 Mar 2026]

Title: CzechTopic: A Benchmark for Zero-Shot Topic Localization in Historical Czech Documents

Authors: Martin Kostelník, Michal Hradiš, Martin Dočekal

Abstract: Topic localization aims to identify spans of text that express a given topic defined by a name and description. To study this task, we introduce a human-annotated benchmark based on Czech historical documents, containing human-defined topics together with manually annotated spans and supporting evaluation at both document and word levels. Evaluation is performed relative to human agreement rather than a single reference annotation. We evaluate a diverse range of large language models alongside BERT-based models fine-tuned on a distilled development dataset. Results reveal substantial variability among LLMs, with performance ranging from near-human topic detection to pronounced failures in span localization. While the strongest models approach human agreement, the distilled token embedding models remain competitive despite their smaller scale. The dataset and evaluation framework are publicly available at: this https URL.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.03884 [cs.CL] (or arXiv:...
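The abstract mentions word-level evaluation measured relative to human agreement rather than against a single reference. The sketch below is one possible reading of that idea, not the paper's actual metric: it assumes each annotation is a set of word indices, uses word-level F1 as the overlap measure, and divides a model's mean F1 against the annotators by the annotators' mean pairwise F1. All function names (`word_f1`, `human_agreement`, `relative_score`) and the normalization itself are illustrative assumptions.

```python
from itertools import combinations


def word_f1(pred, ref):
    """F1 overlap between two sets of word indices (assumed representation)."""
    if not pred and not ref:
        return 1.0  # both agree that the topic is absent
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_f1_against(pred, annotators):
    """Average F1 of one prediction against every human annotation."""
    return sum(word_f1(pred, a) for a in annotators) / len(annotators)


def human_agreement(annotators):
    """Mean pairwise F1 among human annotators (needs at least two)."""
    pairs = list(combinations(annotators, 2))
    return sum(word_f1(a, b) for a, b in pairs) / len(pairs)


def relative_score(pred, annotators):
    """Model score as a fraction of human agreement (hypothetical normalization)."""
    return mean_f1_against(pred, annotators) / human_agreement(annotators)


# Toy example: two annotators mark overlapping spans in one document.
annotators = [{1, 2, 3, 4}, {2, 3, 4, 5}]
model_prediction = {2, 3, 4}
print(human_agreement(annotators))
print(relative_score(model_prediction, annotators))
```

Under this reading, a relative score near 1.0 means the model agrees with the annotators about as well as they agree with each other. A document-level variant could reduce each annotation to whether it is non-empty before comparing.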

Originally published on March 05, 2026. Curated by AI News.
