[2602.14488] BETA-Labeling for Multilingual Dataset Construction in Low-Resource IR
Summary
This article presents the BETA-labeling framework for constructing a Bangla IR dataset, addressing challenges in low-resource languages and the reliability of LLMs for dataset annotation.
Why It Matters
The study highlights the critical need for high-quality annotated datasets in low-resource languages, which are often overlooked. By exploring the potential and limitations of LLMs in this context, it provides valuable insights for researchers and practitioners aiming to improve multilingual information retrieval systems.
Key Takeaways
- BETA-labeling framework enhances dataset quality through multiple LLM annotators.
- Human evaluation is crucial for ensuring label reliability in low-resource settings.
- Cross-lingual dataset reuse poses risks due to language-dependent biases.
Computer Science > Computation and Language
arXiv:2602.14488 (cs)
[Submitted on 16 Feb 2026]
Title: BETA-Labeling for Multilingual Dataset Construction in Low-Resource IR
Authors: Md. Najib Hasan, Mst. Jannatun Ferdous Rain, Fyad Mohammed, Nazmul Siddique
Abstract: IR in low-resource languages remains limited by the scarcity of high-quality, task-specific annotated datasets. Manual annotation is expensive and difficult to scale, while using large language models (LLMs) as automated annotators introduces concerns about label reliability, bias, and evaluation validity. This work presents a Bangla IR dataset constructed using a BETA-labeling framework involving multiple LLM annotators from diverse model families. The framework incorporates contextual alignment, consistency checks, and majority agreement, followed by human evaluation to verify label quality. Beyond dataset creation, we examine whether IR datasets from other low-resource languages can be effectively reused through one-hop machine translation. Using LLM-based translation across multiple language pairs, we evaluated meaning preservation and task validity between source and translated datasets. Our experiments reveal substantial variation across languages, reflecting language-dependent biases and inconsistent semantic preservation that directly...
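The majority-agreement step described above can be illustrated with a minimal sketch. The paper does not specify its exact aggregation rule; the function name, annotator names, labels, and agreement threshold below are illustrative assumptions, and low-agreement items are simply flagged for human evaluation:

```python
from collections import Counter

def aggregate_labels(annotations, min_agreement=2):
    """Majority-vote aggregation over labels from several LLM annotators.

    annotations: dict mapping annotator name -> label for one query-document pair.
    Returns the winning label, or None when the top label's vote count falls
    below min_agreement (such items would be routed to human evaluation).
    NOTE: this is a hypothetical sketch, not the paper's actual implementation.
    """
    counts = Counter(annotations.values())
    label, votes = counts.most_common(1)[0]
    return label if votes >= min_agreement else None

# Three hypothetical LLM annotators labeling one query-document pair.
votes = {"model_a": "relevant", "model_b": "relevant", "model_c": "not_relevant"}
print(aggregate_labels(votes))  # relevant
```

A tie or a split vote returns None, which mirrors the framework's use of human evaluation as a backstop for unreliable automated labels.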