[2603.17205] OPERA: Online Data Pruning for Efficient Retrieval Model Adaptation
Computer Science > Information Retrieval

arXiv:2603.17205 (cs)

[Submitted on 17 Mar 2026 (v1), last revised 1 Apr 2026 (this version, v2)]

Title: OPERA: Online Data Pruning for Efficient Retrieval Model Adaptation

Authors: Haoyang Fang, Shuai Zhang, Yifei Ma, Hengyi Wang, Cuixiong Hu, Katrin Kirchhoff, Bernie Wang, George Karypis

Abstract: Domain-specific finetuning is essential for dense retrievers, yet not all training pairs contribute equally to the learning process. We introduce OPERA, a data pruning framework that exploits this heterogeneity to improve both the effectiveness and efficiency of retrieval model adaptation. We first investigate static pruning (SP), which retains only high-similarity query-document pairs, revealing an intrinsic quality-coverage tradeoff: ranking (NDCG) improves while retrieval (Recall) can degrade due to reduced query diversity. To resolve this tradeoff, we propose a two-stage dynamic pruning (DP) strategy that adaptively modulates sampling probabilities at both query and document levels throughout training, prioritizing high-quality examples while maintaining access to the full training set. Evaluations across eight datasets spanning six domains demonstrate the effectiveness of both approaches: SP improves ranking over standard finetuning (NDCG@10 +0.5%), while DP achieves the s...
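The abstract's contrast between the two regimes can be sketched in a few lines: static pruning hard-filters pairs below a similarity threshold, while dynamic pruning keeps every pair reachable but reweights its sampling probability. The similarity scores, threshold, and proportional-weighting scheme below are illustrative assumptions for the sketch, not OPERA's actual scoring function or two-stage schedule.

```python
import random

# Illustrative training pairs with hypothetical similarity scores.
pairs = [
    {"query": "q1", "doc": "d1", "sim": 0.9},
    {"query": "q2", "doc": "d2", "sim": 0.4},
    {"query": "q3", "doc": "d3", "sim": 0.7},
]

def static_prune(pairs, threshold):
    """Static pruning (SP): keep only query-document pairs whose
    similarity meets a fixed threshold. Low-similarity pairs are
    discarded outright, which is what reduces query coverage."""
    return [p for p in pairs if p["sim"] >= threshold]

def dynamic_sampling_weights(pairs):
    """Dynamic pruning (DP), sketched as similarity-proportional
    sampling: every pair keeps nonzero probability, but
    higher-similarity pairs are drawn more often. This is a stand-in
    for OPERA's two-stage query/document-level schedule."""
    total = sum(p["sim"] for p in pairs)
    return [p["sim"] / total for p in pairs]

kept = static_prune(pairs, threshold=0.6)   # drops the 0.4-sim pair
probs = dynamic_sampling_weights(pairs)     # all pairs retain mass
batch = random.choices(pairs, weights=probs, k=2)  # weighted minibatch draw
```

Note the difference in failure modes this sketch makes visible: `static_prune` permanently removes the low-similarity query, while `dynamic_sampling_weights` merely down-weights it, preserving access to the full training set.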