[2603.17205] OPERA: Online Data Pruning for Efficient Retrieval Model Adaptation

Computer Science > Information Retrieval

arXiv:2603.17205 (cs)

Submitted on 17 Mar 2026 (v1), last revised 1 Apr 2026 (this version, v2)

Title: OPERA: Online Data Pruning for Efficient Retrieval Model Adaptation

Authors: Haoyang Fang, Shuai Zhang, Yifei Ma, Hengyi Wang, Cuixiong Hu, Katrin Kirchhoff, Bernie Wang, George Karypis

Abstract: Domain-specific finetuning is essential for dense retrievers, yet not all training pairs contribute equally to the learning process. We introduce OPERA, a data pruning framework that exploits this heterogeneity to improve both the effectiveness and efficiency of retrieval model adaptation. We first investigate static pruning (SP), which retains only high-similarity query-document pairs, revealing an intrinsic quality-coverage tradeoff: ranking (NDCG) improves while retrieval (Recall) can degrade due to reduced query diversity. To resolve this tradeoff, we propose a two-stage dynamic pruning (DP) strategy that adaptively modulates sampling probabilities at both query and document levels throughout training, prioritizing high-quality examples while maintaining access to the full training set. Evaluations across eight datasets spanning six domains demonstrate the effectiveness of both approaches: SP improves ranking over standard finetuning (NDCG@10 +0.5%), while DP achieves the s...
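The contrast between the two strategies in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how similarities are thresholded or how sampling probabilities are computed, so the `keep_fraction` cutoff, the softmax weighting, the `temperature` parameter, the use of a query's best pair as its score, and all function names here are assumptions made for illustration only.

```python
import math
import random


def static_prune(pairs, keep_fraction):
    """Keep only the most similar (query, doc, similarity) pairs.

    Assumption: pruning is a top-fraction cut on a precomputed similarity.
    """
    k = max(1, int(len(pairs) * keep_fraction))
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:k]


def softmax(scores, temperature):
    """Turn scores into probabilities; every entry stays strictly positive."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_stage_sample(pairs, temperature, rng):
    """Sample a query first, then one of its documents.

    Assumption: both stages weight by similarity through a softmax, so
    high-similarity examples are preferred but nothing is discarded.
    """
    by_query = {}
    for query, doc, sim in pairs:
        by_query.setdefault(query, []).append((doc, sim))

    # Stage 1: query level, scored by the query's best pair (hypothetical choice).
    queries = list(by_query)
    query_scores = [max(sim for _, sim in by_query[q]) for q in queries]
    query = rng.choices(queries, weights=softmax(query_scores, temperature), k=1)[0]

    # Stage 2: document level, among the documents of the sampled query.
    docs = [doc for doc, _ in by_query[query]]
    doc_scores = [sim for _, sim in by_query[query]]
    doc = rng.choices(docs, weights=softmax(doc_scores, temperature), k=1)[0]
    return query, doc


# Toy training set: (query id, document id, similarity).
pairs = [("q1", "d1", 0.9), ("q1", "d2", 0.8), ("q2", "d3", 0.4), ("q3", "d4", 0.3)]

kept = static_prune(pairs, keep_fraction=0.5)
print(sorted({q for q, _, _ in kept}))  # only q1 survives the static cut

rng = random.Random(0)
print(two_stage_sample(pairs, temperature=0.5, rng=rng))
```

On this toy data the static cut keeps only pairs from `q1`, which mirrors the reduced query diversity the abstract links to lower Recall. The two-stage sampler still favors the same high-similarity pairs, but `q2` and `q3` keep a nonzero chance of being drawn, in line with the abstract's description of maintaining access to the full training set. In an actual training loop the weights would presumably be updated as training proceeds; the sketch keeps them fixed.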

Originally published on April 02, 2026. Curated by AI News.
