[2602.20676] PRECTR-V2: Unified Relevance-CTR Framework with Cross-User Preference Mining, Exposure Bias Correction, and LLM-Distilled Encoder Optimization
Summary
The paper presents PRECTR-V2, an advanced framework for improving search relevance and click-through rate (CTR) prediction by addressing challenges in user preference modeling and exposure bias.
Why It Matters
This research is significant as it tackles the dual challenges of enhancing user experience through personalized search results and optimizing revenue generation for platforms. By refining how user preferences are modeled and addressing biases in data, it contributes to more effective information retrieval systems.
Key Takeaways
- PRECTR-V2 integrates relevance matching and CTR prediction to improve user engagement.
- It mitigates the cold-start problem for low-active and new users through cross-user (global) preference mining.
- The framework corrects exposure bias using hard negative sampling and pairwise loss optimization.
- A lightweight transformer-based encoder enhances adaptability for CTR fine-tuning.
- The research advances beyond traditional models by employing knowledge distillation techniques.
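The exposure-bias correction in the third takeaway can be sketched as a BPR-style pairwise objective: sample hard negatives from the wider coarse-ranking candidate pool and train the model to score exposed (clicked) items above them. This is a minimal illustration under that assumption; the function names and the top-k sampling rule are hypothetical, not taken from the paper.

```python
import numpy as np

def sample_hard_negatives(candidate_scores, k):
    """Illustrative hard-negative sampler: take the k highest-scoring
    non-exposed candidates, which are the hardest for the model."""
    scores = np.asarray(candidate_scores, dtype=float)
    top_idx = np.argsort(scores)[::-1][:k]
    return scores[top_idx]

def pairwise_exposure_loss(pos_scores, hard_neg_scores):
    """BPR-style pairwise logistic loss: -log sigmoid(s_pos - s_neg),
    averaged over all (positive, hard negative) pairs.
    pos_scores: shape (batch,); hard_neg_scores: shape (batch, n_neg)."""
    diff = pos_scores[:, None] - hard_neg_scores          # (batch, n_neg)
    # numerically stable form of -log sigmoid(diff)
    return float(np.mean(np.log1p(np.exp(-diff))))
```

Because the loss depends only on score *differences*, minimizing it reorders exposed items above unexposed hard negatives without requiring calibrated absolute scores.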
Computer Science > Information Retrieval
arXiv:2602.20676 (cs) [Submitted on 24 Feb 2026]
Title: PRECTR-V2: Unified Relevance-CTR Framework with Cross-User Preference Mining, Exposure Bias Correction, and LLM-Distilled Encoder Optimization
Authors: Shuzhi Cao, Rong Chen, Ailong He, Shuguang Han, Jufeng Chen
Abstract: In search systems, effectively coordinating the two core objectives of search relevance matching and click-through rate (CTR) prediction is crucial for discovering users' interests and enhancing platform revenue. In our prior work PRECTR, we proposed a unified framework to integrate these two subtasks, thereby eliminating their inconsistency and leading to mutual enhancement. However, our previous work still faces three main challenges. First, low-active users and new users have limited search behavioral data, making it difficult to achieve effective personalized relevance preference modeling. Second, training data for ranking models predominantly come from high-relevance exposures, creating a distribution mismatch with the broader candidate space in coarse-ranking, which leads to generalization bias. Third, due to the latency constraint, the original model employs an Emb+MLP architecture with a frozen BERT encoder, which prevents joint optimization...
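The third challenge motivates replacing the frozen BERT encoder with a lightweight, fine-tunable student distilled from a larger teacher. A common distillation objective for encoders is to align the student's embeddings with the teacher's in cosine space; the sketch below shows that generic objective only, as an assumption — the paper's exact distillation loss is not given in this excerpt, and the function names are illustrative.

```python
import numpy as np

def cosine_distillation_loss(student_emb, teacher_emb):
    """Mean (1 - cosine similarity) between student and teacher embeddings.
    Each input has shape (batch, dim); the teacher is frozen, so gradients
    would flow only into the student in an actual training loop."""
    s = student_emb / np.linalg.norm(student_emb, axis=-1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))
```

Since cosine similarity is scale-invariant, the student can use a much smaller hidden dimension upstream and still match the teacher's directions after a projection, which is what makes the distilled encoder cheap enough to fine-tune jointly with the CTR head under latency constraints.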