[2510.07972] SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance
Computer Science > Artificial Intelligence
arXiv:2510.07972 (cs)
[Submitted on 9 Oct 2025 (v1), last revised 4 Mar 2026 (this version, v2)]

Title: SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance

Authors: Pengkun Jiao, Yiming Jin, Jianhui Yang, Chenhe Dong, Zerui Huang, Shaowei Yao, Xiaojiang Zhou, Dan Ou, Haihong Tang

Abstract: Query-product relevance prediction is a foundational technology in e-commerce search engines and has become increasingly important in AI-driven e-commerce. The recent emergence of LLMs, particularly their CoT reasoning capabilities, offers promising opportunities for developing relevance systems that are both more interpretable and more robust. However, existing training paradigms have notable limitations: SFT and DPO suffer from poor generalization on long-tail queries and from a lack of fine-grained, stepwise supervision to enforce rule-aligned reasoning. In contrast, reinforcement learning with verification rewards (RLVR) suffers from sparse feedback, which provides insufficient signal to correct erroneous intermediate steps, thereby undermining logical consistency and limiting performance in complex inference scenarios. To address these challenges, we introduce the Stepwise Hybrid Examination Reinforcement L...