[2603.19880] What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time
Computer Science > Machine Learning
arXiv:2603.19880 (cs)
[Submitted on 20 Mar 2026]

Title: What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time
Authors: Dong Yan, Jian Liang, Yanbo Wang, Shuo Lu, Ran He, Tieniu Tan

Abstract: Test-Time Reinforcement Learning (TTRL) enables Large Language Models (LLMs) to enhance their reasoning capabilities on unlabeled test streams by deriving pseudo-rewards from majority-voting consensus. However, existing TTRL methods rely exclusively on positive pseudo-labeling strategies. This reliance becomes vulnerable in challenging scenarios where answer distributions are highly dispersed: the resulting weak consensus can inadvertently reinforce incorrect trajectories as supervision signals. In this paper, we propose SCRL (Selective-Complementary Reinforcement Learning), a robust test-time reinforcement learning framework that effectively mitigates label-noise amplification. SCRL develops Selective Positive Pseudo-Labeling, which enforces strict consensus criteria to filter out unreliable majorities. Complementarily, SCRL introduces Entropy-Gated Negative Pseudo-Labeling, the first negative supervision mechanism in TTRL, to reliably prune incorrect trajectories based on generation uncertainty. Extensive experiments on multiple reasoning benchmarks demonstrate...
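To make the two mechanisms described in the abstract concrete, below is a minimal Python sketch of consensus-gated pseudo-rewards in the spirit of SCRL. The abstract does not specify the exact formulation, so the thresholds (tau_pos, tau_ent), the choice of Shannon entropy over the answer distribution, the reward values, and all function names here are illustrative assumptions, not the paper's method.

```python
from collections import Counter
import math

def answer_entropy(answers):
    """Shannon entropy (nats) of the empirical answer distribution."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def scrl_pseudo_rewards(answers, tau_pos=0.7, tau_ent=1.0):
    """Assign +1 / -1 / 0 pseudo-rewards to sampled trajectories.

    answers: final answers extracted from N sampled trajectories
             for one unlabeled test question.
    tau_pos: strict consensus threshold for positive labels (assumed).
    tau_ent: entropy gate for negative labels (assumed).
    """
    counts = Counter(answers)
    majority_ans, majority_cnt = counts.most_common(1)[0]
    consensus = majority_cnt / len(answers)
    entropy = answer_entropy(answers)

    rewards = []
    for a in answers:
        if consensus >= tau_pos and a == majority_ans:
            # Selective positive pseudo-labeling: reward the majority
            # answer only when consensus is strict enough to trust.
            rewards.append(+1.0)
        elif entropy <= tau_ent and a != majority_ans:
            # Entropy-gated negative pseudo-labeling: when the answer
            # distribution is concentrated (low generation uncertainty),
            # penalize non-majority trajectories as likely incorrect.
            rewards.append(-1.0)
        else:
            # Weak consensus under high uncertainty: abstain rather
            # than amplify label noise.
            rewards.append(0.0)
    return rewards

# Usage: 8 sampled answers to one test question; the lone "17"
# receives a negative reward because the distribution is concentrated.
print(scrl_pseudo_rewards(["42", "42", "42", "42", "42", "42", "17", "42"]))
```

The abstain branch reflects the abstract's core claim: when neither the strict-consensus nor the low-entropy condition holds, no supervision signal is emitted, so unreliable majorities are filtered rather than reinforced.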