[2602.19313] TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics


arXiv - Machine Learning

Summary

The paper introduces TOPReward, a method that repurposes token probabilities from pretrained video Vision-Language Models as zero-shot rewards for reinforcement learning in robotics, yielding substantially more accurate task progress estimation than prompting the model for numeric progress values.

Why It Matters

TOPReward addresses the challenges of low sample efficiency and sparse rewards in reinforcement learning for robotics. By utilizing pretrained Vision-Language Models, it offers a more effective way to estimate task progress, which is crucial for advancing robotic capabilities in real-world applications.

Key Takeaways

  • TOPReward improves task progress estimation in robotics using token probabilities.
  • Achieves 0.947 mean Value-Order Correlation, outperforming existing methods.
  • Demonstrates versatility for applications like success detection and behavior cloning.
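The excerpt reports a 0.947 mean Value-Order Correlation but does not define the metric. A plausible reading, sketched below under that assumption, is a rank correlation between predicted progress values and the true temporal order of frames in a successful trajectory: it is 1.0 when predicted values increase monotonically with time, and -1.0 when they are perfectly inverted.

```python
def value_order_correlation(values):
    """Rank correlation between predicted values and frame order.

    Illustrative sketch only (the paper's exact definition is not given
    in this excerpt). Assumes distinct float values; tie handling is
    omitted for brevity.
    """
    n = len(values)
    # Rank the predicted values from smallest to largest.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    # Frame-order ranks are simply 0..n-1, so correlate against the index.
    mean = (n - 1) / 2
    cov = sum((ranks[i] - mean) * (i - mean) for i in range(n))
    var = sum((i - mean) ** 2 for i in range(n))
    return cov / var

# A monotonically increasing value trace is perfectly ordered:
assert value_order_correlation([0.1, 0.3, 0.5, 0.9]) == 1.0
# A decreasing trace is perfectly anti-ordered:
assert value_order_correlation([0.9, 0.5, 0.3, 0.1]) == -1.0
```

Under this reading, a score of 0.947 means the predicted values almost always rank frames in their true temporal order.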

Computer Science > Robotics
arXiv:2602.19313 (cs) · Submitted on 22 Feb 2026

Title: TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics
Authors: Shirui Chen, Cole Harrison, Ying-Chun Lee, Angela Jin Yang, Zhongzheng Ren, Lillian J. Ratliff, Jiafei Duan, Dieter Fox, Ranjay Krishna

Abstract: While Vision-Language-Action (VLA) models have seen rapid progress in pretraining, their advancement in Reinforcement Learning (RL) remains hampered by low sample efficiency and sparse rewards in real-world settings. Developing generalizable process reward models is essential for providing the fine-grained feedback necessary to bridge this gap, yet existing temporal value functions often fail to generalize beyond their training domains. We introduce TOPReward, a novel, probabilistically grounded temporal value function that leverages the latent world knowledge of pretrained video Vision-Language Models (VLMs) to estimate robotic task progress. Unlike prior methods that prompt VLMs to directly output progress values, which are prone to numerical misrepresentation, TOPReward extracts task progress directly from the VLM's internal token logits. In zero-shot evaluations across 130+ distinct real-world tasks and multiple robot platforms (e.g., Franka, YAM, SO-100/101), TOPReward achieves 0.947 mean Value-Order ...
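The abstract's core idea, reading progress out of the model's token logits rather than out of its decoded text, can be sketched as follows. This is an illustrative reconstruction, not the paper's formulation: the `progress_from_logits` helper and the discrete progress-bin tokens are hypothetical, and real tokenizers split percentage strings into multiple tokens rather than one id per bin.

```python
import math

def progress_from_logits(logits, bin_token_ids):
    """Expected task progress in [0, 1] from next-token logits.

    logits: full-vocabulary logits at the position where the VLM would
    emit its progress estimate.
    bin_token_ids: hypothetical token ids for progress bins
    "0%", "10%", ..., "100%", ordered from least to most complete.
    """
    # Restrict the distribution to the bin tokens and renormalise
    # with a numerically stable softmax.
    bin_logits = [logits[t] for t in bin_token_ids]
    peak = max(bin_logits)
    exps = [math.exp(x - peak) for x in bin_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Take the expectation over bins instead of the argmax, so the
    # estimate varies smoothly as probability mass shifts between bins.
    n_bins = len(bin_token_ids)
    return sum(p * i / (n_bins - 1) for i, p in enumerate(probs))

# Mass concentrated on the last bin ("100%") yields progress near 1.0:
logits = [0.0] * 20
logits[15] = 50.0
bin_ids = list(range(5, 16))
print(progress_from_logits(logits, bin_ids))
```

Taking the probability-weighted expectation rather than the decoded string is what makes the signal robust to the "numerical misrepresentation" the abstract attributes to direct prompting: the reward reflects the full distribution over progress bins, not a single sampled number.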
