[2602.21172] NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning

arXiv - AI 3 min read Article

Summary

The paper presents NoRD, a data-efficient Vision-Language-Action model that enhances autonomous driving without requiring extensive datasets or reasoning annotations.

Why It Matters

NoRD addresses critical challenges in developing autonomous driving systems by reducing the need for large datasets and complex reasoning, making it more feasible for real-world applications. This could lead to faster advancements in AI-driven vehicles and lower barriers for research and development in the field.

Key Takeaways

  • NoRD achieves competitive performance with less than 60% of the data typically required.
  • The model operates without reasoning annotations, simplifying the training process.
  • Incorporates Dr. GRPO to mitigate difficulty bias, enhancing learning efficiency.
  • Demonstrates effectiveness on established benchmarks like Waymo and NAVSIM.
  • Potentially accelerates the development of more efficient autonomous systems.

Computer Science > Artificial Intelligence — arXiv:2602.21172 (cs) [Submitted on 24 Feb 2026]

Title: NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning
Authors: Ishaan Rawal, Shubh Gupta, Yihan Hu, Wei Zhan

Abstract: Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations. In this work, we address both challenges with NoRD (No Reasoning for Driving). Compared to existing VLAs, NoRD achieves competitive performance while being fine-tuned on <60% of the data and with no reasoning annotations, resulting in 3× fewer tokens. We identify that standard Group Relative Policy Optimization (GRPO) fails to yield significant improvements when applied to policies trained on such small, reasoning-free datasets. We show that this limitation stems from difficulty bias, which disproportionately penalizes reward signals from scenarios that produce high-variance rollouts within GRPO. NoRD overcomes this by incorporating Dr. GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, NoRD achieves competitive p...
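The difficulty bias described above comes from GRPO's group-wise advantage normalization: dividing each rollout's centered reward by the group's standard deviation shrinks the signal from hard, high-variance scenarios relative to easy, near-deterministic ones. Dr. GRPO's remedy, as the abstract notes, is to drop that normalization. The sketch below is a minimal illustration of the two advantage computations only (function names and example reward values are our own, not from the paper), not the full policy-gradient update:

```python
import numpy as np

def grpo_advantages(rewards):
    """Standard GRPO: center each group's rewards and divide by the
    group's std. High-variance (hard) groups have a large denominator,
    so their reward signal is shrunk relative to low-variance groups."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def dr_grpo_advantages(rewards):
    """Dr. GRPO variant: keep only mean-centering, so the advantage
    scale reflects the group's actual reward spread."""
    r = np.asarray(rewards, dtype=float)
    return r - r.mean()

# Hypothetical reward groups: a hard scenario with high-variance rollouts
# and an easy scenario where rollouts barely differ.
hard = [0.0, 0.0, 1.0, 1.0]
easy = [0.9, 1.0, 1.0, 1.0]

print(grpo_advantages(hard), grpo_advantages(easy))
print(dr_grpo_advantages(hard), dr_grpo_advantages(easy))
```

With std normalization, the easy group's tiny reward differences are blown up to roughly the same magnitude as the hard group's, compressing the gap between scenarios; mean-centering alone preserves it.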

Related Articles

[2603.18940] Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought
LLMs

Abstract page for arXiv paper 2603.18940: Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty ...

arXiv - Machine Learning · 3 min ·
[2512.20620] Uncovering Patterns of Brain Activity from EEG Data Consistently Associated with Cybersickness Using Neural Network Interpretability Maps
Machine Learning

Abstract page for arXiv paper 2512.20620: Uncovering Patterns of Brain Activity from EEG Data Consistently Associated with Cybersickness ...

arXiv - Machine Learning · 4 min ·
[2512.13607] Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
Machine Learning

Abstract page for arXiv paper 2512.13607: Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

arXiv - Machine Learning · 4 min ·
[2512.02650] Hear What Matters! Text-conditioned Selective Video-to-Audio Generation
Machine Learning

Abstract page for arXiv paper 2512.02650: Hear What Matters! Text-conditioned Selective Video-to-Audio Generation

arXiv - Machine Learning · 3 min ·