[2602.21172] NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning
Summary
The paper presents NoRD, a data-efficient Vision-Language-Action model for autonomous driving that achieves competitive performance without requiring extensive datasets or reasoning annotations.
Why It Matters
NoRD addresses critical challenges in developing autonomous driving systems by reducing the need for large datasets and complex reasoning, making it more feasible for real-world applications. This could lead to faster advancements in AI-driven vehicles and lower barriers for research and development in the field.
Key Takeaways
- NoRD achieves competitive performance with less than 60% of the data typically required.
- The model operates without reasoning annotations, simplifying the training process.
- Incorporates Dr. GRPO to mitigate difficulty bias, enhancing learning efficiency.
- Demonstrates effectiveness on established benchmarks like Waymo and NAVSIM.
- Potentially accelerates the development of more efficient autonomous systems.
Computer Science > Artificial Intelligence
arXiv:2602.21172 (cs)
[Submitted on 24 Feb 2026]
Title: NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning
Authors: Ishaan Rawal, Shubh Gupta, Yihan Hu, Wei Zhan
Abstract: Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations. In this work, we address both challenges with NoRD (No Reasoning for Driving). Compared to existing VLAs, NoRD achieves competitive performance while being fine-tuned on <60% of the data and no reasoning annotations, resulting in 3× fewer tokens. We identify that standard Group Relative Policy Optimization (GRPO) fails to yield significant improvements when applied to policies trained on such small, reasoning-free datasets. We show that this limitation stems from difficulty bias, which disproportionately penalizes reward signals from scenarios that produce high-variance rollouts within GRPO. NoRD overcomes this by incorporating Dr. GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, NoRD achieves competitive p...
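The difficulty bias described above can be illustrated with a toy sketch of the two advantage computations. This is not the paper's implementation: the reward values below are hypothetical, and the sketch only contrasts standard GRPO's per-group standard-deviation normalization with Dr. GRPO's unnormalized centering.

```python
from statistics import fmean, pstdev

def grpo_advantages(rewards, eps=1e-8):
    """Standard GRPO advantage: center rewards by the group mean, then
    divide by the group std. Dividing by a large std shrinks the signal
    from high-variance (hard) rollout groups -- the difficulty bias."""
    mu, sigma = fmean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def dr_grpo_advantages(rewards):
    """Dr. GRPO advantage: drop the std division, so each group's
    reward signal keeps its natural scale."""
    mu = fmean(rewards)
    return [r - mu for r in rewards]

# Hypothetical per-rollout rewards for two driving scenarios.
easy = [1.0, 1.0, 1.0, 0.9]   # low-variance rollouts (easy scenario)
hard = [1.0, 0.0, 0.5, 0.2]   # high-variance rollouts (hard scenario)
```

Under standard GRPO, both groups are rescaled to roughly unit magnitude, so the hard scenario's larger raw reward spread carries no extra weight in the update; Dr. GRPO's unnormalized advantages preserve that difference in scale.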