[2509.25249] BEV-VLM: Trajectory Planning via Unified BEV Abstraction
Computer Science > Robotics
arXiv:2509.25249 (cs)
[Submitted on 27 Sep 2025 (v1), last revised 27 Feb 2026 (this version, v2)]

Title: BEV-VLM: Trajectory Planning via Unified BEV Abstraction
Authors: Guancheng Chen, Sheng Yang, Tong Zhan, Jian Wang

Abstract: This paper introduces BEV-VLM, a novel approach for trajectory planning in autonomous driving that leverages Vision-Language Models (VLMs) with Bird's-Eye View (BEV) feature maps as visual input. Unlike conventional trajectory planning approaches that rely solely on raw visual data (e.g., camera images), our method utilizes a highly compressed and informative BEV representation generated by fusing camera and LiDAR data, with subsequent alignment to High-Definition (HD) maps. This unified BEV-HD map format provides a geometrically consistent and semantically rich scene description, which enables VLMs to perform accurate and robust trajectory planning. Experimental results on the nuScenes dataset demonstrate that, compared with state-of-the-art vision-only methods, our approach achieves a 53.1% improvement in planning accuracy and realizes complete collision avoidance in evaluation scenarios. Our work highlights that VLMs can effectively interpret processed visual representations such as BEV features, expanding their applicability beyond raw image inputs for the task of ...
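The abstract describes a pipeline in three stages: camera and LiDAR data are fused into a BEV feature map, that map is aligned with an HD map into a unified BEV-HD representation, and a VLM consumes this representation to produce a planned trajectory. The sketch below illustrates that data flow only; every interface in it (the `BEVEncoder` stand-in, channel-concatenation as the alignment step, a small transformer standing in for the VLM, the waypoint regression head, and all shapes and hyperparameters) is a hypothetical assumption for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn


class BEVEncoder(nn.Module):
    """Toy stand-in for a camera+LiDAR BEV fusion backbone.

    Assumes the sensor features have already been projected onto a
    shared BEV grid; a real system would use a full fusion network.
    """

    def __init__(self, in_ch: int = 80, out_ch: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )

    def forward(self, fused_bev: torch.Tensor) -> torch.Tensor:
        # (B, in_ch, H, W) -> (B, out_ch, H, W) BEV feature map
        return self.net(fused_bev)


class BEVVLMPlanner(nn.Module):
    """Sketch of the planning stage.

    Aligns BEV features with a rasterized HD map (channel concat is an
    assumption), tokenizes the grid, and regresses future waypoints
    with a small transformer standing in for the VLM.
    """

    def __init__(self, bev_ch: int = 256, hd_ch: int = 8,
                 d_model: int = 512, horizon: int = 6):
        super().__init__()
        # 1x1 conv fuses BEV features + HD-map raster into one map
        self.align = nn.Conv2d(bev_ch + hd_ch, d_model, 1)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.vlm_stub = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, horizon * 2)  # (x, y) per future step
        self.horizon = horizon

    def forward(self, bev_feat: torch.Tensor,
                hd_map_raster: torch.Tensor) -> torch.Tensor:
        x = self.align(torch.cat([bev_feat, hd_map_raster], dim=1))
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, d_model)
        pooled = self.vlm_stub(tokens).mean(dim=1)   # (B, d_model)
        return self.head(pooled).view(-1, self.horizon, 2)  # waypoints


# Toy forward pass on a small 32x32 BEV grid; real BEV grids (often
# ~200x200) would be downsampled or tokenized more economically.
encoder, planner = BEVEncoder(), BEVVLMPlanner()
bev = encoder(torch.randn(1, 80, 32, 32))
trajectory = planner(bev, torch.randn(1, 8, 32, 32))
print(trajectory.shape)  # torch.Size([1, 6, 2])
```

In a faithful implementation, the toy encoder would be replaced by a trained camera-LiDAR fusion backbone and the transformer stub by an actual pretrained VLM that ingests the BEV-HD tokens (possibly alongside a text prompt); the sketch only makes the unified-BEV-input idea concrete.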