[2602.15645] CARE Drive: A Framework for Evaluating Reason-Responsiveness of Vision Language Models in Automated Driving
Summary
The article presents CARE Drive (Context Aware Reasons Evaluation for Driving), a framework for evaluating the reason-responsiveness of vision language models in automated driving. It addresses a gap in current evaluation methods, which focus solely on performance outcomes.
Why It Matters
As automated driving technology advances, ensuring that AI models make decisions based on human-relevant considerations is crucial for safety. CARE Drive provides a systematic approach to evaluating how well these models align with human reasoning, which is vital for building trust in AI systems deployed in safety-critical applications.
Key Takeaways
- CARE Drive is a model-agnostic framework for evaluating AI decision-making in driving.
- The framework assesses how human reasons influence model decisions, improving alignment with expert behavior.
- Results indicate varying sensitivity of models to different contextual factors, highlighting the need for nuanced evaluations.
Computer Science > Artificial Intelligence, arXiv:2602.15645 (cs)
[Submitted on 17 Feb 2026]
Title: CARE Drive: A Framework for Evaluating Reason-Responsiveness of Vision Language Models in Automated Driving
Authors: Lucas Elbert Suryana, Farah Bierenga, Sanne van Buuren, Pepijn Kooij, Elsefien Tulleners, Federico Scari, Simeon Calvert, Bart van Arem, Arkady Zgonnikov
Abstract: Foundation models, including vision language models, are increasingly used in automated driving to interpret scenes, recommend actions, and generate natural language explanations. However, existing evaluation methods primarily assess outcome-based performance, such as safety and trajectory accuracy, without determining whether model decisions reflect human-relevant considerations. As a result, it remains unclear whether explanations produced by such models correspond to genuine reason-responsive decision making or are merely post hoc rationalizations. This limitation is especially significant in safety-critical domains because it can create false confidence. To address this gap, we propose CARE Drive (Context Aware Reasons Evaluation for Driving), a model-agnostic framework for evaluating reason-responsiveness in vision language models applied to automated driving. CARE Drive compares baseline and rea...