[2511.08409] Faithful-First Reasoning, Planning, and Acting for Multimodal LLMs
Computer Science > Artificial Intelligence
arXiv:2511.08409 (cs)
[Submitted on 11 Nov 2025 (v1), last revised 8 Apr 2026 (this version, v4)]

Title: Faithful-First Reasoning, Planning, and Acting for Multimodal LLMs
Authors: Junxian Li, Xinyue Xu, Sai Ma, Di Zhang, Sichao Li

Abstract: Multimodal Large Language Models (MLLMs) frequently suffer from unfaithfulness, generating reasoning chains that drift from visual evidence or contradict final predictions. We propose the Faithful-First Reasoning, Planning, and Acting (RPA) framework, in which FaithEvi provides step-wise and chain-level supervision by evaluating the faithfulness of intermediate reasoning, and FaithAct uses these signals to plan and execute faithfulness-aware actions during inference. Experiments across multiple multimodal reasoning benchmarks show that faithful-first RPA improves perceptual faithfulness by up to 24% over prompt-based and tool-augmented reasoning frameworks, without degrading task accuracy. Our analysis shows that treating faithfulness as a guiding principle yields perceptually faithful reasoning trajectories and mitigates hallucination behavior. This work thereby establishes a unified framework for both evaluating and enforcing faithfulness in multimodal reasoning. Code is at this https URL.

Subjects: Artificial Intelligence (cs.AI)
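The abstract's control flow can be pictured as a loop that scores each intermediate reasoning step for faithfulness and triggers a corrective action when the score falls below a threshold. The sketch below is purely illustrative and is not from the paper: `score_step` stands in for FaithEvi-style step-wise scoring and `revise_step` for a FaithAct-style corrective action; both signatures, the `Step` dataclass, and the threshold/retry logic are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    """One reasoning step with its (hypothetical) faithfulness score."""
    text: str
    score: float = 0.0

def faithful_first_loop(
    steps: List[str],
    score_step: Callable[[str], float],   # stand-in for FaithEvi scoring
    revise_step: Callable[[str], str],    # stand-in for a FaithAct action
    threshold: float = 0.5,
    max_retries: int = 2,
) -> List[Step]:
    """Score each step; revise low-faithfulness steps before keeping them."""
    chain: List[Step] = []
    for text in steps:
        score = score_step(text)
        retries = 0
        while score < threshold and retries < max_retries:
            text = revise_step(text)      # e.g., re-ground on visual evidence
            score = score_step(text)
            retries += 1
        chain.append(Step(text, score))
    return chain
```

With a toy scorer that penalizes ungrounded language (e.g., steps containing "guess") and a reviser that rewrites them, the loop returns a chain whose steps all clear the threshold or have exhausted their retry budget.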