[2604.03393] TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering
Computer Science > Artificial Intelligence
arXiv:2604.03393 (cs)
[Submitted on 3 Apr 2026]
Title: TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering
Authors: Tung Sum Thomas Kwok, Xinyu Wang, Xiaofeng Lin, Peng Lu, Chunhe Wang, Changlun Li, Hanwei Wu, Nan Tang, Elisa Kreiss, Guang Cheng
Abstract: Multimodal reasoning has emerged as a powerful framework for enhancing the reasoning capabilities of reasoning models. While multi-turn table reasoning methods have improved reasoning accuracy through tool use and reward modeling, they rely on fixed text serialization for table state readouts. This introduces representation errors in table encoding that accumulate significantly over multiple turns. Tabular grounding methods alleviate this accumulation, but at the expense of inference compute and cost, which makes real-world deployment impractical. To address this, we introduce TABQAWORLD, a table reasoning framework that jointly optimizes tabular action through representation and estimation. For representation, TABQAWORLD employs an action-conditioned multimodal selection policy, which dynamically switches between visual and textual representations to maximize table state readout reliability. For estimation, TABQAWORLD optimizes stepwise reasoning trajectory through table metadata includin...
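To make the idea of an action-conditioned selection between textual and visual table readouts more concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not specify how the policy is defined or trained. The class name `ModalitySelector`, the action names, the two modality labels, and the smoothed success-rate estimate are all hypothetical choices made for illustration. The sketch simply tracks, for each table action, how often a readout in each modality was judged reliable, and picks the modality with the higher estimate.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical modality labels; the abstract only says "visual" and "textual".
MODALITIES = ("text", "image")


@dataclass
class ModalitySelector:
    """Illustrative selector (an assumption, not the TABQAWORLD policy).

    Keeps per-(action, modality) counts of reliable readouts and selects
    the modality with the highest smoothed reliability estimate.
    """

    # counts[action][modality] = [reliable_readouts, total_readouts]
    counts: dict = field(
        default_factory=lambda: defaultdict(
            lambda: {m: [0, 0] for m in MODALITIES}
        )
    )

    def reliability(self, action: str, modality: str) -> float:
        reliable, total = self.counts[action][modality]
        # Laplace smoothing: an unseen (action, modality) pair starts at 0.5.
        return (reliable + 1) / (total + 2)

    def select(self, action: str) -> str:
        # max() returns the first maximal item, so ties fall back to "text".
        return max(MODALITIES, key=lambda m: self.reliability(action, m))

    def update(self, action: str, modality: str, reliable: bool) -> None:
        stats = self.counts[action][modality]
        stats[0] += int(reliable)
        stats[1] += 1


# Hypothetical feedback: which readouts turned out to be reliable.
selector = ModalitySelector()
selector.update("filter_rows", "text", reliable=False)
selector.update("filter_rows", "image", reliable=True)
selector.update("sort_column", "text", reliable=True)

print(selector.select("filter_rows"))  # image
print(selector.select("sort_column"))  # text
print(selector.select("unseen_action"))  # text (tie at 0.5)
```

The point of conditioning on the action is that the preferred representation can differ per operation, rather than being fixed for the whole multi-turn trajectory. How reliability would actually be measured is left open here.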