[2603.22288] Evaluating Prompting Strategies for Chart Question Answering with Large Language Models
Computer Science > Computation and Language

arXiv:2603.22288 (cs) [Submitted on 3 Mar 2026]

Title: Evaluating Prompting Strategies for Chart Question Answering with Large Language Models
Authors: Ruthuparna Naikar, Ying Zhu

Abstract: Prompting strategies affect LLM reasoning performance, but their role in chart-based QA remains underexplored. We present a systematic evaluation of four widely used prompting paradigms (Zero-Shot, Few-Shot, Zero-Shot Chain-of-Thought, and Few-Shot Chain-of-Thought) across GPT-3.5, GPT-4, and GPT-4o on the ChartQA dataset. Our framework operates exclusively on structured chart data, isolating prompt structure as the only experimental variable, and evaluates performance using two metrics: Accuracy and Exact Match. Results from 1,200 diverse ChartQA samples show that Few-Shot Chain-of-Thought prompting consistently yields the highest accuracy (up to 78.2%), particularly on reasoning-intensive questions, while Few-Shot prompting improves format adherence. Zero-Shot performs well only with high-capacity models on simpler tasks. These findings provide actionable guidance for selecting prompting strategies in structured data reasoning tasks, with implications for both efficiency and accuracy in real-world applications.

Subjects: Computation and Language (cs.CL); Ar...
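The four prompting paradigms compared in the abstract differ only in whether exemplars are included and whether a chain-of-thought cue is appended. A minimal sketch of how such prompts could be assembled over structured chart data follows; the function name, exemplar format, and exact wording are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the four prompting paradigms named in the abstract:
# zero_shot, few_shot, zero_shot_cot, few_shot_cot. Chart data is passed as a
# structured table (list of (label, value) rows), mirroring the paper's
# text-only setup. All wording and field names are assumptions.

def format_chart(rows):
    """Serialize structured chart data as one 'label: value' line per row."""
    return "\n".join(f"{label}: {value}" for label, value in rows)

def build_prompt(strategy, rows, question, exemplars=None):
    """Assemble a prompt for one of the four strategies."""
    parts = []
    # Few-shot variants prepend worked exemplars; the CoT variant also
    # includes each exemplar's reasoning trace.
    if strategy in ("few_shot", "few_shot_cot"):
        for ex in exemplars or []:
            parts.append(f"Chart data:\n{format_chart(ex['rows'])}")
            parts.append(f"Q: {ex['question']}")
            if strategy == "few_shot_cot":
                parts.append(f"Reasoning: {ex['reasoning']}")
            parts.append(f"A: {ex['answer']}\n")
    parts.append(f"Chart data:\n{format_chart(rows)}")
    parts.append(f"Q: {question}")
    # Zero-shot CoT appends the standard step-by-step cue instead of exemplars.
    if strategy in ("zero_shot_cot", "few_shot_cot"):
        parts.append("Let's think step by step.")
    parts.append("A:")
    return "\n".join(parts)
```

Under this framing, the chart rendering itself never enters the prompt, so any accuracy difference between the four strategies is attributable to prompt structure alone, which is the isolation the abstract describes.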
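The abstract names two metrics, Accuracy and Exact Match, without defining them. A plausible sketch is below: Exact Match as normalized string equality, and Accuracy following ChartQA's standard relaxed criterion (numeric answers counted correct within 5% of the gold value). The relaxed-tolerance reading is an assumption, since the abstract does not spell out the paper's exact definition.

```python
# Hypothetical sketch of the two metrics named in the abstract.
# The 5% relaxed tolerance follows ChartQA convention and is an assumption
# about how "Accuracy" is computed in this paper.

def exact_match(pred: str, gold: str) -> bool:
    """Case- and whitespace-insensitive string equality."""
    return pred.strip().lower() == gold.strip().lower()

def relaxed_accuracy(pred: str, gold: str, tol: float = 0.05) -> bool:
    """Numeric answers pass within a relative tolerance; non-numeric
    answers fall back to exact match."""
    try:
        p, g = float(pred), float(gold)
    except ValueError:
        return exact_match(pred, gold)
    if g == 0:
        return p == g
    return abs(p - g) / abs(g) <= tol
```

Scoring both metrics per sample explains how Few-Shot prompting can "improve format adherence" (higher Exact Match) without necessarily winning on relaxed Accuracy.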