[2602.15758] ChartEditBench: Evaluating Grounded Multi-Turn Chart Editing in Multimodal Language Models


Summary

The paper presents ChartEditBench, a benchmark for evaluating multi-turn chart editing in multimodal language models, highlighting the challenges of maintaining context and accuracy across iterative edits to a visualization.

Why It Matters

As multimodal language models become integral to data analysis, understanding their limitations in real-world applications is crucial. ChartEditBench addresses the gap in evaluating sustained interactions, providing a framework to improve user experience in data visualization tasks.

Key Takeaways

  • ChartEditBench introduces a benchmark for multi-turn chart editing.
  • The framework evaluates context-aware editing, unlike previous one-shot benchmarks.
  • Experiments reveal significant performance degradation in multi-turn settings.
  • Error accumulation and context breakdowns are major challenges for MLLMs.
  • The benchmark aims to enhance grounded, intent-aware multimodal programming.

Computer Science > Computation and Language
arXiv:2602.15758 (cs) [Submitted on 17 Feb 2026]

Title: ChartEditBench: Evaluating Grounded Multi-Turn Chart Editing in Multimodal Language Models
Authors: Manav Nitin Kapadnis, Lawanya Baghel, Atharva Naik, Carolyn Rosé

Abstract: While Multimodal Large Language Models (MLLMs) perform strongly on single-turn chart generation, their ability to support real-world exploratory data analysis remains underexplored. In practice, users iteratively refine visualizations through multi-turn interactions that require maintaining common ground, tracking prior edits, and adapting to evolving preferences. We introduce ChartEditBench, a benchmark for incremental, visually grounded chart editing via code, comprising 5,000 difficulty-controlled modification chains and a rigorously human-verified subset. Unlike prior one-shot benchmarks, ChartEditBench evaluates sustained, context-aware editing. We further propose a robust evaluation framework that mitigates limitations of LLM-as-a-Judge metrics by integrating execution-based fidelity checks, pixel-level visual similarity, and logical code verification. Experiments with state-of-the-art MLLMs reveal substantial degradation in multi-turn settings due to error accumulation and breakdowns in shared context, with...
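The abstract mentions two of the framework's checks: execution-based fidelity (does the edited chart code still run?) and pixel-level visual similarity between renderings. The paper's exact metrics are not given here, so the sketch below is only an illustration of that general idea, assuming matplotlib-based chart code; `render_chart` and `pixel_similarity` are hypothetical helper names, not the authors' API.

```python
# Illustrative sketch (not the paper's implementation): run chart code,
# then compare the reference and candidate renders pixel by pixel.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless rendering, no display needed
import matplotlib.pyplot as plt


def render_chart(code: str):
    """Execution-based fidelity check: exec the chart code and return
    the rendered figure as an RGBA array, or None if execution fails."""
    plt.close("all")
    try:
        exec(code, {"plt": plt})
    except Exception:
        return None  # code failed to execute -> zero fidelity
    fig = plt.gcf()
    fig.canvas.draw()
    return np.asarray(fig.canvas.buffer_rgba(), dtype=np.float64)


def pixel_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of pixels whose RGBA values match exactly
    (1.0 means the two renders are identical)."""
    if a is None or b is None or a.shape != b.shape:
        return 0.0
    return float((a == b).all(axis=-1).mean())
```

For example, rendering the same plotting code twice yields a similarity of 1.0, while code that raises (e.g. referencing an undefined variable) fails the execution check and returns None. A real benchmark would likely combine this with tolerance-based image metrics and static analysis of the code, as the abstract's mention of "logical code verification" suggests.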

