[2602.13318] DECKBench: Benchmarking Multi-Agent Frameworks for Academic Slide Generation and Editing

arXiv - Machine Learning · 4 min read

Summary

DECKBench introduces a new evaluation framework for multi-agent systems focused on generating and editing academic slide decks, addressing gaps in existing benchmarks.

Why It Matters

This work matters because it provides a standardized way to evaluate multi-agent frameworks on academic slide generation and editing, where the fidelity and coherence of the resulting presentations are crucial for effective research communication.

Key Takeaways

  • DECKBench offers a comprehensive evaluation framework for academic slide generation and editing.
  • The framework assesses fidelity, coherence, layout quality, and multi-turn instruction following (see the sketch after this list).
  • It includes a modular multi-agent baseline that decomposes slide creation into paper parsing and summarization, slide planning, HTML creation, and iterative editing.
  • The benchmark highlights both strengths and weaknesses in existing systems.
  • Publicly available code and data promote reproducibility and further research.
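To make these evaluation axes concrete, here is a minimal sketch of how a DECKBench-style harness might organize per-slide and per-deck scores. The paper does not publish this interface; the class names SlideScore and DeckEvaluation, the score fields, and the mean-based aggregation are illustrative assumptions, not the benchmark's actual API.

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical score containers; field names mirror the dimensions
# DECKBench reports (fidelity, coherence, layout quality, and
# multi-turn instruction following), not the paper's real code.
@dataclass
class SlideScore:
    fidelity: float        # faithfulness of content to the source paper
    coherence: float       # logical flow within and across slides
    layout_quality: float  # layout-aware rendering quality

@dataclass
class DeckEvaluation:
    slide_scores: list[SlideScore] = field(default_factory=list)
    instruction_following: float = 0.0  # deck-level, over multi-turn edits

    def deck_level(self) -> dict[str, float]:
        """Aggregate slide-level scores into deck-level metrics."""
        return {
            "fidelity": mean(s.fidelity for s in self.slide_scores),
            "coherence": mean(s.coherence for s in self.slide_scores),
            "layout_quality": mean(s.layout_quality for s in self.slide_scores),
            "instruction_following": self.instruction_following,
        }

# Example: score a three-slide deck after a round of editing turns.
ev = DeckEvaluation(
    slide_scores=[
        SlideScore(0.9, 0.8, 0.7),
        SlideScore(0.8, 0.9, 0.8),
        SlideScore(0.7, 0.7, 0.9),
    ],
    instruction_following=0.85,
)
print(ev.deck_level())
```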

Computer Science > Artificial Intelligence

arXiv:2602.13318 (cs) [Submitted on 10 Feb 2026]

Title: DECKBench: Benchmarking Multi-Agent Frameworks for Academic Slide Generation and Editing
Authors: Daesik Jang, Morgan Lindsay Heisler, Linzi Xing, Yifei Li, Edward Wang, Ying Xiong, Yong Zhang, Zhenan Fan

Abstract: Automatically generating and iteratively editing academic slide decks requires more than document summarization. It demands faithful content selection, coherent slide organization, layout-aware rendering, and robust multi-turn instruction following. However, existing benchmarks and evaluation protocols do not adequately measure these challenges. To address this gap, we introduce the Deck Edits and Compliance Kit Benchmark (DECKBench), an evaluation framework for multi-agent slide generation and editing. DECKBench is built on a curated dataset of paper-to-slide pairs augmented with realistic, simulated editing instructions. Our evaluation protocol systematically assesses slide-level and deck-level fidelity, coherence, layout quality, and multi-turn instruction following. We further implement a modular multi-agent baseline system that decomposes the slide generation and editing task into paper parsing and summarization, slide planning, HTML creation, and iterative editing. Experimental re...
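The baseline's decomposition maps naturally onto a staged pipeline. The sketch below illustrates that four-stage flow under the assumption of LLM-backed stages behind a generic call_llm placeholder; none of the function names, prompts, or the slide-break convention come from the paper or its released code.

```python
# Illustrative four-stage pipeline mirroring the baseline's decomposition:
# paper parsing/summarization -> slide planning -> HTML creation -> editing.
# `call_llm` is a stand-in for whatever model backend the system uses.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a hosted or local model client)."""
    raise NotImplementedError

def parse_and_summarize(paper_text: str) -> str:
    """Stage 1: extract and condense the paper's key content."""
    return call_llm(f"Summarize the key contributions and results:\n{paper_text}")

def plan_slides(summary: str) -> list[str]:
    """Stage 2: turn the summary into an ordered list of slide outlines."""
    plan = call_llm(f"Propose a slide-by-slide outline:\n{summary}")
    return [line for line in plan.splitlines() if line.strip()]

def render_html(outline: str) -> str:
    """Stage 3: render one slide outline as a self-contained HTML slide."""
    return call_llm(f"Render this slide outline as HTML:\n{outline}")

def apply_edit(slides: list[str], instruction: str) -> list[str]:
    """Stage 4: apply one editing instruction to the whole deck."""
    deck = "\n<!-- slide break -->\n".join(slides)
    edited = call_llm(f"Apply this edit to the deck:\n{instruction}\n\n{deck}")
    return edited.split("\n<!-- slide break -->\n")

def generate_deck(paper_text: str, edit_instructions: list[str]) -> list[str]:
    """Run the full pipeline, then apply edits turn by turn."""
    summary = parse_and_summarize(paper_text)
    slides = [render_html(outline) for outline in plan_slides(summary)]
    for instruction in edit_instructions:  # iterative, multi-turn editing
        slides = apply_edit(slides, instruction)
    return slides
```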

Related Articles

Open Source AI

we just hit 555 stars on our open source AI agent config tool and i'm honestly still in shock

so a while back me and a few folks started working on Caliber, an open source tool for managing AI agent configs and syncing them with yo...

Reddit - Artificial Intelligence · 1 min ·
Robotics

[P] Cadenza: Connect Wandb logs to agents easily for autonomous research.

Wandb CLI and MCP are atrocious to use with agents for full autonomous research loops. They are slow, clunky, and result in context rot. S...

Reddit - Artificial Intelligence · 1 min ·
LLMs

Nvidia goes all-in on AI agents while Anthropic pulls the plug

TLDR: Nvidia is partnering with 17 major companies to build a platform specifically for enterprise AI agents, basically trying to become ...

Reddit - Artificial Intelligence · 1 min ·