[2602.14257] AD-Bench: A Real-World, Trajectory-Aware Advertising Analytics Benchmark for LLM Agents

Summary

The paper introduces AD-Bench, a benchmark for evaluating Large Language Model (LLM) agents in real-world advertising analytics, highlighting performance gaps in complex tasks.

Why It Matters

As LLM agents are deployed in more domains, it becomes important to measure how they perform on real-world tasks, including advertising analytics. AD-Bench addresses a limitation of existing benchmarks, which rely largely on idealized simulations, by building its tasks from practical marketing requests. This supports more realistic evaluation and improvement of LLM agents in marketing contexts.

Key Takeaways

  • AD-Bench is designed to evaluate LLM agents in real-world advertising scenarios.
  • The benchmark categorizes tasks into three difficulty levels to assess agent capabilities.
  • Current state-of-the-art models show significant performance gaps in complex marketing tasks.

Computer Science > Computation and Language

arXiv:2602.14257 (cs)

[Submitted on 15 Feb 2026]

Title: AD-Bench: A Real-World, Trajectory-Aware Advertising Analytics Benchmark for LLM Agents

Authors: Lingxiang Hu, Yiding Sun, Tianle Xia, Wenwei Li, Ming Xu, Liqun Liu, Peng Shu, Huan Yu, Jie Jiang

Abstract: While Large Language Model (LLM) agents have achieved remarkable progress in complex reasoning tasks, evaluating their performance in real-world environments has become a critical problem. Current benchmarks, however, are largely restricted to idealized simulations, failing to address the practical demands of specialized domains like advertising and marketing analytics. In these fields, tasks are inherently more complex, often requiring multi-round interaction with professional marketing tools. To address this gap, we propose AD-Bench, a benchmark designed based on real-world business requirements of advertising and marketing platforms. AD-Bench is constructed from real user marketing analysis requests, with domain experts providing verifiable reference answers and corresponding reference tool-call trajectories. The benchmark categorizes requests into three difficulty levels (L1-L3) to evaluate agents' capabilities under multi-round, multi-tool collaboration. Experiments show that on AD-Bench, G...
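To make the abstract's description concrete, here is a minimal Python sketch of how a benchmark record with a reference answer, a reference tool-call trajectory, and a difficulty level might be represented and scored. Everything in it is an assumption for illustration: the `BenchmarkItem` fields, the exact-match answer check, and the longest-common-subsequence trajectory score are not taken from the paper, which may define its data format and metrics differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BenchmarkItem:
    """One hypothetical benchmark record; field names are illustrative only."""
    request: str
    level: str                        # "L1", "L2", or "L3"
    reference_answer: str
    reference_trajectory: list[str]   # ordered tool names from a domain expert


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def trajectory_score(predicted, reference):
    """Fraction of reference tool calls the agent reproduced in order (assumed metric)."""
    if not reference:
        return 1.0 if not predicted else 0.0
    return lcs_length(predicted, reference) / len(reference)


def score_by_level(items, predictions):
    """Average answer accuracy and trajectory score per difficulty level.

    `predictions` is a list of (answer, tool_call_list) pairs aligned with `items`.
    """
    buckets = defaultdict(list)
    for item, (answer, trajectory) in zip(items, predictions):
        correct = float(answer.strip() == item.reference_answer.strip())
        buckets[item.level].append(
            (correct, trajectory_score(trajectory, item.reference_trajectory))
        )
    return {
        level: {
            "answer_accuracy": sum(c for c, _ in rows) / len(rows),
            "trajectory_score": sum(t for _, t in rows) / len(rows),
        }
        for level, rows in buckets.items()
    }
```

Reporting the two numbers separately per level keeps a correct final answer reached through an unexpected tool sequence distinguishable from one that follows the expert's path, which is one plausible reading of "trajectory-aware" evaluation.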
