[2510.07978] VoiceAgentBench: Are Voice Assistants ready for agentic tasks?

Summary

The paper introduces VoiceAgentBench, a benchmark for evaluating how well voice assistants handle agentic tasks such as tool invocation, multi-tool workflows, multi-turn dialogue, and safety, across English and six Indic languages.

Why It Matters

As voice assistants take on more everyday tasks, it becomes important to know how they behave when a spoken request requires calling tools rather than just answering a question. Existing speech benchmarks mostly test isolated capabilities such as transcription or question answering; this work adds a framework for evaluating agentic behavior and adversarial robustness, particularly in multilingual contexts.

Key Takeaways

  • VoiceAgentBench evaluates voice assistants in realistic spoken settings.
  • ASR-LLM pipelines outperform end-to-end SpeechLMs in agentic tasks.
  • Performance varies significantly across languages, with challenges in Indic languages.
  • Sequential workflows and safety evaluations reveal persistent limitations.
  • The benchmark is publicly available, promoting further research and development.

Computer Science > Artificial Intelligence

arXiv:2510.07978 (cs)

[Submitted on 9 Oct 2025 (v1), last revised 13 Feb 2026 (this version, v3)]

Title: VoiceAgentBench: Are Voice Assistants ready for agentic tasks?

Authors: Dhruv Jain, Harshit Shukla, Gautam Rajeev, Ashish Kulkarni, Chandra Khatri, Shubham Agarwal

Abstract: Large scale Speech Language Models have enabled voice assistants capable of understanding natural spoken queries and performing complex tasks. However, existing speech benchmarks largely focus on isolated capabilities such as transcription or question answering and do not systematically evaluate agentic behavior or adversarial robustness. To address this, we introduce VoiceAgentBench, a comprehensive benchmark for evaluating SpeechLMs in realistic spoken agentic settings, comprising 6,000+ synthetic spoken queries spanning single-tool invocations, multi-tool workflows, multi-turn dialogue, and safety evaluations across English and six Indic languages. To ensure speaker diversity, we further simulate speaker variability using a novel sampling strategy that selects audios for TTS voice conversion based on speaker embeddings to maximize acoustic diversity. Our evaluation measures tool selection accuracy, structural consistency, and the correctness of tool invocations, including adversarial robustness. Across...
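
The abstract names three evaluation axes: tool selection accuracy, structural consistency, and correctness of tool invocations. The excerpt does not say how the paper computes them, so the following is only a minimal sketch of one plausible reading, not the benchmark's scoring code. It assumes that each model output is a JSON list of calls shaped like {"name": ..., "arguments": {...}} and that matching is exact and order-sensitive. The names `Scores`, `parse_calls`, and `score` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Scores:
    tool_selection: float  # share of examples whose predicted tool names match the reference
    structural: float      # share of examples whose output parses into well-formed calls
    invocation: float      # share of examples whose tool names and arguments both match


def parse_calls(raw: str):
    """Parse a model output into a list of calls; return None if it is malformed.

    Assumed format (not taken from the paper): a JSON list of objects,
    each with a string "name" and a dict "arguments".
    """
    try:
        calls = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(calls, list):
        return None
    for call in calls:
        if not isinstance(call, dict):
            return None
        if not isinstance(call.get("name"), str):
            return None
        if not isinstance(call.get("arguments"), dict):
            return None
    return calls


def score(predictions: list[str], references: list[list[dict]]) -> Scores:
    """Score raw model outputs against reference call lists (exact match, in order)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")

    n = len(references)
    selected = structured = correct = 0
    for raw, ref in zip(predictions, references):
        calls = parse_calls(raw)
        if calls is None:
            continue  # malformed output counts as a miss on all three axes
        structured += 1
        if [c["name"] for c in calls] == [c["name"] for c in ref]:
            selected += 1
            # Arguments are only checked once the right tools were chosen.
            if [c["arguments"] for c in calls] == [c["arguments"] for c in ref]:
                correct += 1
    return Scores(selected / n, structured / n, correct / n)
```

Under these assumptions the three scores are nested: an output can only count toward invocation correctness if it also counts toward tool selection, and toward tool selection only if it is structurally valid. This keeps the structural score as an upper bound on the other two.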
