[2511.10262] MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models


Computer Science > Computation and Language

arXiv:2511.10262 (cs) [Submitted on 13 Nov 2025 (v1), last revised 17 Apr 2026 (this version, v3)]

Title: MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models

Authors: He Zhang, Wenqian Cui, Haoning Xu, Xiaohui Li, Lei Zhu, Haoli Bai, Shaohua Ma, Irwin King

Abstract: Full-Duplex Speech Language Models (FD-SLMs) enable real-time, overlapping conversational interactions, offering a more dynamic user experience than traditional half-duplex models. However, existing benchmarks focus primarily on single-round interactions, neglecting the complexities of multi-round communication. Evaluating FD-SLMs in multi-round settings poses significant challenges, including blurred turn boundaries in communication and context inconsistency during model inference. Moreover, existing benchmarks often evaluate conversational features alone, overlooking other critical aspects. To address these gaps, we introduce MTR-DuplexBench, a novel benchmark designed for comprehensive multi-round evaluation of FD-SLMs. MTR-DuplexBench not only segments continuous full-duplex dialogues...
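To make the segmentation challenge concrete, here is a minimal, hypothetical sketch of one way a continuous full-duplex dialogue (two overlapping, time-stamped utterance streams) could be split into discrete rounds by silence gaps. The data format, the `gap` threshold, and the function name are illustrative assumptions, not the paper's actual method.

```python
# Illustrative only: round segmentation by silence gaps, NOT the
# MTR-DuplexBench algorithm. Each utterance is (speaker, start_sec, end_sec).
Utterance = tuple[str, float, float]

def segment_rounds(utterances: list[Utterance], gap: float = 1.0) -> list[list[Utterance]]:
    """Group time-ordered utterances into rounds, starting a new round
    whenever a silence longer than `gap` seconds separates them."""
    rounds: list[list[Utterance]] = []
    current: list[Utterance] = []
    last_end: float | None = None
    for utt in sorted(utterances, key=lambda u: u[1]):
        # A new round begins if this utterance starts after a long silence.
        if last_end is not None and utt[1] - last_end > gap:
            rounds.append(current)
            current = []
        current.append(utt)
        # Track the latest end time seen so far (utterances may overlap).
        last_end = utt[2] if last_end is None else max(last_end, utt[2])
    if current:
        rounds.append(current)
    return rounds

dialogue = [
    ("user", 0.0, 2.0),
    ("model", 1.5, 3.0),   # overlapping (full-duplex) response, same round
    ("user", 5.0, 6.0),    # new round after a >1 s silence
    ("model", 6.2, 7.5),
]
print(segment_rounds(dialogue))  # two rounds of two utterances each
```

Note that because responses overlap in full-duplex speech, a simple speaker-change heuristic would mis-split rounds; gap-based grouping over the merged timeline is one way around that, which is precisely why the paper calls turn boundaries "blurred".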

Originally published on April 20, 2026. Curated by AI News.

