[2511.10262] MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models


Computer Science > Computation and Language

arXiv:2511.10262 (cs) [Submitted on 13 Nov 2025 (v1), last revised 17 Apr 2026 (this version, v3)]

Title: MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models

Authors: He Zhang, Wenqian Cui, Haoning Xu, Xiaohui Li, Lei Zhu, Haoli Bai, Shaohua Ma, Irwin King

Abstract: Full-Duplex Speech Language Models (FD-SLMs) enable real-time, overlapping conversational interactions, offering a more dynamic user experience than traditional half-duplex models. However, existing benchmarks focus primarily on single-round interactions, neglecting the complexities of multi-round communication. Evaluating FD-SLMs in multi-round settings poses significant challenges, including blurred turn boundaries in communication and context inconsistency during model inference. Moreover, existing benchmarks often evaluate conversational features alone, overlooking other critical aspects. To address these gaps, we introduce MTR-DuplexBench, a novel benchmark designed for comprehensive multi-round evaluation of FD-SLMs. MTR-DuplexBench not only segments continuous full-duplex dialogues...
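To make the segmentation challenge concrete, here is a minimal, hypothetical sketch of one way a continuous full-duplex dialogue (two overlapping, time-stamped utterance streams) could be split into discrete rounds by silence gaps. The data format, the `gap` threshold, and the function name are illustrative assumptions, not the paper's actual method.

```python
# Illustrative only: round segmentation by silence gaps, NOT the
# MTR-DuplexBench algorithm. Each utterance is (speaker, start_sec, end_sec).
Utterance = tuple[str, float, float]

def segment_rounds(utterances: list[Utterance], gap: float = 1.0) -> list[list[Utterance]]:
    """Group time-ordered utterances into rounds, starting a new round
    whenever a silence longer than `gap` seconds separates them."""
    rounds: list[list[Utterance]] = []
    current: list[Utterance] = []
    last_end: float | None = None
    for utt in sorted(utterances, key=lambda u: u[1]):
        # A new round begins if this utterance starts after a long silence.
        if last_end is not None and utt[1] - last_end > gap:
            rounds.append(current)
            current = []
        current.append(utt)
        # Track the latest end time seen so far (utterances may overlap).
        last_end = utt[2] if last_end is None else max(last_end, utt[2])
    if current:
        rounds.append(current)
    return rounds

dialogue = [
    ("user", 0.0, 2.0),
    ("model", 1.5, 3.0),   # overlapping (full-duplex) response, same round
    ("user", 5.0, 6.0),    # new round after a >1 s silence
    ("model", 6.2, 7.5),
]
print(segment_rounds(dialogue))  # two rounds of two utterances each
```

Note that because responses overlap in full-duplex speech, a simple speaker-change heuristic would mis-split rounds; gap-based grouping over the merged timeline is one way around that, which is precisely why the paper calls turn boundaries "blurred".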

Originally published on April 20, 2026. Curated by AI News.

