[2509.20321] Conversational Speech Reveals Structural Robustness Failures in SpeechLLM Backbones
Computer Science > Computation and Language
arXiv:2509.20321 (cs)
[Submitted on 24 Sep 2025 (v1), last revised 5 Mar 2026 (this version, v2)]

Title: Conversational Speech Reveals Structural Robustness Failures in SpeechLLM Backbones
Authors: Maria Teleki, Sai Janjur, Haoran Liu, Oliver Grabner, Ketan Verma, Thomas Docog, Xiangjue Dong, Lingfeng Shi, Cong Wang, Stephanie Birkelbach, Jason Kim, Yin Zhang, Éva Székely, James Caverlee

Abstract: LLMs serve as the backbone in SpeechLLMs, yet their behavior on spontaneous conversational input remains poorly understood. Conversational speech contains pervasive disfluencies -- interjections, edits, and parentheticals -- that are rare in the written corpora used for pre-training. Because gold disfluency removal is a deletion-only task, it serves as a controlled probe to determine whether a model performs faithful structural repair or biased reinterpretation. Using the DRES evaluation framework, we evaluate proprietary and open-source LLMs across architectures and scales. We show that model performance clusters into stable precision-recall regimes reflecting distinct editing policies. Notably, reasoning models systematically over-delete fluent content, revealing a bias toward semantic abstraction over structural fidelity. While fine-tuning achieves SOTA resu...
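A minimal sketch of the kind of scoring a deletion-only probe implies (illustrative only, not the DRES framework's actual implementation): since the task may only delete tokens, a system can be scored by precision and recall over the set of token indices it deletes, which makes the over-deletion behavior described above directly measurable as low precision.

```python
# Illustrative sketch: token-level precision/recall/F1 for a
# deletion-only disfluency-removal system. "Gold" and "predicted"
# deletions are sets of token indices into the same utterance.
def deletion_prf(gold_deleted: set, pred_deleted: set):
    """Return (precision, recall, F1) over deleted token indices."""
    tp = len(gold_deleted & pred_deleted)          # correctly deleted tokens
    precision = tp / len(pred_deleted) if pred_deleted else 1.0
    recall = tp / len(gold_deleted) if gold_deleted else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical example: "I uh I want that" -> gold removes tokens 0,1
# ("I uh"); an over-deleting model also removes token 3 ("want"),
# costing precision while keeping recall perfect.
p, r, f = deletion_prf({0, 1}, {0, 1, 3})
```

An over-deleting editing policy of the kind attributed to reasoning models would show up here as high recall paired with depressed precision.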