[2410.01469] TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
Computer Science > Sound
arXiv:2410.01469 (cs)
[Submitted on 2 Oct 2024 (v1), last revised 27 Feb 2026 (this version, v3)]

Title: TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
Authors: Mohan Xu, Kai Li, Guo Chen, Xiaolin Hu

Abstract: In recent years, much speech separation research has focused primarily on improving model performance. However, for low-latency speech processing systems, high efficiency is equally important. Therefore, we propose a speech separation model with significantly reduced parameters and computational costs: Time-frequency Interleaved Gain Extraction and Reconstruction network (TIGER). TIGER leverages prior knowledge to divide frequency bands and compresses frequency information. We employ a multi-scale selective attention module to extract contextual features while introducing a full-frequency-frame attention module to capture both temporal and frequency contextual information. Additionally, to more realistically evaluate the performance of speech separation models in complex acoustic environments, we introduce a dataset called EchoSet. This dataset includes noise and more realistic reverberation (e.g., considering object occlusions and material properties), with speech from two speakers overlapping at rand...