[2511.20224] DuoTok: Source-Aware Dual-Track Tokenization for Multi-Track Music Language Modeling


Computer Science > Sound
arXiv:2511.20224 (cs)
[Submitted on 25 Nov 2025 (v1), last revised 1 Apr 2026 (this version, v2)]

Title: DuoTok: Source-Aware Dual-Track Tokenization for Multi-Track Music Language Modeling
Authors: Rui Lin, Zhiyue Wu, Jiahe Le, Kangdi Wang, Weixiong Chen, Junyu Dai, Tao Jiang

Abstract: Audio tokenization bridges continuous waveforms and multi-track music language models. In dual-track modeling, tokens should preserve three properties at once: high-fidelity reconstruction, strong predictability under a language model, and cross-track correspondence. We introduce DuoTok, a source-aware dual-track tokenizer that addresses this trade-off through staged disentanglement. DuoTok first pretrains a semantic encoder, then regularizes it with multi-task supervision, freezes the encoder, and applies hard dual-codebook routing while keeping auxiliary objectives on quantized codes. A diffusion decoder reconstructs high-frequency details, allowing tokens to focus on structured information for sequence modeling. On standard benchmarks, DuoTok achieves a favorable predictability-fidelity trade-off, reaching the lowest cnBPT while maintaining competitive reconstruction at 0.75 kbps. Under a held-constant dual-track language modeling protocol, enBPT also improves, indicating gains beyond codeboo...
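The "hard dual-codebook routing" step in the abstract can be illustrated with a minimal sketch: each frame embedding is quantized only against the codebook of its own source track (e.g. vocals vs. accompaniment), rather than a single shared codebook. This is a hypothetical NumPy illustration of the general idea, not the paper's implementation; the function name, track labels, and dimensions are all assumptions.

```python
import numpy as np

def quantize_dual_codebook(frames, track_ids, codebooks):
    """Hard routing sketch: quantize each frame embedding against the
    codebook of its source track only (hypothetical, not DuoTok's code)."""
    tokens = np.empty(len(frames), dtype=np.int64)
    quantized = np.empty_like(frames)
    for i, (z, t) in enumerate(zip(frames, track_ids)):
        cb = codebooks[t]                      # codebook for this track
        dists = np.sum((cb - z) ** 2, axis=1)  # squared L2 to every code
        k = int(np.argmin(dists))              # nearest-neighbor code index
        tokens[i] = k
        quantized[i] = cb[k]                   # straight-through lookup
    return tokens, quantized

# Toy usage: 2 tracks, 4-dim embeddings, 8 codes per codebook.
rng = np.random.default_rng(0)
codebooks = {0: rng.normal(size=(8, 4)), 1: rng.normal(size=(8, 4))}
frames = rng.normal(size=(6, 4))
track_ids = [0, 1, 0, 1, 0, 1]
tokens, q = quantize_dual_codebook(frames, track_ids, codebooks)
```

Because routing is hard (a frame can never borrow codes from the other track's codebook), the token stream itself carries source identity, which is what lets a downstream language model condition on per-track structure.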

Originally published on April 02, 2026. Curated by AI News.

Related Articles

- [2512.02966] Lumos: Let there be Language Model System Certification (arXiv - AI, 4 min)
- [2602.00750] Bypassing Prompt Injection Detectors through Evasive Injections (arXiv - AI, 4 min)
- [2511.08225] Benchmarking Educational LLMs with Analytics: A Case Study on Gender Bias in Feedback (arXiv - AI, 4 min)
- [2506.09354] "Is This Really a Human Peer Supporter?": Misalignments Between Peer Supporters and Experts in LLM-Supported Interactions (arXiv - AI, 4 min)

