[2507.08333] Token-Based Audio Inpainting via Discrete Diffusion

arXiv - AI · 3 min read

Summary

This article presents a novel method for audio inpainting using discrete diffusion techniques to restore missing segments in audio recordings, particularly in music.

Why It Matters

Audio inpainting is crucial for enhancing audio quality in various applications, including music restoration and sound design. This research introduces a new approach that significantly improves the restoration of larger gaps in audio, which has implications for both academic research and practical applications in audio processing.

Key Takeaways

  • Introduces a token-based approach for audio inpainting using discrete diffusion.
  • Outperforms existing methods in restoring long gaps in audio recordings.
  • Incorporates innovative training techniques for improved audio restoration.
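One of the training techniques highlighted above, the span-based absorbing transition, corrupts the token sequence in contiguous spans rather than at independent positions. A minimal sketch of that idea follows; the names (`span_absorb`, `MASK_ID`), the geometric span-length distribution, and the linear masking schedule are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

MASK_ID = 1024  # hypothetical absorbing token outside the codebook range


def span_absorb(tokens, t, mean_span=5, rng=None):
    """Corrupt a token sequence by absorbing contiguous spans into MASK_ID.

    t in [0, 1] is the diffusion timestep: the expected fraction of
    absorbed tokens grows with t, but whole spans (not independent
    positions) are masked, giving the structured corruption the
    paper describes at a high level.
    """
    if rng is None:
        rng = np.random.default_rng()
    out = tokens.copy()
    n = len(out)
    target = int(round(t * n))  # number of tokens to absorb at this timestep
    masked = 0
    while masked < target:
        span = max(1, rng.geometric(1.0 / mean_span))  # random span length >= 1
        start = rng.integers(0, n)
        end = min(n, start + span)
        masked += int(np.sum(out[start:end] != MASK_ID))  # count newly absorbed
        out[start:end] = MASK_ID
    return out


# Usage: corrupt a toy 32-token sequence halfway through the forward process.
tokens = np.arange(32)
noisy = span_absorb(tokens, t=0.5, rng=np.random.default_rng(0))
```

At `t=0` the sequence is untouched and at `t=1` every token is absorbed, so the sketch interpolates between clean data and the fully masked state, as an absorbing discrete-diffusion forward process should.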

Computer Science > Sound · arXiv:2507.08333 (cs)
[Submitted on 11 Jul 2025 (v1), last revised 17 Feb 2026 (this version, v4)]

Title: Token-Based Audio Inpainting via Discrete Diffusion
Authors: Tali Dror, Iftach Shoham, Moshe Buchris, Oren Gal, Haim Permuter, Gilad Katz, Eliya Nachmani

Abstract: Audio inpainting seeks to restore missing segments in degraded recordings. Previous diffusion-based methods exhibit impaired performance when the missing region is large. We introduce the first approach that applies discrete diffusion over tokenized music representations from a pre-trained audio tokenizer, enabling stable and semantically coherent restoration of long gaps. Our method further incorporates two training approaches: a derivative-based regularization loss that enforces smooth temporal dynamics, and a span-based absorbing transition that provides structured corruption during diffusion. Experiments on the MusicNet and MAESTRO datasets with gaps up to 750 ms show that our approach consistently outperforms strong baselines for gap lengths of 150 ms and above. This work advances musical audio restoration and introduces new directions for discrete diffusion model training. Visit our project page for examples and code.

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learn...
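The derivative-based regularization loss is described in the abstract only at a high level. One plausible reading is a first-difference (discrete temporal derivative) penalty on the model's per-frame predictions; the sketch below makes that concrete, with the function name and the `(T, D)` input shape being assumptions for illustration.

```python
import numpy as np


def derivative_reg_loss(pred):
    """First-difference penalty encouraging smooth temporal dynamics.

    pred: (T, D) array of per-frame predictions. The loss is the mean
    squared difference between consecutive frames; scaled by a weight
    and added to the main objective, it discourages abrupt
    frame-to-frame jumps inside the restored region.
    """
    diffs = pred[1:] - pred[:-1]  # discrete temporal derivative
    return float(np.mean(diffs ** 2))


# Usage: a constant sequence incurs zero penalty, a ramp a nonzero one.
constant = np.ones((10, 4))
ramp = np.arange(10, dtype=float)[:, None] * np.ones((1, 4))
```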
