[2604.09450] ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
Computer Science > Machine Learning
arXiv:2604.09450 (cs)
[Submitted on 10 Apr 2026]

Title: ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion

Authors: Lifeng Chen, Tianqi You, Hao Liu, Zhimin Bao, Jile Jiao, Xiao Han, Zhicai Ou, Tao Sun, Xiaofeng Mou, Xiaojie Jin, Yi Xu

Abstract: Chest X-ray report generation (CXR-RG) has the potential to substantially alleviate radiologists' workload. However, conventional autoregressive vision-language models (VLMs) suffer from high inference latency due to sequential token decoding. Diffusion-based models offer a promising alternative through parallel generation, but they still require multiple denoising iterations. Compressing multi-step denoising to a single step could further reduce latency, but often degrades textual coherence due to the mean-field bias introduced by token-factorized denoisers. To address this challenge, we propose ECHO, an efficient diffusion-based VLM (dVLM) for chest X-ray report generation. ECHO enables stable one-step-per-block inference via a novel Direct Conditional Distillation (DCD) framework, which mitigates the mean-field limitation by constructing unfactorized supervision from on-policy diffusion trajectories to encode joint token dependencies. In addition, we introduce a Response-Asymmetric Diffusion ...
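
The mean-field bias named in the abstract can be made concrete with a toy calculation. The sketch below is not ECHO's method or code. It only shows, under assumed toy values, why a denoiser that predicts each token independently can lose coherence when every token is decoded in a single step. The two-token vocabulary, the probabilities, and the helper names (`marginals`, `mean_field`) are all hypothetical.

```python
from itertools import product

# Hypothetical joint distribution over a two-token "report" with two
# coherent modes. These phrases and probabilities are illustrative only.
joint = {
    ("no", "effusion"): 0.5,
    ("mild", "edema"): 0.5,
}

def marginals(joint):
    """Per-position marginal distributions of a joint over token tuples."""
    length = len(next(iter(joint)))
    out = [dict() for _ in range(length)]
    for seq, p in joint.items():
        for i, tok in enumerate(seq):
            out[i][tok] = out[i].get(tok, 0.0) + p
    return out

def mean_field(joint):
    """Product of per-position marginals, i.e. a token-factorized model."""
    margs = marginals(joint)
    result = {}
    for combo in product(*[m.items() for m in margs]):
        seq = tuple(tok for tok, _ in combo)
        prob = 1.0
        for _, q in combo:
            prob *= q
        result[seq] = prob
    return result

factorized = mean_field(joint)

# Probability mass the factorized model places on sequences that never
# occur under the joint distribution (e.g. "no edema", "mild effusion").
incoherent_mass = sum(p for seq, p in factorized.items() if seq not in joint)

for seq, p in sorted(factorized.items()):
    print(" ".join(seq), p)
print("mass on sequences outside the joint:", incoherent_mass)
```

In this toy case the factorized model matches every per-token marginal exactly, yet half of its probability mass falls on token pairs the joint distribution never produces. Multi-step denoising can correct for this by conditioning later tokens on earlier choices; a single step cannot, which is the gap the abstract says unfactorized supervision is meant to close.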