[2602.20497] LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration
Summary
The paper introduces LESA, a feature-caching framework that accelerates diffusion models with learnable stage-aware predictors, reporting up to 6.25x acceleration while maintaining output quality.
Why It Matters
As diffusion models gain traction in image and video generation, the high computational cost of Diffusion Transformers (DiTs) has become a barrier to practical deployment. LESA addresses this cost by forecasting cached features with learned predictors instead of simply reusing them, aiming to avoid the quality degradation seen in earlier caching methods. This makes it relevant for researchers and practitioners in AI and computer vision.
Key Takeaways
- LESA trains its predictors with a two-stage procedure and uses a Kolmogorov-Arnold Network (KAN) to learn temporal feature mappings from data.
- The framework achieves up to 6.25x acceleration with improved quality metrics over existing methods.
- A multi-stage, multi-expert architecture assigns specialized predictors to different noise-level stages, making feature forecasting more precise and robust.
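
The stage-aware routing idea in the last takeaway can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the paper: the three-way split of the noise range (`STAGE_EDGES`), the class names, and the linear map used as each expert. LESA itself uses a KAN as the predictor, and this summary does not describe its stage boundaries or training procedure, so the sketch only shows how a forecast might be dispatched to the expert that owns the current noise level.

```python
import numpy as np

# Hypothetical stage boundaries over a normalized noise level t in [0, 1].
# The paper's actual partition is not given in this summary.
STAGE_EDGES = np.array([1 / 3, 2 / 3])


def stage_index(t: float) -> int:
    """Map a normalized noise level t in [0, 1] to a stage id (0, 1, or 2)."""
    return int(np.searchsorted(STAGE_EDGES, t, side="right"))


class LinearPredictor:
    """Stand-in expert that forecasts a feature from the last cached one.

    LESA uses a Kolmogorov-Arnold Network here; a near-identity linear map
    is swapped in only to keep the sketch short.
    """

    def __init__(self, dim: int, rng: np.random.Generator):
        self.weight = np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))

    def __call__(self, cached: np.ndarray) -> np.ndarray:
        return cached @ self.weight


class StageAwarePredictor:
    """Routes each forecast to the expert assigned to the current stage."""

    def __init__(self, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        num_stages = len(STAGE_EDGES) + 1
        self.experts = [LinearPredictor(dim, rng) for _ in range(num_stages)]

    def __call__(self, cached: np.ndarray, t: float) -> np.ndarray:
        return self.experts[stage_index(t)](cached)


predictor = StageAwarePredictor(dim=4)
cached_feature = np.ones((2, 4))  # features cached from a full forward pass
forecast = predictor(cached_feature, t=0.9)  # handled by the high-noise expert
```

In a caching loop, a call like `predictor(cached_feature, t)` would stand in for a full forward pass on the steps that are skipped, while the remaining steps recompute and refresh the cache.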
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.20497 (cs)
[Submitted on 24 Feb 2026]
Title: LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration
Authors: Peiliang Cai, Jiacheng Liu, Haowen Xu, Xinyu Wang, Chang Zou, Linfeng Zhang
Abstract: Diffusion models have achieved remarkable success in image and video generation tasks. However, the high computational demands of Diffusion Transformers (DiTs) pose a significant challenge to their practical deployment. While feature caching is a promising acceleration strategy, existing methods based on simple reusing or training-free forecasting struggle to adapt to the complex, stage-dependent dynamics of the diffusion process, often resulting in quality degradation and failing to maintain consistency with the standard denoising process. To address this, we propose a LEarnable Stage-Aware (LESA) predictor framework based on two-stage training. Our approach leverages a Kolmogorov-Arnold Network (KAN) to accurately learn temporal feature mappings from data. We further introduce a multi-stage, multi-expert architecture that assigns specialized predictors to different noise-level stages, enabling more precise and robust feature forecasting. Extensive experiments show our method achieves significant acceleration while maintaining high-...