[2510.14190] Contrastive Diffusion Alignment: Learning Structured Latents for Controllable Generation
Summary
The paper presents Contrastive Diffusion Alignment (ConDA), a method that enhances the interpretability and control of diffusion models by organizing high-dimensional latent spaces through contrastive learning.
Why It Matters
This work addresses a core limitation of diffusion models: their latent spaces are high dimensional and not organized for interpretation or control. By imposing structure through contrastive learning, ConDA makes generative models more controllable and interpretable, with implications for fields such as fluid dynamics and neuroscience, where understanding latent structure can yield both better model performance and scientific insight.
Key Takeaways
- ConDA introduces a geometry layer that applies contrastive learning to diffusion latents.
- It enables low-dimensional embeddings that align with underlying dynamical factors.
- The method supports smooth interpolation, extrapolation, and counterfactual editing in the embedding, while rendering remains in the original diffusion space.
- ConDA outperforms traditional linear traversals and conditioning-based methods in interpretability and control.
- The approach is robust across domains, including neural imaging and motor cortex activity.
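To make the core idea concrete, here is a minimal NumPy sketch of contrastive alignment in the spirit of ConDA. All names and details are illustrative assumptions, not the paper's implementation: diffusion latents are mapped through a simple linear projection head to a low-dimensional embedding, and an InfoNCE-style loss treats samples whose auxiliary variables (e.g., time or stimulation parameters) are close as positive pairs.

```python
import numpy as np

def project(z, W):
    """Hypothetical projection head: map diffusion latents to a
    low-dimensional embedding and L2-normalize each row."""
    e = z @ W
    return e / np.linalg.norm(e, axis=1, keepdims=True)

def info_nce(emb, aux, tau=0.1, aux_eps=0.05):
    """InfoNCE-style loss over an embedding batch.

    Positives are pairs whose auxiliary variables differ by less than
    aux_eps; all other (non-self) pairs serve as negatives.
    """
    sim = emb @ emb.T / tau                  # temperature-scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)           # exclude self-pairs from the partition
    pos = np.abs(aux[:, None] - aux[None, :]) < aux_eps  # auxiliary-defined positives
    np.fill_diagonal(pos, False)
    logZ = np.log(np.exp(sim).sum(axis=1))   # log partition function per anchor
    losses = []
    for i in range(len(emb)):
        if pos[i].any():                     # skip anchors with no positive pair
            losses.append((logZ[i] - sim[i][pos[i]]).mean())
    return float(np.mean(losses))

# Toy usage: 8 random "latents", auxiliary variable shared within pairs.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))                 # stand-in for diffusion latents
W = rng.normal(size=(16, 2))                 # untrained projection to 2-D
aux = np.repeat(np.linspace(0, 1, 4), 2)     # each auxiliary value appears twice
loss = info_nce(project(z, W), aux)
```

Minimizing such a loss over the projection parameters pulls latents with similar auxiliary values together in the embedding, which is what lets embedding directions align with the underlying factors; the paper's actual objective and architecture may differ.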
Computer Science > Machine Learning — arXiv:2510.14190 (cs)
Submitted on 16 Oct 2025 (v1); last revised 19 Feb 2026 (this version, v2).
Title: Contrastive Diffusion Alignment: Learning Structured Latents for Controllable Generation
Authors: Ruchi Sandilya, Sumaira Perez, Charles Lynch, Lindsay Victoria, Benjamin Zebley, Derrick Matthew Buchanan, Mahendra T. Bhati, Nolan Williams, Timothy J. Spellman, Faith M. Gunning, Conor Liston, Logan Grosenick
Abstract: Diffusion models excel at generation, but their latent spaces are high dimensional and not explicitly organized for interpretation or control. We introduce ConDA (Contrastive Diffusion Alignment), a plug-and-play geometry layer that applies contrastive learning to pretrained diffusion latents using auxiliary variables (e.g., time, stimulation parameters, facial action units). ConDA learns a low-dimensional embedding whose directions align with underlying dynamical factors, consistent with recent contrastive learning results on structured and disentangled representations. In this embedding, simple nonlinear trajectories support smooth interpolation, extrapolation, and counterfactual editing while rendering remains in the original diffusion space. ConDA separates editing and rendering by lifting embedding trajectories back to diffusion late...
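The abstract's separation of editing from rendering can be sketched as follows. This is an illustrative stand-in, not the paper's lifting procedure: edits happen as trajectories in the low-dimensional embedding, and an edited point is lifted back toward the diffusion latent space here via kernel regression over a stored bank of (embedding, latent) pairs.

```python
import numpy as np

def interpolate(e0, e1, t):
    """Edit step: move along a straight-line trajectory between two
    embedding points (t in [0, 1])."""
    return (1 - t) * e0 + t * e1

def lift(e, bank_emb, bank_lat, bandwidth=0.5):
    """Lifting step (illustrative): map an embedding point back to a
    diffusion latent as a Gaussian-kernel-weighted average of stored
    latents, weighted by embedding-space proximity."""
    sq_dist = np.sum((bank_emb - e) ** 2, axis=1)
    w = np.exp(-sq_dist / (2 * bandwidth ** 2))
    w /= w.sum()
    return w @ bank_lat                      # weighted combination of bank latents

# Toy usage: a small bank of paired embeddings and latents.
rng = np.random.default_rng(1)
bank_emb = rng.normal(size=(10, 2))          # low-dimensional embeddings
bank_lat = rng.normal(size=(10, 64))         # corresponding diffusion latents
e_mid = interpolate(bank_emb[0], bank_emb[1], 0.5)
z_lifted = lift(e_mid, bank_emb, bank_lat)   # latent to hand to the diffusion renderer
```

The point of the split is that the cheap, interpretable operations (interpolation, extrapolation, counterfactual edits) all happen in the embedding, while the pretrained diffusion model is only ever asked to render lifted latents from its original space.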