[2410.13331] Improving Discrete Optimisation Via Decoupled Straight-Through Estimator

arXiv - AI · 4 min read · Article

Summary

The paper presents the Decoupled Straight-Through Estimator (Decoupled ST), a new method for optimizing discrete variables in neural networks that improves performance by assigning separate temperature parameters to the forward and backward passes.

Why It Matters

This research addresses a critical limitation in existing optimization methods for discrete variables in neural networks. By decoupling the forward-pass and backward-pass temperatures, it allows exploration and gradient dispersion to be tuned independently, potentially leading to significant advancements in various machine learning applications.

Key Takeaways

  • Decoupled ST introduces separate temperatures for forward and backward passes, enhancing optimization.
  • The method consistently outperforms traditional STE variants across multiple tasks.
  • Optimal configurations for forward and backward passes differ significantly, indicating the need for decoupling.
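The core mechanism behind these takeaways can be sketched in a few lines. The following is a minimal NumPy illustration of the idea, not the authors' implementation: the function name `decoupled_st`, the categorical sampling choice, and the way the backward surrogate is exposed are all assumptions made for the example. The forward pass samples a hard one-hot from a softmax at temperature `tau_f`, while the backward pass would route gradients through a softmax at a separately chosen `tau_b`.

```python
import numpy as np

def softmax(logits, tau):
    # Temperature-scaled softmax over the last axis, with max-shift
    # for numerical stability.
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def decoupled_st(logits, tau_f, tau_b, rng=None):
    """Decoupled straight-through sample (illustrative sketch).

    Forward: a hard one-hot drawn from softmax(logits / tau_f),
    so tau_f controls stochasticity/exploration.
    Backward: gradients would be taken through softmax(logits / tau_b),
    so tau_b controls how the learning signal disperses across categories.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    p_f = softmax(logits, tau_f)           # forward (exploration) distribution
    idx = rng.choice(len(p_f), p=p_f)      # stochastic categorical sample
    hard = np.eye(len(p_f))[idx]           # one-hot value used in the forward pass
    p_b = softmax(logits, tau_b)           # differentiable backward surrogate
    return hard, p_b
```

In an autodiff framework the same idea is typically expressed with a stop-gradient trick, e.g. `hard + p_b - stop_gradient(p_b)`, so the forward value is `hard` while gradients flow through `p_b`; with `tau_f == tau_b` this collapses back to a standard single-temperature STE.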

Computer Science > Machine Learning
arXiv:2410.13331 (cs)
[Submitted on 17 Oct 2024 (v1), last revised 22 Feb 2026 (this version, v2)]

Title: Improving Discrete Optimisation Via Decoupled Straight-Through Estimator
Authors: Rushi Shah, Mingyuan Yan, Michael Curtis Mozer, Dianbo Liu

Abstract: The Straight-Through Estimator (STE) is the dominant method for training neural networks with discrete variables, enabling gradient-based optimisation by routing gradients through a differentiable surrogate. However, existing STE variants conflate two fundamentally distinct concerns: forward-pass stochasticity, which controls exploration and latent space utilisation, and backward-pass gradient dispersion, i.e. how learning signals are distributed across categories. We show that these concerns are qualitatively different and that tying them to a single temperature parameter leaves significant performance gains untapped. We propose Decoupled Straight-Through (Decoupled ST), a minimal modification that introduces separate temperatures for the forward pass ($\tau_f$) and the backward pass ($\tau_b$). This simple change enables independent tuning of exploration and gradient dispersion. Across three diverse tasks (Stochastic Binary Networks, Categorical Autoencoders, and Differentiable Logic Gate Networks), Decoupled ST consistently...
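The abstract's description of the STE, routing gradients through a differentiable surrogate, is easiest to see in the canonical binarisation case. The sketch below is illustrative only; the function name `ste_binarize` and the explicit `backward` closure are conventions chosen for this example, not an API from the paper.

```python
import numpy as np

def ste_binarize(x):
    """Classic straight-through estimator for binarisation (illustrative).

    Forward: a hard threshold, whose true gradient is zero almost everywhere.
    Backward: gradients are routed through a differentiable surrogate --
    here the identity, so d(output)/d(x) is treated as 1.
    """
    hard = (x > 0).astype(x.dtype)  # non-differentiable forward value

    def backward(upstream):
        # Straight-through: pass the upstream gradient unchanged,
        # as if the forward op had been the identity.
        return upstream

    return hard, backward
```

A single-temperature softmax STE generalises this by using a tempered softmax as the surrogate; the paper's contribution is to let the surrogate's temperature differ from the one used to sample the forward value.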

