[2410.13331] Improving Discrete Optimisation Via Decoupled Straight-Through Estimator

arXiv - AI · 4 min read · Article

Summary

The paper presents the Decoupled Straight-Through Estimator (Decoupled ST), a new method for optimizing discrete variables in neural networks that improves performance by assigning separate temperature parameters to the forward and backward passes.

Why It Matters

This research addresses a critical limitation in existing optimization methods for discrete variables in neural networks. By decoupling the forward-pass and backward-pass temperatures, it allows exploration and gradient dispersion to be tuned independently, potentially leading to significant advancements in various machine learning applications.

Key Takeaways

  • Decoupled ST introduces separate temperatures for forward and backward passes, enhancing optimization.
  • The method consistently outperforms traditional STE variants across multiple tasks.
  • Optimal configurations for forward and backward passes differ significantly, indicating the need for decoupling.
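The core mechanism behind these takeaways can be sketched in a few lines. The following is a minimal NumPy illustration of the idea, not the authors' implementation: the function name `decoupled_st`, the categorical sampling choice, and the way the backward surrogate is exposed are all assumptions made for the example. The forward pass samples a hard one-hot from a softmax at temperature `tau_f`, while the backward pass would route gradients through a softmax at a separately chosen `tau_b`.

```python
import numpy as np

def softmax(logits, tau):
    # Temperature-scaled softmax over the last axis, with max-shift
    # for numerical stability.
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def decoupled_st(logits, tau_f, tau_b, rng=None):
    """Decoupled straight-through sample (illustrative sketch).

    Forward: a hard one-hot drawn from softmax(logits / tau_f),
    so tau_f controls stochasticity/exploration.
    Backward: gradients would be taken through softmax(logits / tau_b),
    so tau_b controls how the learning signal disperses across categories.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    p_f = softmax(logits, tau_f)           # forward (exploration) distribution
    idx = rng.choice(len(p_f), p=p_f)      # stochastic categorical sample
    hard = np.eye(len(p_f))[idx]           # one-hot value used in the forward pass
    p_b = softmax(logits, tau_b)           # differentiable backward surrogate
    return hard, p_b
```

In an autodiff framework the same idea is typically expressed with a stop-gradient trick, e.g. `hard + p_b - stop_gradient(p_b)`, so the forward value is `hard` while gradients flow through `p_b`; with `tau_f == tau_b` this collapses back to a standard single-temperature STE.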

Computer Science > Machine Learning
arXiv:2410.13331 (cs)
[Submitted on 17 Oct 2024 (v1), last revised 22 Feb 2026 (this version, v2)]

Title: Improving Discrete Optimisation Via Decoupled Straight-Through Estimator
Authors: Rushi Shah, Mingyuan Yan, Michael Curtis Mozer, Dianbo Liu

Abstract: The Straight-Through Estimator (STE) is the dominant method for training neural networks with discrete variables, enabling gradient-based optimisation by routing gradients through a differentiable surrogate. However, existing STE variants conflate two fundamentally distinct concerns: forward-pass stochasticity, which controls exploration and latent space utilisation, and backward-pass gradient dispersion, i.e. how learning signals are distributed across categories. We show that these concerns are qualitatively different and that tying them to a single temperature parameter leaves significant performance gains untapped. We propose Decoupled Straight-Through (Decoupled ST), a minimal modification that introduces separate temperatures for the forward pass ($\tau_f$) and the backward pass ($\tau_b$). This simple change enables independent tuning of exploration and gradient dispersion. Across three diverse tasks (Stochastic Binary Networks, Categorical Autoencoders, and Differentiable Logic Gate Networks), Decoupled ST consistently...
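The abstract's description of the STE, routing gradients through a differentiable surrogate, is easiest to see in the canonical binarisation case. The sketch below is illustrative only; the function name `ste_binarize` and the explicit `backward` closure are conventions chosen for this example, not an API from the paper.

```python
import numpy as np

def ste_binarize(x):
    """Classic straight-through estimator for binarisation (illustrative).

    Forward: a hard threshold, whose true gradient is zero almost everywhere.
    Backward: gradients are routed through a differentiable surrogate --
    here the identity, so d(output)/d(x) is treated as 1.
    """
    hard = (x > 0).astype(x.dtype)  # non-differentiable forward value

    def backward(upstream):
        # Straight-through: pass the upstream gradient unchanged,
        # as if the forward op had been the identity.
        return upstream

    return hard, backward
```

A single-temperature softmax STE generalises this by using a tempered softmax as the surrogate; the paper's contribution is to let the surrogate's temperature differ from the one used to sample the forward value.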

