[2603.00951] When Does Margin Clamping Affect Training Variance? Dataset-Dependent Effects in Contrastive Forward-Forward Learning
About this article
Abstract page for arXiv paper 2603.00951: When Does Margin Clamping Affect Training Variance? Dataset-Dependent Effects in Contrastive Forward-Forward Learning
Computer Science > Machine Learning arXiv:2603.00951 (cs) [Submitted on 1 Mar 2026] Title:When Does Margin Clamping Affect Training Variance? Dataset-Dependent Effects in Contrastive Forward-Forward Learning Authors:Joshua Steier View a PDF of the paper titled When Does Margin Clamping Affect Training Variance? Dataset-Dependent Effects in Contrastive Forward-Forward Learning, by Joshua Steier View PDF HTML (experimental) Abstract:Contrastive Forward-Forward (CFF) learning trains Vision Transformers layer by layer against supervised contrastive objectives. CFF training can be sensitive to random seed, but the sources of this instability are poorly understood. We focus on one implementation detail: the positive-pair margin in the contrastive loss is applied through saturating similarity clamping, $\min(s + m,\, 1)$. We prove that an alternative formulation, subtracting the margin after the log-probability, is gradient-neutral under the mean-over-positives reduction. On CIFAR-10 ($2 \times 2$ factorial, $n{=}7$ seeds per cell), clamping produces $5.90\times$ higher pooled test-accuracy variance ($p{=}0.003$) with no difference in mean accuracy. Analyses of clamp activation rates, layerwise gradient norms, and a reduced-margin probe point to saturation-driven gradient truncation at early layers. The effect does not transfer cleanly to other datasets: on CIFAR-100, SVHN, and Fashion-MNIST, clamping produces equal or lower variance. Two factors account for the discrepancy. Firs...