[2602.22254] Causal Direction from Convergence Time: Faster Training in the True Causal Direction
Summary
This paper introduces Causal Computational Asymmetry (CCA), a method that identifies causal direction by comparing the convergence times of two neural networks trained in opposite directions: the network trained in the true causal direction converges faster, a claim the paper supports with both a formal argument and empirical validation.
Why It Matters
Understanding causal relationships is crucial in machine learning for improving model performance and interpretability. CCA offers a new approach that leverages optimization dynamics rather than distributional tests, which could make causal direction identification a cheap byproduct of ordinary training.
Key Takeaways
- CCA identifies causal direction by comparing convergence times of neural networks.
- The method shows that the forward causal direction converges faster than the reverse direction.
- Empirical results indicate high accuracy in causal identification across multiple neural architectures.
- CCA is distinct from methods such as RESIT, IGCI, and SkewScore, which rely on statistical independence tests or distributional asymmetries.
- Valid identification requires proper z-scoring (standardization) of both variables before training.
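The procedure the takeaways describe can be sketched end to end: generate additive-noise-model data, z-score both variables, train one small network per direction, and count gradient steps until a fixed loss threshold is reached. This is a minimal illustration, not the paper's implementation; the network size, learning rate, threshold, and data-generating function are all arbitrary choices made here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy additive-noise-model data: Y = f(X) + eps, with f nonlinear and injective.
n = 500
X = rng.normal(size=(n, 1))
Y = np.tanh(2 * X) + 0.1 * rng.normal(size=(n, 1))

def zscore(a):
    return (a - a.mean()) / a.std()

X, Y = zscore(X), zscore(Y)  # the paper notes z-scoring is required

def steps_to_threshold(inp, out, threshold=0.05, lr=0.05,
                       hidden=16, max_steps=20000, seed=0):
    """Train a one-hidden-layer MLP with full-batch gradient descent and
    return (steps needed to reach the loss threshold, final loss)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(1, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1)); b2 = np.zeros(1)
    for step in range(1, max_steps + 1):
        h = np.tanh(inp @ W1 + b1)          # forward pass
        pred = h @ W2 + b2
        err = pred - out
        loss = np.mean(err ** 2)
        if loss < threshold:
            return step, loss
        g_pred = 2 * err / len(inp)         # backprop through the two layers
        gW2 = h.T @ g_pred; gb2 = g_pred.sum(axis=0)
        g_h = (g_pred @ W2.T) * (1 - h ** 2)
        gW1 = inp.T @ g_h; gb1 = g_h.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return max_steps, loss

fwd_steps, fwd_loss = steps_to_threshold(X, Y)  # causal direction X -> Y
rev_steps, rev_loss = steps_to_threshold(Y, X)  # anti-causal direction Y -> X
print(f"forward steps: {fwd_steps}, reverse steps: {rev_steps}")
```

Under CCA, the direction that reaches the threshold in fewer steps is inferred to be causal; whether the reverse model reaches the threshold at all depends on its (strictly higher) irreducible loss floor.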
Computer Science > Machine Learning
arXiv:2602.22254 (cs)
[Submitted on 24 Feb 2026]
Title: Causal Direction from Convergence Time: Faster Training in the True Causal Direction
Authors: Abdulrahman Tamim
Abstract: We introduce Causal Computational Asymmetry (CCA), a principle for causal direction identification based on optimization dynamics in which one neural network is trained to predict $Y$ from $X$ and another to predict $X$ from $Y$, and the direction that converges faster is inferred to be causal. Under the additive noise model $Y = f(X) + \varepsilon$ with $\varepsilon \perp X$ and $f$ nonlinear and injective, we establish a formal asymmetry: in the reverse direction, residuals remain statistically dependent on the input regardless of approximation quality, inducing a strictly higher irreducible loss floor and non-separable gradient noise in the optimization dynamics, so that the reverse model requires strictly more gradient steps in expectation to reach any fixed loss threshold; consequently, the forward (causal) direction converges in fewer expected optimization steps. CCA operates in optimization-time space, distinguishing it from methods such as RESIT, IGCI, and SkewScore that rely on statistical independence or distributional asymmetries, and proper z-scoring of both variables is required for valid ...
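The abstract's central claim, that reverse-direction residuals remain statistically dependent on the input no matter how good the fit, can be probed numerically with a crude proxy. The polynomial regression and the correlation between residual magnitude and input magnitude below are illustrative stand-ins chosen for this sketch, not the paper's analysis (which concerns optimization dynamics) and not a proper independence test.

```python
import numpy as np

rng = np.random.default_rng(1)

# ANM data: Y = f(X) + eps with eps independent of X (f(x) = x^3 is
# nonlinear and injective; both choices are arbitrary for this example).
n = 2000
X = rng.uniform(-2, 2, size=n)
Y = X ** 3 + rng.normal(scale=1.0, size=n)

def dependence_proxy(inp, out, degree=7):
    """Fit a flexible polynomial regression of out on inp, then measure
    how strongly the residual magnitude varies with the input magnitude.
    Near zero when residuals are homoscedastic and independent of inp."""
    coeffs = np.polyfit(inp, out, degree)
    resid = out - np.polyval(coeffs, inp)
    return abs(np.corrcoef(np.abs(resid), np.abs(inp))[0, 1])

fwd = dependence_proxy(X, Y)  # forward: residuals ~ the independent noise
rev = dependence_proxy(Y, X)  # reverse: residual spread varies with |Y|
print(f"forward dependence proxy: {fwd:.3f}, reverse: {rev:.3f}")
```

In the forward direction the residuals recover the independent noise term, so the proxy hovers near zero; in the reverse direction the residual spread varies with the input, which is the dependence the paper links to a higher loss floor and slower convergence.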