[2602.14462] Silent Inconsistency in Data-Parallel Full Fine-Tuning: Diagnosing Worker-Level Optimization Misalignment
Summary
This paper studies 'silent inconsistency' in data-parallel full fine-tuning of large language models: worker-level optimization misalignment that stays hidden behind aggregated monitoring signals, and proposes a lightweight diagnostic framework to improve training reliability.
Why It Matters
Understanding worker-level optimization dynamics is important for the reliability of large-scale training. Because synchronous all-reduce guarantees identical weights after each step, cross-worker divergence in losses and gradients before aggregation can go unnoticed; this work addresses that monitoring gap, which can otherwise lead to silent instability and suboptimal model performance.
Key Takeaways
- Identifies 'silent inconsistency' as a key issue in data-parallel fine-tuning.
- Proposes a diagnostic framework with three metrics for assessing worker-level optimization.
- Demonstrates that conventional monitoring may overlook critical divergence in training dynamics.
- Validates the framework through experiments on a large language model.
- Highlights the importance of visibility into hidden instability modes during training.
Paper Details
Computer Science > Machine Learning, arXiv:2602.14462 (cs). Submitted on 16 Feb 2026.
Title: Silent Inconsistency in Data-Parallel Full Fine-Tuning: Diagnosing Worker-Level Optimization Misalignment
Authors: Hong Li, Zhen Zhou, Honggang Zhang, Yuping Luo, Xinyue Wang, Han Gong, Zhiyuan Liu
Abstract: Data-parallel (DP) training with synchronous all-reduce is a dominant paradigm for full-parameter fine-tuning of large language models (LLMs). While parameter synchronization guarantees numerical equivalence of model weights after each iteration, it does not necessarily imply alignment of worker-level optimization dynamics before gradient aggregation. This paper identifies and studies this latent mismatch, termed "silent inconsistency", where cross-worker divergence in losses and gradients can remain invisible under conventional aggregated monitoring signals. We propose a lightweight, model-agnostic diagnostic framework that quantifies worker-level consistency using training signals readily available in standard pipelines. Specifically, we introduce three complementary metrics: loss dispersion, gradient-norm dispersion, and gradient-direction consistency measured by inter-worker cosine similarity. The proposed metrics incur negligible overhead and require no modification to model ar...
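The abstract names three metrics but not their exact formulas, which the truncated text does not reveal. The sketch below shows one plausible reading of them, using NumPy: loss dispersion as the standard deviation of per-worker losses, gradient-norm dispersion as the standard deviation of per-worker gradient norms, and gradient-direction consistency as the mean pairwise cosine similarity between per-worker flattened gradients (all before all-reduce). The function name and exact definitions are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def worker_consistency_metrics(losses, grads):
    """Hypothetical sketch of the three worker-level consistency signals.

    losses: shape (W,)    -- per-worker mini-batch losses this step.
    grads:  shape (W, D)  -- per-worker flattened gradients, pre-all-reduce.
    """
    losses = np.asarray(losses, dtype=np.float64)
    grads = np.asarray(grads, dtype=np.float64)

    # Loss dispersion: spread of per-worker losses around their mean.
    loss_dispersion = losses.std()

    # Gradient-norm dispersion: spread of per-worker gradient norms.
    norms = np.linalg.norm(grads, axis=1)
    grad_norm_dispersion = norms.std()

    # Gradient-direction consistency: mean pairwise cosine similarity
    # over all distinct worker pairs (upper triangle, excluding diagonal).
    unit = grads / norms[:, None]
    cos = unit @ unit.T
    iu = np.triu_indices(len(losses), k=1)
    direction_consistency = cos[iu].mean()

    return loss_dispersion, grad_norm_dispersion, direction_consistency
```

In a real pipeline these quantities would be gathered across ranks (e.g. with an all-gather of scalar losses and gradient norms), which matches the paper's claim of negligible overhead since only a few scalars per worker need to be communicated; the cosine-similarity term is the costlier one, as it requires comparing full gradient vectors or a sketch of them.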