[2602.14456] Traceable Latent Variable Discovery Based on Multi-Agent Collaboration
Summary
The paper presents TLVD, a novel causal modeling framework that integrates large language models with traditional causal discovery algorithms to enhance latent variable inference.
Why It Matters
Understanding latent variables is crucial for advancing scientific research and technology. TLVD addresses limitations in traditional causal discovery methods, potentially leading to more accurate models and insights in various fields, including healthcare and social sciences.
Key Takeaways
- TLVD combines metadata reasoning from LLMs with data-driven causal modeling.
- The framework improves latent variable inference by modeling it as a game with incomplete information.
- Validation of inferred variables is achieved through evidence exploration across real-world datasets.
- Experimental results show significant performance improvements over traditional methods.
- The approach has potential applications in various domains requiring causal analysis.
Computer Science > Machine Learning arXiv:2602.14456 (cs) [Submitted on 16 Feb 2026] Title:Traceable Latent Variable Discovery Based on Multi-Agent Collaboration Authors:Huaming Du, Tao Hu, Yijie Huang, Yu Zhao, Guisong Liu, Tao Gu, Gang Kou, Carl Yang View a PDF of the paper titled Traceable Latent Variable Discovery Based on Multi-Agent Collaboration, by Huaming Du and 7 other authors View PDF HTML (experimental) Abstract:Revealing the underlying causal mechanisms in the real world is crucial for scientific and technological progress. Despite notable advances in recent decades, the lack of high-quality data and the reliance of traditional causal discovery algorithms (TCDA) on the assumption of no latent confounders, as well as their tendency to overlook the precise semantics of latent variables, have long been major obstacles to the broader application of causal discovery. To address this issue, we propose a novel causal modeling framework, TLVD, which integrates the metadata-based reasoning capabilities of large language models (LLMs) with the data-driven modeling capabilities of TCDA for inferring latent variables and their semantics. Specifically, we first employ a data-driven approach to construct a causal graph that incorporates latent variables. Then, we employ multi-LLM collaboration for latent variable inference, modeling this process as a game with incomplete information and seeking its Bayesian Nash Equilibrium (BNE) to infer the possible specific latent variab...