[2602.17829] Causality by Abstraction: Symbolic Rule Learning in Multivariate Timeseries with Large Language Models
Summary
This paper introduces ruleXplain, a framework that uses Large Language Models (LLMs) to extract symbolic causal rules from multivariate timeseries data, addressing the challenge of inferring causal relationships, including delayed effects, in complex systems.
Why It Matters
Understanding causal relationships in timeseries data is crucial in fields such as epidemiology and energy management. By improving the interpretability of machine-learning models, this work provides a structured way to derive meaningful insights from complex datasets, which can improve decision-making in real-world applications.
Key Takeaways
- ruleXplain leverages LLMs to generate symbolic causal rules from timeseries data.
- The framework uses a constrained symbolic rule language with temporal operators for better interpretability.
- Validation experiments demonstrate the efficacy of the ruleset in reconstructing inputs and generalizing across unseen trends.
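The paper's actual rule grammar is not reproduced in this summary, but the idea of a symbolic rule with temporal operators and delay semantics can be illustrated with a minimal sketch. Here, a rule of the form "whenever input x exceeds a threshold, output y exceeds its own threshold within d steps" is checked against a pair of series; all names, thresholds, and the rule shape are illustrative, not the paper's:

```python
def holds_with_delay(x, y, x_threshold, y_threshold, delay):
    """Check a delayed-implication rule: whenever x[t] > x_threshold,
    some y[t+1 .. t+delay] must exceed y_threshold."""
    for t, xv in enumerate(x):
        if xv > x_threshold:
            window = y[t + 1 : t + 1 + delay]
            if not any(yv > y_threshold for yv in window):
                return False  # a trigger with no response inside the delay window
    return True

# Each spike in x (values > 4) is followed within 2 steps by a spike in y.
x = [0, 1, 5, 1, 0, 6, 0]
y = [0, 0, 0, 3, 0, 0, 4]
print(holds_with_delay(x, y, x_threshold=4, y_threshold=2, delay=2))  # → True
```

A verifier like this is what makes LLM-proposed rules checkable: each candidate rule can be evaluated mechanically against the observed trajectories.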
Computer Science > Machine Learning — arXiv:2602.17829 (cs) — Submitted on 19 Feb 2026
Title: Causality by Abstraction: Symbolic Rule Learning in Multivariate Timeseries with Large Language Models
Authors: Preetom Biswas, Giulia Pedrielli, K. Selçuk Candan
Abstract: Inferring causal relations in timeseries data with delayed effects is a fundamental challenge, especially when the underlying system exhibits complex dynamics that cannot be captured by simple functional mappings. Traditional approaches often fail to produce generalized and interpretable explanations, as multiple distinct input trajectories may yield nearly indistinguishable outputs. In this work, we present ruleXplain, a framework that leverages Large Language Models (LLMs) to extract formal explanations for input-output relations in simulation-driven dynamical systems. Our method introduces a constrained symbolic rule language with temporal operators and delay semantics, enabling LLMs to generate verifiable causal rules through structured prompting. ruleXplain relies on the availability of a principled model (e.g., a simulator) that maps multivariate input time series to output time series. Within ruleXplain, the simulator is used to generate diverse counterfactual input trajectories that yield similar target output...
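The abstract's counterfactual step, using the simulator to find diverse input trajectories whose outputs land near the same target, can be sketched with rejection sampling. The toy simulator (a short moving average) and all names and tolerances below are illustrative stand-ins, not the paper's setup:

```python
import random

def simulator(u):
    # Toy stand-in for the paper's principled model: a 3-step moving average.
    return [sum(u[max(0, t - 2) : t + 1]) / min(t + 1, 3) for t in range(len(u))]

def counterfactual_inputs(target_output, n_samples=500, tol=0.4, length=10, seed=0):
    """Sample random input trajectories and keep those whose simulated
    output stays within tol of the target at every timestep."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_samples):
        u = [rng.uniform(0, 1) for _ in range(length)]
        out = simulator(u)
        if max(abs(a - b) for a, b in zip(out, target_output)) < tol:
            kept.append(u)
    return kept

target = simulator([0.5] * 10)
candidates = counterfactual_inputs(target)
print(len(candidates))  # many distinct inputs can land near the same output
```

This illustrates the motivation stated in the abstract: because several distinct input trajectories yield nearly indistinguishable outputs, a rule-learning method must explain the input-output relation rather than memorize a single trajectory.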