[2601.20775] Active Learning for Decision Trees with Provable Guarantees
Summary
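This paper studies the label complexity of active learning for decision trees used as binary classifiers. It gives the first analysis of the disagreement coefficient for decision trees and, under two structural assumptions, an active learning algorithm with polylogarithmic label complexity and provable guarantees.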
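To make "polylogarithmic label complexity" concrete, here is a minimal sketch of classic disagreement-based active learning (the CAL strategy) for the simplest hypothesis class: 1-D thresholds $h_t(x) = [x \ge t]$ on a grid of $n$ points. This is not the paper's decision-tree algorithm. The function name `cal_thresholds`, the realizable-setting assumption, and the random stream order are illustrative choices. The learner pays for a label only when consistent hypotheses disagree on a point, and infers every other label for free.

```python
import random

def cal_thresholds(points, oracle, seed=0):
    """Hypothetical CAL sketch for thresholds h_t(x) = [x >= t].

    Assumes the realizable setting: some threshold labels every point correctly.
    Returns (labels for every point, number of label queries made).
    """
    order = list(points)
    random.Random(seed).shuffle(order)   # process the pool in random stream order
    largest_neg = float("-inf")          # largest point known to have label 0
    smallest_pos = float("inf")          # smallest point known to have label 1
    labels, queries = {}, 0
    for x in order:
        if largest_neg < x < smallest_pos:
            # x is in the disagreement region of the version space: query it.
            y = oracle(x)
            queries += 1
            if y == 1:
                smallest_pos = min(smallest_pos, x)
            else:
                largest_neg = max(largest_neg, x)
            labels[x] = y
        else:
            # Every consistent threshold agrees on x, so its label is implied.
            labels[x] = 1 if x >= smallest_pos else 0
    return labels, queries

grid = list(range(1024))   # 1-D grid of unlabeled points
target = 300               # hypothetical true threshold
labels, queries = cal_thresholds(grid, lambda x: int(x >= target))
assert all(labels[x] == int(x >= target) for x in grid)
print(f"{queries} label queries for {len(grid)} points")
```

With a random order, a point on either side of the threshold is queried only if it is a new record (closest to the threshold so far), so the expected number of queries is a sum of two harmonic numbers and grows like $\log n$ rather than $n$.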
Why It Matters
Active learning aims to reduce the number of labels a learner must request, which matters when labeling is expensive even if unlabeled data is plentiful. By identifying assumptions under which decision trees admit polylogarithmic label complexity, and showing that relaxing them leads to polynomial label complexity, this work clarifies when active learning can substantially cut the labeled data needed to train a decision tree.
Key Takeaways
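- Introduces the first analysis of the disagreement coefficient for decision trees, a key parameter governing active learning label complexity, under two assumptions: each root-to-leaf path queries distinct feature dimensions, and the input data has a regular, grid-like structure.
- Presents the first general active learning algorithm for binary classification with a multiplicative error guarantee, producing a $(1+\epsilon)$-approximate classifier.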
- Establishes a label complexity lower bound, demonstrating optimal dependence on error tolerance.
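For reference, the standard definitions from the disagreement-based active learning literature for the two quantities named above are sketched below. The notation ($\mathcal{H}$, $D$, $D_X$, $h^*$, $r_0$) is ours, and the paper's exact variants may differ.

```latex
\mathrm{err}(h) = \Pr_{(x,y)\sim D}[h(x) \neq y], \qquad
h^* \in \arg\min_{h \in \mathcal{H}} \mathrm{err}(h)

B(h^*, r) = \{\, h \in \mathcal{H} : \Pr_{x \sim D_X}[h(x) \neq h^*(x)] \le r \,\}

\mathrm{DIS}(V) = \{\, x : \exists\, h, h' \in V,\ h(x) \neq h'(x) \,\}

\theta(r_0) = \sup_{r > r_0} \frac{\Pr_{x \sim D_X}[x \in \mathrm{DIS}(B(h^*, r))]}{r}

\text{additive: } \mathrm{err}(\hat h) \le \mathrm{err}(h^*) + \epsilon
\qquad
\text{multiplicative: } \mathrm{err}(\hat h) \le (1+\epsilon)\,\mathrm{err}(h^*)
```

A multiplicative guarantee is much stronger when $\mathrm{err}(h^*)$ is small: with $\mathrm{err}(h^*) = 0.01$ and $\epsilon = 0.1$, it requires error at most $0.011$, whereas an additive guarantee with the same $\epsilon$ allows $0.11$.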
arXiv:2601.20775 (cs) [Submitted on 28 Jan 2026 (v1), last revised 18 Feb 2026 (this version, v2)]
Title: Active Learning for Decision Trees with Provable Guarantees
Authors: Arshia Soltani Moakhar, Tanapoom Laoaron, Faraz Ghahremani, Kiarash Banihashem, MohammadTaghi Hajiaghayi
Abstract: This paper advances the theoretical understanding of active learning label complexity for decision trees as binary classifiers. We make two main contributions. First, we provide the first analysis of the disagreement coefficient for decision trees, a key parameter governing active learning label complexity. Our analysis holds under two natural assumptions required for achieving polylogarithmic label complexity: (i) each root-to-leaf path queries distinct feature dimensions, and (ii) the input data has a regular, grid-like structure. We show these assumptions are essential, as relaxing them leads to polynomial label complexity. Second, we present the first general active learning algorithm for binary classification that achieves a multiplicative error guarantee, producing a $(1+\epsilon)$-approximate classifier. By combining these results, we design an active learning algorithm for decision trees that uses only a polyloga...
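Assumption (i) is easy to state operationally. The sketch below checks it on a hypothetical `Node` representation with axis-aligned splits `x[feature] < threshold`. The class, its fields, and the function name are illustrative assumptions, not taken from the paper. The abstract states that relaxing this assumption leads to polynomial label complexity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Hypothetical binary decision-tree node; leaves set `label` and have no children."""
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None    # branch taken when x[feature] < threshold
    right: Optional["Node"] = None   # branch taken when x[feature] >= threshold
    label: Optional[int] = None

def paths_use_distinct_features(node, seen=frozenset()):
    """True iff no root-to-leaf path splits on the same feature twice (assumption (i))."""
    if node.label is not None:       # reached a leaf: this path is fine
        return True
    if node.feature in seen:         # feature already used higher on this path
        return False
    seen = seen | {node.feature}     # frozenset union: one copy per path
    return (paths_use_distinct_features(node.left, seen)
            and paths_use_distinct_features(node.right, seen))

# Splits on feature 0, then feature 1: satisfies assumption (i).
ok = Node(feature=0, threshold=2, left=Node(label=0),
          right=Node(feature=1, threshold=1, left=Node(label=0), right=Node(label=1)))
# Splits on feature 0 twice along one path: violates assumption (i).
bad = Node(feature=0, threshold=2, left=Node(label=0),
           right=Node(feature=0, threshold=3, left=Node(label=0), right=Node(label=1)))
print(paths_use_distinct_features(ok), paths_use_distinct_features(bad))
```

Passing `seen` down as an immutable `frozenset` keeps each path's feature set independent, so sibling subtrees may reuse a feature that only the other branch used.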