[2602.15330] A Scalable Curiosity-Driven Game-Theoretic Framework for Long-Tail Multi-Label Learning in Data Mining
Summary
This paper presents Curiosity-Driven Game-Theoretic Multi-Label Learning (CD-GTMLL), a framework for long-tail multi-label learning in data mining that improves model performance on rare (tail) labels.
Why It Matters
In many real-world datasets, labels follow a long-tail distribution: a few head labels appear frequently while many tail labels are rare. This makes multi-label classification difficult. The framework improves accuracy on under-represented labels and is built to scale to large label spaces, which could make it useful in domains such as e-commerce and healthcare.
Key Takeaways
- Introduces a game-theoretic approach to multi-label classification.
- Adaptively strengthens learning signals for rare labels without manual balancing or hyperparameter tuning.
- Demonstrates superior performance on benchmarks with thousands of labels.
- Provides a theoretical analysis showing convergence to a tail-aware equilibrium and links it to practical improvements in classification metrics.
- Paves the way for adaptive learning in imbalanced data scenarios.
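The takeaways above depend on separating frequent (head) labels from rare (tail) ones. The following is a minimal, hypothetical Python sketch and does not come from the paper. It counts label frequencies from per-sample label sets, treats the most frequent fraction of labels as the head, and assigns each label a rarity score that grows as the label gets rarer. The function names, the 20% default head cut-off, and the `log((N + 1) / (n_j + 1))` score were all chosen for illustration.

```python
import math
from collections import Counter


def label_frequencies(label_sets, num_labels):
    """Count how many samples carry each label; labels are ints in [0, num_labels)."""
    counts = Counter(label for labels in label_sets for label in labels)
    return [counts.get(j, 0) for j in range(num_labels)]


def split_head_tail(freqs, head_fraction=0.2):
    """Hypothetical split: the most frequent `head_fraction` of labels form the head."""
    # sorted() is stable, so labels with equal counts keep their index order.
    order = sorted(range(len(freqs)), key=lambda j: freqs[j], reverse=True)
    n_head = max(1, round(head_fraction * len(freqs)))
    return set(order[:n_head]), set(order[n_head:])


def rarity(freqs, num_samples):
    """Illustrative rarity score log((N + 1) / (n_j + 1)); larger means rarer."""
    return [math.log((num_samples + 1) / (f + 1)) for f in freqs]


if __name__ == "__main__":
    data = [{0, 1}, {0}, {0, 2}, {0, 1}, {0}, {3}]  # toy label sets for 6 samples
    freqs = label_frequencies(data, num_labels=4)
    print(freqs)  # [5, 2, 1, 1]
    head, tail = split_head_tail(freqs, head_fraction=0.25)
    print(head, tail)  # {0} {1, 2, 3}
    print([round(r, 3) for r in rarity(freqs, len(data))])
```

The smoothed logarithm keeps the score finite for labels that never occur. It also compresses the range when counts span several orders of magnitude, which is typical of label spaces with tens of thousands of labels.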
arXiv:2602.15330 (cs, Machine Learning), submitted on 17 Feb 2026
Authors: Jing Yang, Keze Wang
Abstract
The long-tail distribution, where a few head labels dominate while rare tail labels abound, poses a persistent challenge for large-scale Multi-Label Classification (MLC) in real-world data mining applications. Existing resampling and reweighting strategies often disrupt inter-label dependencies or require brittle hyperparameter tuning, especially as the label space expands to tens of thousands of labels. To address this issue, we propose Curiosity-Driven Game-Theoretic Multi-Label Learning (CD-GTMLL), a scalable cooperative framework that recasts long-tail MLC as a multi-player game: each sub-predictor ("player") specializes in a partition of the label space, collaborating to maximize global accuracy while pursuing intrinsic curiosity rewards based on tail label rarity and inter-player disagreement. This mechanism adaptively injects learning signals into under-represented tail labels without manual balancing or tuning. We further provide a theoretical analysis showing that our CD-GTMLL converges to a tail-aware equilibrium and formally links the optim...
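The abstract describes the game only at a high level, and the text is truncated. The following is therefore a hypothetical Python sketch of how its ingredients could fit together. It is not the authors' reward or update rule.

The sketch makes these assumptions:
- Each player outputs a probability for every label.
- Inter-player disagreement is the variance of those probabilities.
- Rarity and disagreement combine into a per-label bonus.
- The bonus weights the owning player's loss on its own partition.

Each player's utility is a shared global reward minus this curiosity-weighted loss. The names `curiosity_bonus`, `player_utilities`, `alpha`, `beta`, and `lam` are illustrative, and the rarity scores are assumed to be precomputed from label frequencies.

```python
import math
from statistics import pvariance


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one label, with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def curiosity_bonus(rarity, player_probs, alpha=1.0, beta=1.0):
    """Hypothetical per-label bonus: alpha * rarity_j + beta * disagreement_j.

    disagreement_j is the population variance of all players' probabilities
    for label j, so it is zero when every player agrees.
    """
    return [
        alpha * rarity[j] + beta * pvariance([probs[j] for probs in player_probs])
        for j in range(len(rarity))
    ]


def player_utilities(y, player_probs, partitions, rarity, lam=0.5, alpha=1.0, beta=1.0):
    """Utility of each player for one sample (higher is better).

    y: 0/1 targets, one per label
    player_probs: player_probs[k][j] is player k's probability for label j
    partitions: partitions[k] lists the labels owned by player k
                (disjoint and covering every label)
    rarity: per-label rarity scores (larger means rarer)
    """
    owner = {j: k for k, part in enumerate(partitions) for j in part}
    # Shared cooperative reward: each label is scored with its owner's prediction.
    global_reward = -sum(bce(player_probs[owner[j]][j], y[j]) for j in range(len(y))) / len(y)
    bonus = curiosity_bonus(rarity, player_probs, alpha, beta)
    utilities = []
    for k, part in enumerate(partitions):
        # Curiosity-weighted term: errors on rare, contested labels cost this player more.
        weighted_loss = sum(bonus[j] * bce(player_probs[k][j], y[j]) for j in part)
        utilities.append(global_reward - lam * weighted_loss)
    return utilities


if __name__ == "__main__":
    y = [1, 0, 1, 0]                  # labels 2 and 3 play the role of tail labels
    rarity = [0.1, 0.2, 2.0, 2.0]     # assumed precomputed rarity scores
    partitions = [[0, 1], [2, 3]]     # player 0 owns head labels, player 1 tail labels
    probs = [[0.9, 0.1, 0.5, 0.5],    # player 0's probabilities for all labels
             [0.9, 0.1, 0.6, 0.4]]    # player 1's probabilities for all labels
    print(player_utilities(y, probs, partitions, rarity))
```

All players share the global reward, so a player can raise it only by improving the joint prediction. The weighted term makes mistakes on rare, contested labels in a player's own partition more costly, which concentrates learning signal on the tail without a hand-tuned class-balancing scheme. This mirrors the cooperative-plus-curiosity structure the abstract describes but does not reproduce the paper's equations.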