[2602.20224] Exploring Anti-Aging Literature via ConvexTopics and Large Language Models
Summary
This paper presents a reformulation of a convex-optimization-based clustering algorithm for topic modeling and applies it to anti-aging literature. The method selects exemplars from the data and guarantees a global optimum, which makes its topics reproducible across runs and easier to interpret than those of common clustering approaches.
Why It Matters
The rapid growth of biomedical research necessitates effective methods for organizing and interpreting vast amounts of data. This study addresses the limitations of traditional clustering methods, offering a more reliable approach to uncovering trends in anti-aging literature, which is crucial for advancing research in this field.
Key Takeaways
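- Reformulates a convex-optimization-based clustering algorithm for topic modeling that selects exemplars from the data and guarantees a global optimum.
- Reports greater reproducibility and interpretability than common approaches, including K-means, LDA, and BERTopic.
- Analyzes about 12,000 PubMed articles on aging and longevity, uncovering topics validated by medical experts that span molecular mechanisms, dietary supplements, physical activity, and gut microbiota.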
- Provides a foundation for scalable tools in biomedical knowledge discovery.
- Highlights the importance of reproducibility in scientific research.
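The stability concern behind these takeaways can be illustrated with a small sketch. The code below is not the paper's algorithm and uses made-up toy points rather than PubMed data; it only shows, under those assumptions, how plain K-means (Lloyd's algorithm) can converge to different clusterings depending on its starting centers. The function names `sq_dist` and `kmeans` are hypothetical helpers written for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd's algorithm started from explicitly given centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    # Inertia: total squared distance from each point to its assigned center.
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia


# Toy data (an assumption for illustration): corners of a 4-by-1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A separates left from right; start B separates bottom from top.
labels_a, inertia_a = kmeans(points, [(0, 0.5), (4, 0.5)])
labels_b, inertia_b = kmeans(points, [(2, 0), (2, 1)])

print("start A:", labels_a, inertia_a)
print("start B:", labels_b, inertia_b)
```

Both runs stop at a fixed point, but start B settles on a clustering with a much larger within-cluster distance than start A. A method with a guaranteed global optimum, as the paper describes, would by construction not depend on such a starting choice.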
Computer Science > Machine Learning
arXiv:2602.20224 (cs)
[Submitted on 23 Feb 2026]
Title: Exploring Anti-Aging Literature via ConvexTopics and Large Language Models
Authors: Lana E. Yeganova, Won G. Kim, Shubo Tian, Natalie Xie, Donald C. Comeau, W. John Wilbur, Zhiyong Lu
Abstract: The rapid expansion of biomedical publications creates challenges for organizing knowledge and detecting emerging trends, underscoring the need for scalable and interpretable methods. Common clustering and topic modeling approaches such as K-means or LDA remain sensitive to initialization and prone to local optima, limiting reproducibility and evaluation. We propose a reformulation of a convex optimization based clustering algorithm that produces stable, fine-grained topics by selecting exemplars from the data and guaranteeing a global optimum. Applied to about 12,000 PubMed articles on aging and longevity, our method uncovers topics validated by medical experts. It yields interpretable topics spanning from molecular mechanisms to dietary supplements, physical activity, and gut microbiota. The method performs favorably, and most importantly, its reproducibility and interpretability distinguish it from common clustering approaches, including K-means, LDA, and BERTopic. This work provides a basis for developing scalable, web-accessible tools fo...