[2602.22213] Enriching Taxonomies Using Large Language Models
Summary
The paper presents Taxoria, a novel pipeline that enriches existing taxonomies using Large Language Models (LLMs), addressing their limited coverage and outdated or ambiguous nodes.
Why It Matters
Taxonomies are crucial for effective knowledge retrieval across various domains. By leveraging LLMs, Taxoria improves the quality and relevance of taxonomies, which can significantly enhance information organization and retrieval processes in fields like AI and data science.
Key Takeaways
- Taxoria enhances existing taxonomies by using LLMs to propose new nodes.
- The pipeline validates candidate nodes to reduce errors and ensure relevance.
- The enriched taxonomy includes provenance tracking for better analysis.
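The pipeline's stages (seed taxonomy, LLM-proposed candidates, validation, merge with provenance) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the `propose_candidates` function stands in for an actual LLM call with a hard-coded mock response, and the validation rules (duplicate rejection plus a toy word-overlap relevance check) are assumptions standing in for whatever semantic checks Taxoria actually uses.

```python
def propose_candidates(parent, children):
    """Stand-in for an LLM prompt that suggests new child concepts.
    A real pipeline would query an LLM here; this mock is illustrative."""
    mock_llm_output = {
        "machine learning": ["deep learning", "federated learning", "statistics"],
    }
    return mock_llm_output.get(parent, [])

def validate(candidate, taxonomy, parent):
    """Filter hallucinated or redundant candidates.
    Toy rules: reject nodes already in the taxonomy, and reject terms that
    share no word with an existing sibling (a crude relevance proxy; a real
    system would use embedding similarity or an LLM judge)."""
    existing = {n for kids in taxonomy.values() for n in kids} | set(taxonomy)
    if candidate in existing:
        return False
    sibling_words = {w for s in taxonomy[parent] for w in s.split()}
    return bool(set(candidate.split()) & sibling_words)

def enrich(taxonomy):
    """Return an enriched copy of the taxonomy plus a provenance record
    for every node that was added."""
    enriched = {parent: list(children) for parent, children in taxonomy.items()}
    provenance = []
    for parent, children in taxonomy.items():
        for cand in propose_candidates(parent, children):
            if validate(cand, enriched, parent):
                enriched[parent].append(cand)
                provenance.append({"node": cand, "parent": parent, "source": "llm"})
    return enriched, provenance

seed = {"machine learning": ["supervised learning", "reinforcement learning"]}
new_tax, prov = enrich(seed)
```

With this mock, "deep learning" and "federated learning" pass validation, while "statistics" is filtered out by the word-overlap check, and each accepted node carries a provenance record naming its parent and source.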
Computer Science > Information Retrieval
arXiv:2602.22213 (cs)
[Submitted on 21 Nov 2025]
Title: Enriching Taxonomies Using Large Language Models
Authors: Zeinab Ghamlouch, Mehwish Alam
Abstract: Taxonomies play a vital role in structuring and categorizing information across domains. However, many existing taxonomies suffer from limited coverage and outdated or ambiguous nodes, reducing their effectiveness in knowledge retrieval. To address this, we present Taxoria, a novel taxonomy enrichment pipeline that leverages Large Language Models (LLMs) to enhance a given taxonomy. Unlike approaches that extract internal LLM taxonomies, Taxoria uses an existing taxonomy as a seed and prompts an LLM to propose candidate nodes for enrichment. These candidates are then validated to mitigate hallucinations and ensure semantic relevance before integration. The final output includes an enriched taxonomy with provenance tracking and visualization of the final merged taxonomy for analysis.
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2602.22213 [cs.IR]
DOI: https://doi.org/10.48550/arXiv.2602.22213
Journal reference: FAIA 2025 5147-5150 (2025)
Related DOI: https:...