[2602.14795] Return of the Schema: Building Complete Datasets for Machine Learning and Reasoning on Knowledge Graphs

arXiv - Machine Learning · 4 min read

Summary

This paper presents a novel resource for building complete datasets that integrate schema and ground facts for machine learning and reasoning on knowledge graphs.

Why It Matters

Integrating schema knowledge into datasets enhances the evaluation of knowledge graph refinement algorithms, enabling more effective machine learning and reasoning applications. This work addresses a gap in current benchmarks, which typically discard schema-level knowledge, and so enables more realistic performance assessment on large-scale, real-world knowledge graphs.

Key Takeaways

  • Introduces a workflow for extracting datasets that include both schema and ground facts.
  • Addresses inconsistencies in datasets while leveraging reasoning for implicit knowledge.
  • Provides datasets serialized in OWL format for compatibility with reasoning services.
  • Enriches existing datasets with schema information to improve machine learning applications.
  • Facilitates the use of tensor representations for standard machine learning libraries.
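The last takeaway, mapping a knowledge graph into tensor form, can be illustrated with a minimal sketch. This is not the paper's actual pipeline; the triples and names below are invented for illustration. The idea is simply to assign contiguous integer ids to entities and relations so that standard ML libraries can consume the graph as index tensors.

```python
# Illustrative triples (hypothetical, not from the paper's datasets).
triples = [
    ("Ivan", "worksAt", "UniBa"),
    ("UniBa", "locatedIn", "Bari"),
    ("Ivan", "livesIn", "Bari"),
]

# Build contiguous integer ids for entities and relations.
entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
relations = sorted({p for _, p, _ in triples})
ent2id = {e: i for i, e in enumerate(entities)}
rel2id = {r: i for i, r in enumerate(relations)}

# Encode each triple as a (head, relation, tail) index tuple; stacking
# these rows yields the integer tensor a link-prediction model trains on.
index_tensor = [(ent2id[s], rel2id[p], ent2id[o]) for s, p, o in triples]
print(index_tensor)
```

The resulting list of index triples can be wrapped in any array library's tensor type unchanged; the vocabulary dictionaries are kept alongside so predictions can be decoded back to graph labels.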

Computer Science > Artificial Intelligence · arXiv:2602.14795 (cs) · [Submitted on 16 Feb 2026]

Title: Return of the Schema: Building Complete Datasets for Machine Learning and Reasoning on Knowledge Graphs

Authors: Ivan Diliso, Roberto Barile, Claudia d'Amato, Nicola Fanizzi

Abstract: Datasets for the experimental evaluation of knowledge graph refinement algorithms typically contain only ground facts, retaining very limited schema-level knowledge even when such information is available in the source knowledge graphs. This limits the evaluation of methods that rely on rich ontological constraints, reasoning, or neurosymbolic techniques, and ultimately prevents assessing their performance on large-scale, real-world knowledge graphs. In this paper, we present \resource{}, the first resource that provides a workflow for extracting datasets including both schema and ground facts, ready for machine learning and reasoning services, along with the resulting curated suite of datasets. The workflow also handles inconsistencies detected when keeping both schema and facts, and leverages reasoning to entail implicit knowledge. The suite includes newly extracted datasets from KGs with expressive schemas while simultaneously enriching existing datasets with schema information. Each dataset is ...
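The schema/fact split the abstract describes, and the entailment of implicit knowledge, can be sketched in a few lines. This is a hedged illustration, not the paper's workflow: the triples are invented, and the schema-vocabulary set and the single RDFS-style entailment step are simplifications of what a full OWL reasoner would do.

```python
# Predicates from the RDFS/OWL vocabulary mark schema-level axioms;
# everything else is treated as a ground fact. (Illustrative subset only.)
SCHEMA_PREDICATES = {
    "rdfs:subClassOf", "rdfs:domain", "rdfs:range", "owl:disjointWith",
}

# Hypothetical mixed triple set: two schema axioms, two ground facts.
triples = [
    ("ex:Professor", "rdfs:subClassOf", "ex:Person"),
    ("ex:worksAt", "rdfs:domain", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Professor"),
    ("ex:alice", "ex:worksAt", "ex:UniBa"),
]

# Partition into schema axioms and ground facts.
schema = [t for t in triples if t[1] in SCHEMA_PREDICATES]
facts = [t for t in triples if t[1] not in SCHEMA_PREDICATES]

# One RDFS-style entailment step: rdf:type propagates along rdfs:subClassOf,
# making implicit knowledge (alice is also a Person) explicit.
sub = {(s, o) for s, p, o in schema if p == "rdfs:subClassOf"}
types = {(s, o) for s, p, o in facts if p == "rdf:type"}
entailed = {(e, sup) for (e, c) in types for (c2, sup) in sub if c == c2}
print(sorted(entailed))
```

A real pipeline would iterate entailment to a fixpoint and check the combined schema + facts for inconsistency with an OWL reasoner; the point here is only that keeping both halves in one dataset is what makes such steps possible.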

Related Articles

Machine Learning

[P] MCGrad: fix calibration of your ML model in subgroups

Hi r/MachineLearning, We’re open-sourcing MCGrad, a Python package for multicalibration–developed and deployed in production at Meta. Thi...

Reddit - Machine Learning · 1 min ·
Machine Learning

Ml project user give dataset and I give best model [D] [P]

Tl,dr : suggest me a solution to create a ai ml project where user will give his dataset as input and the project should give best model ...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] ICML Reviewer Acknowledgement

Hi, I'm a little confused about ICML discussion period Does the period for reviewer acknowledging responses have already ended? One of th...

Reddit - Machine Learning · 1 min ·
Llms

Claude Opus 4.6 API at 40% below Anthropic pricing – try free before you pay anything

Hey everyone I've set up a self-hosted API gateway using [New-API](QuantumNous/new-ap) to manage and distribute Claude Opus 4.6 access ac...

Reddit - Artificial Intelligence · 1 min ·