[2602.14795] Return of the Schema: Building Complete Datasets for Machine Learning and Reasoning on Knowledge Graphs

arXiv - Machine Learning · 4 min read

Summary

This paper presents a novel resource for building complete datasets that integrate schema and ground facts for machine learning and reasoning on knowledge graphs.

Why It Matters

Integrating schema knowledge into datasets enhances the evaluation of knowledge graph refinement algorithms, enabling more effective machine learning and reasoning applications. This work addresses a gap in current benchmarks, which typically discard schema-level knowledge, and so enables more realistic performance assessment on large-scale, real-world knowledge graphs.

Key Takeaways

  • Introduces a workflow for extracting datasets that include both schema and ground facts.
  • Addresses inconsistencies in datasets while leveraging reasoning for implicit knowledge.
  • Provides datasets serialized in OWL format for compatibility with reasoning services.
  • Enriches existing datasets with schema information to improve machine learning applications.
  • Facilitates the use of tensor representations for standard machine learning libraries.
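The last takeaway, mapping a knowledge graph into tensor form, can be illustrated with a minimal sketch. This is not the paper's actual pipeline; the triples and names below are invented for illustration. The idea is simply to assign contiguous integer ids to entities and relations so that standard ML libraries can consume the graph as index tensors.

```python
# Illustrative triples (hypothetical, not from the paper's datasets).
triples = [
    ("Ivan", "worksAt", "UniBa"),
    ("UniBa", "locatedIn", "Bari"),
    ("Ivan", "livesIn", "Bari"),
]

# Build contiguous integer ids for entities and relations.
entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
relations = sorted({p for _, p, _ in triples})
ent2id = {e: i for i, e in enumerate(entities)}
rel2id = {r: i for i, r in enumerate(relations)}

# Encode each triple as a (head, relation, tail) index tuple; stacking
# these rows yields the integer tensor a link-prediction model trains on.
index_tensor = [(ent2id[s], rel2id[p], ent2id[o]) for s, p, o in triples]
print(index_tensor)
```

The resulting list of index triples can be wrapped in any array library's tensor type unchanged; the vocabulary dictionaries are kept alongside so predictions can be decoded back to graph labels.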

Computer Science > Artificial Intelligence · arXiv:2602.14795 (cs) · [Submitted on 16 Feb 2026]

Title: Return of the Schema: Building Complete Datasets for Machine Learning and Reasoning on Knowledge Graphs

Authors: Ivan Diliso, Roberto Barile, Claudia d'Amato, Nicola Fanizzi

Abstract: Datasets for the experimental evaluation of knowledge graph refinement algorithms typically contain only ground facts, retaining very limited schema-level knowledge even when such information is available in the source knowledge graphs. This limits the evaluation of methods that rely on rich ontological constraints, reasoning, or neurosymbolic techniques, and ultimately prevents assessing their performance on large-scale, real-world knowledge graphs. In this paper, we present \resource{}, the first resource that provides a workflow for extracting datasets including both schema and ground facts, ready for machine learning and reasoning services, along with the resulting curated suite of datasets. The workflow also handles inconsistencies detected when keeping both schema and facts, and leverages reasoning to entail implicit knowledge. The suite includes newly extracted datasets from KGs with expressive schemas while simultaneously enriching existing datasets with schema information. Each dataset is ...
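The schema/fact split the abstract describes, and the entailment of implicit knowledge, can be sketched in a few lines. This is a hedged illustration, not the paper's workflow: the triples are invented, and the schema-vocabulary set and the single RDFS-style entailment step are simplifications of what a full OWL reasoner would do.

```python
# Predicates from the RDFS/OWL vocabulary mark schema-level axioms;
# everything else is treated as a ground fact. (Illustrative subset only.)
SCHEMA_PREDICATES = {
    "rdfs:subClassOf", "rdfs:domain", "rdfs:range", "owl:disjointWith",
}

# Hypothetical mixed triple set: two schema axioms, two ground facts.
triples = [
    ("ex:Professor", "rdfs:subClassOf", "ex:Person"),
    ("ex:worksAt", "rdfs:domain", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Professor"),
    ("ex:alice", "ex:worksAt", "ex:UniBa"),
]

# Partition into schema axioms and ground facts.
schema = [t for t in triples if t[1] in SCHEMA_PREDICATES]
facts = [t for t in triples if t[1] not in SCHEMA_PREDICATES]

# One RDFS-style entailment step: rdf:type propagates along rdfs:subClassOf,
# making implicit knowledge (alice is also a Person) explicit.
sub = {(s, o) for s, p, o in schema if p == "rdfs:subClassOf"}
types = {(s, o) for s, p, o in facts if p == "rdf:type"}
entailed = {(e, sup) for (e, c) in types for (c2, sup) in sub if c == c2}
print(sorted(entailed))
```

A real pipeline would iterate entailment to a fixpoint and check the combined schema + facts for inconsistency with an OWL reasoner; the point here is only that keeping both halves in one dataset is what makes such steps possible.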

Related Articles

Machine Learning

[P] MCGrad: fix calibration of your ML model in subgroups

Hi r/MachineLearning, We’re open-sourcing MCGrad, a Python package for multicalibration–developed and deployed in production at Meta. Thi...

Reddit - Machine Learning · 1 min ·
Machine Learning

Ml project user give dataset and I give best model [D] [P]

Tl,dr : suggest me a solution to create a ai ml project where user will give his dataset as input and the project should give best model ...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] ICML Reviewer Acknowledgement

Hi, I'm a little confused about ICML discussion period Does the period for reviewer acknowledging responses have already ended? One of th...

Reddit - Machine Learning · 1 min ·
Llms

Claude Opus 4.6 API at 40% below Anthropic pricing – try free before you pay anything

Hey everyone I've set up a self-hosted API gateway using [New-API](QuantumNous/new-ap) to manage and distribute Claude Opus 4.6 access ac...

Reddit - Artificial Intelligence · 1 min ·