[2602.21597] NGDB-Zoo: Towards Efficient and Scalable Neural Graph Databases Training
Summary
The paper presents NGDB-Zoo, a framework designed to enhance the training efficiency of Neural Graph Databases (NGDBs) by decoupling logical operators from query topologies, enabling improved throughput and GPU utilization.
Why It Matters
As the demand for efficient data processing grows, NGDB-Zoo addresses key limitations in current neural graph database training methods. By improving throughput and reducing representation friction, this framework has significant implications for applications in AI and machine learning, particularly in complex reasoning tasks.
Key Takeaways
- NGDB-Zoo improves training efficiency for Neural Graph Databases.
- The framework enables multi-stream parallelism, improving throughput 1.8x to 6.8x over baselines.
- Decoupling logical operators from query topologies mitigates representation friction.
- High GPU utilization is maintained across diverse logical patterns.
- The framework integrates high-dimensional semantic priors without causing I/O stalls.
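The core idea behind operator-level training is to batch by logical operator type across many queries rather than by whole query plans. The paper does not give code, so the following is a minimal, hypothetical sketch (the query plans, operator names, and `operator_level_batches` helper are illustrative, not the authors' API):

```python
from collections import defaultdict

# Hypothetical query plans: each query is a list of (operator, payload) steps.
queries = [
    [("project", "r1"), ("intersect", None)],
    [("project", "r2"), ("project", "r3"), ("intersect", None)],
    [("project", "r1"), ("union", None)],
]

def operator_level_batches(queries):
    """Group steps by operator type across queries (operator-level batching)
    instead of executing each query's plan in isolation (query-level batching)."""
    batches = defaultdict(list)
    for qid, plan in enumerate(queries):
        for step, (op, payload) in enumerate(plan):
            batches[op].append((qid, step, payload))
    return dict(batches)

batches = operator_level_batches(queries)
# All four "project" steps can now execute together (e.g., one fused GPU
# kernel launch), regardless of which query topology each step came from.
```

Because same-type operators are gathered into large, uniform batches, a scheduler can dispatch them on separate streams, which is the kind of data-flow execution the framework's multi-stream parallelism relies on.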
Computer Science > Machine Learning
arXiv:2602.21597 (cs)
[Submitted on 25 Feb 2026]
Title: NGDB-Zoo: Towards Efficient and Scalable Neural Graph Databases Training
Authors: Zhongwei Xie, Jiaxin Bai, Shujie Liu, Haoyu Huang, Yufei Li, Yisen Gao, Hong Ting Tsang, Yangqiu Song
Abstract: Neural Graph Databases (NGDBs) facilitate complex logical reasoning over incomplete knowledge structures, yet their training efficiency and expressivity are constrained by rigid query-level batching and structure-exclusive embeddings. We present NGDB-Zoo, a unified framework that resolves these bottlenecks by synergizing operator-level training with semantic augmentation. By decoupling logical operators from query topologies, NGDB-Zoo transforms the training loop into a dynamically scheduled data-flow execution, enabling multi-stream parallelism and achieving a $1.8\times$ - $6.8\times$ throughput compared to baselines. Furthermore, we formalize a decoupled architecture to integrate high-dimensional semantic priors from Pre-trained Text Encoders (PTEs) without triggering I/O stalls or memory overflows. Extensive evaluations on six benchmarks, including massive graphs like ogbl-wikikg2 and ATLAS-Wiki, demonstrate that NGDB-Zoo maintains high GPU utilization across diverse logical patterns and significantly mitigates representation friction.
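The abstract's claim of integrating PTE priors "without triggering I/O stalls" suggests overlapping embedding loads with training. A common way to achieve this is a bounded background prefetcher; the sketch below is a generic illustration of that pattern (the `prefetch` function and `load_fn` are hypothetical, not the paper's implementation):

```python
import queue
import threading

def prefetch(batch_ids, load_fn, depth=2):
    """Load embeddings for upcoming batches on a background thread so that
    disk/PTE lookups overlap with GPU training on the current batch."""
    buf = queue.Queue(maxsize=depth)  # bounded: caps host-memory use
    SENTINEL = object()

    def worker():
        for bid in batch_ids:
            buf.put(load_fn(bid))  # blocks when the buffer is full
        buf.put(SENTINEL)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buf.get()
        if item is SENTINEL:
            break
        yield item  # embeddings arrive just-in-time for each training step

# Usage with a stand-in loader (a real one would read precomputed PTE vectors):
loaded = list(prefetch(range(5), load_fn=lambda b: f"emb[{b}]"))
```

The bounded queue is the key design choice: it hides I/O latency while preventing the memory overflows that eagerly materializing all high-dimensional priors would cause.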