[2602.18745] Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code
Summary
The paper presents a pipeline for synthesizing multimodal geometry problems from scratch and uses it to build the GeoCode dataset, whose plotting code supports explicit visual-symbolic alignment and improves model performance on geometry tasks.
Why It Matters
Current vision-language models struggle with complex geometric constructions because training data is limited and visual-symbolic alignment is weak. GeoCode targets both gaps: it supplies verified synthetic problems and provides the plotting code used to render each diagram, giving models a resource for learning to read and reason about geometric figures.
Key Takeaways
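- The GeoCode dataset improves visual-symbolic alignment in geometry tasks.
- The proposed pipeline decouples problem generation into symbolic seed construction, grounded instantiation with verification, and code-based diagram rendering, keeping structure, text, reasoning, and images consistent.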
- Models trained on GeoCode show significant performance improvements on geometry benchmarks.
- The dataset ensures mathematical correctness through multi-stage validation.
- Predicting the plotting code is introduced as an explicit alignment objective, turning visual understanding into a supervised structured prediction task.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.18745 (cs)
[Submitted on 21 Feb 2026]
Title: Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code
Authors: Haobo Lin, Tianyi Bai, Chen Chen, Jiajun Zhang, Bohan Zeng, Wentao Zhang, Binhang Yuan
Abstract: Multimodal geometry reasoning requires models to jointly understand visual diagrams and perform structured symbolic inference, yet current vision-language models struggle with complex geometric constructions due to limited training data and weak visual-symbolic alignment. We propose a pipeline for synthesizing complex multimodal geometry problems from scratch and construct a dataset named GeoCode, which decouples problem generation into symbolic seed construction, grounded instantiation with verification, and code-based diagram rendering, ensuring consistency across structure, text, reasoning, and images. Leveraging the plotting code provided in GeoCode, we further introduce code prediction as an explicit alignment objective, transforming visual understanding into a supervised structured prediction task. GeoCode exhibits substantially higher structural complexity and reasoning difficulty than existing benchmarks, while maintaining mathematical correctness through multi-stag...
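The three stages named in the abstract (symbolic seed construction, grounded instantiation with verification, and code-based diagram rendering) can be illustrated with a toy sketch. This is not the authors' implementation: the `Seed` and `Problem` schemas, the function names, the right-triangle example, and the choice of matplotlib calls in the emitted string are all assumptions made for illustration. The point it tries to show is that text, answer, and diagram all derive from one set of coordinates, and that the emitted plotting code can serve as a supervised target for code prediction.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Seed:
    """Symbolic seed: a construction with no concrete numbers yet (hypothetical schema)."""
    shape: str    # e.g. "right_triangle"
    unknown: str  # the quantity the problem asks for


@dataclass
class Problem:
    text: str
    answer: float
    points: dict    # vertex name -> (x, y)
    plot_code: str  # plotting source that draws the diagram


def instantiate(seed, rng):
    """Ground the seed with concrete leg lengths and compute the answer."""
    if seed.shape != "right_triangle":
        raise NotImplementedError(seed.shape)
    a, b = rng.randint(3, 12), rng.randint(3, 12)
    points = {"A": (0.0, 0.0), "B": (float(a), 0.0), "C": (0.0, float(b))}
    text = (f"In right triangle ABC with the right angle at A, "
            f"AB = {a} and AC = {b}. Find {seed.unknown}.")
    return text, math.hypot(a, b), points


def verify(answer, points):
    """Recompute the answer from the coordinates; reject inconsistent instances."""
    recomputed = math.dist(points["B"], points["C"])
    return math.isclose(answer, recomputed, rel_tol=1e-9)


def render_code(points):
    """Emit plotting code as text; the string doubles as a code-prediction target."""
    names = list(points)
    # Repeat the first vertex so the outline closes.
    xs = [points[n][0] for n in names] + [points[names[0]][0]]
    ys = [points[n][1] for n in names] + [points[names[0]][1]]
    lines = [
        "import matplotlib.pyplot as plt",
        f"plt.plot({xs}, {ys}, color='black')",
    ]
    lines += [f"plt.annotate({n!r}, {points[n]})" for n in names]
    lines += ["plt.axis('equal')", "plt.axis('off')", "plt.savefig('diagram.png')"]
    return "\n".join(lines)


def synthesize(seed, rng_seed=0):
    """Run the three stages in order and return one consistent problem record."""
    rng = random.Random(rng_seed)
    text, answer, points = instantiate(seed, rng)
    if not verify(answer, points):
        raise ValueError("instance failed verification")
    return Problem(text, answer, points, render_code(points))


problem = synthesize(Seed(shape="right_triangle", unknown="BC"))
```

In this sketch, `problem.plot_code` is only a string; running it separately (with matplotlib installed) would produce the diagram image, and the resulting (image, code) pair is the kind of example a code-prediction objective could train on.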