[2505.22914] cadrille: Multi-modal CAD Reconstruction with Reinforcement Learning

arXiv - Machine Learning 4 min read Article

Summary

The paper presents cadrille, a multi-modal CAD reconstruction model that accepts point clouds, images, and text as input and is fine-tuned with reinforcement learning, achieving state-of-the-art results on CAD reconstruction benchmarks.

Why It Matters

This research addresses the limitations of existing CAD reconstruction methods that rely on single input modalities. By integrating multiple data types, it enhances the robustness and accessibility of CAD applications, potentially transforming engineering and manufacturing processes.

Key Takeaways

  • cadrille processes point clouds, images, and text simultaneously for CAD reconstruction.
  • The model employs a two-stage training approach: supervised fine-tuning followed by reinforcement learning.
  • cadrille sets new benchmarks in CAD tasks, outperforming single-modal methods.
  • The research applies Group Relative Policy Optimization (GRPO) for RL fine-tuning in CAD reconstruction.
  • Code for the model is publicly available, promoting further research and development.
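The takeaways above mention RL fine-tuning with GRPO. The core idea of GRPO is to score each sampled completion relative to the other completions in its own group, rather than against a learned value baseline. A minimal sketch of that group-relative advantage computation (the epsilon constant is an illustrative implementation detail, not from the paper):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages in the GRPO style: normalize each
    sampled completion's reward by the mean and standard deviation
    of its own sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    eps = 1e-8  # avoids division by zero when all rewards are equal
    return [(r - mean) / (std + eps) for r in rewards]

# Example: two of four sampled CAD programs earn reward 1.0.
advantages = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Completions that beat their group mean get positive advantage and are reinforced; those below it are penalized, with no critic network required.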

Computer Science > Computer Vision and Pattern Recognition
arXiv:2505.22914 (cs) [Submitted on 28 May 2025 (v1), last revised 17 Feb 2026 (this version, v3)]

Title: cadrille: Multi-modal CAD Reconstruction with Reinforcement Learning

Authors: Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhemchuzhnikov, Alexander Nikulin, Ilya Zisman, Anna Vorontsova, Anton Konushin, Vladislav Kurenkov, Danila Rukhovich

Abstract: Computer-Aided Design (CAD) plays a central role in engineering and manufacturing, making it possible to create precise and editable 3D models. Using a variety of sensor or user-provided data as inputs for CAD reconstruction can democratize access to design applications. However, existing methods typically focus on a single input modality, such as point clouds, images, or text, which limits their generalizability and robustness. Leveraging recent advances in vision-language models (VLM), we propose a multi-modal CAD reconstruction model that simultaneously processes all three input modalities. Inspired by large language model (LLM) training paradigms, we adopt a two-stage pipeline: supervised fine-tuning (SFT) on large-scale procedurally generated data, followed by reinforcement learning (RL) fine-tuning using online feedback, obtained programmatically. Furthermore, we are the first to explore RL fin...
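The abstract notes that the RL feedback is obtained programmatically, which is possible because a generated CAD program can be executed and its resulting geometry compared against the ground-truth shape. A hedged sketch of what such a geometric reward could look like (Chamfer distance is a standard reconstruction metric; the `cad_reward` shaping and `scale` parameter are illustrative assumptions, not the paper's actual reward):

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets given as
    lists of (x, y, z) tuples: average squared distance from each
    point to its nearest neighbor in the other set, both ways."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)

def cad_reward(pred_points, gt_points, scale=1.0):
    # Hypothetical shaping: map distance to a bounded reward in (0, 1],
    # with 1.0 for a perfect reconstruction. An invalid CAD program that
    # fails to execute would receive reward 0 upstream of this function.
    return 1.0 / (1.0 + chamfer_distance(pred_points, gt_points) / scale)
```

In practice the point sets would be sampled from the executed CAD model's surface, so the reward signal needs no human labels, only a geometry kernel.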

