[2511.20721] Foundry: Distilling 3D Foundation Models for the Edge
Computer Science > Computer Vision and Pattern Recognition
arXiv:2511.20721 (cs)
[Submitted on 25 Nov 2025 (v1), last revised 26 Mar 2026 (this version, v2)]

Title: Foundry: Distilling 3D Foundation Models for the Edge
Authors: Guillaume Letellier, Siddharth Srivastava (IIT Delhi), Frédéric Jurie, Gaurav Sharma (IIT Kanpur)

Abstract: Foundation models pre-trained with self-supervised learning (SSL) on large-scale datasets have become powerful general-purpose feature extractors. However, their immense size and computational cost make them prohibitive for deployment on edge devices such as robots and AR/VR headsets. Existing compression techniques, such as standard knowledge distillation, create efficient "specialist" models but sacrifice the crucial downstream-agnostic generality that makes foundation models so valuable. In this paper, we introduce Foundation Model Distillation (FMD), a new paradigm for compressing large SSL models into compact, efficient, and faithful proxies that retain their general-purpose representational power. We present Foundry, the first implementation of FMD for 3D point clouds. Foundry trains a student to learn a compressed set of SuperTokens that reconstruct the teacher's token-level representations, capturing a compact basis of its latent space. A single distilled model maintains strong transferabilit...
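The core idea of the abstract, reconstructing the teacher's token-level representations from a compact set of SuperTokens, can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the reconstruction mechanism here (softmax cross-attention from teacher tokens onto the SuperToken basis, trained with a token-level MSE) and all shapes and names (`N`, `K`, `D`, `reconstruct`) are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, D = 256, 16, 64  # teacher tokens, SuperTokens (K << N), feature dim -- illustrative sizes

teacher_tokens = rng.standard_normal((N, D))  # stand-in for a frozen teacher's token outputs
super_tokens = rng.standard_normal((K, D))    # the student's compact basis (learned in practice)

def reconstruct(teacher, basis):
    """Express each teacher token as a softmax-weighted mix of SuperTokens
    (one plausible reconstruction scheme; an assumption, not the paper's)."""
    logits = teacher @ basis.T / np.sqrt(basis.shape[1])  # (N, K) scaled similarities
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)         # rows sum to 1
    return weights @ basis                                # (N, D) reconstruction

recon = reconstruct(teacher_tokens, super_tokens)
distill_loss = float(np.mean((recon - teacher_tokens) ** 2))  # token-level MSE objective
print(recon.shape, distill_loss > 0.0)
```

In an actual FMD setup the SuperTokens (and the student network producing them) would be optimized by gradient descent on such a reconstruction loss against the frozen teacher, so that K vectors span the directions of the teacher's latent space that matter across downstream tasks.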