[2604.02546] Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding



Computer Science > Computer Vision and Pattern Recognition
arXiv:2604.02546 (cs)
[Submitted on 2 Apr 2026]

Title: Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding
Authors: Ye Mao, Weixun Luo, Ranran Huang, Junpeng Jing, Krystian Mikolajczyk

Abstract: Pretraining 3D encoders by aligning with Contrastive Language Image Pretraining (CLIP) has emerged as a promising direction to learn generalizable representations for 3D scene understanding. In this paper, we propose UniScene3D, a transformer-based encoder that learns unified scene representations from multi-view colored pointmaps, jointly modeling image appearance and geometry. For robust colored pointmap representation learning, we introduce novel cross-view geometric alignment and grounded view alignment to enforce cross-view geometry and semantic consistency. Extensive low-shot and task-specific fine-tuning evaluations on viewpoint grounding, scene retrieval, scene type classification, and 3D VQA demonstrate our state-of-the-art performance. These results highlight the effectiveness of our approach for unified 3D scene understanding.

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2604.02546 [cs.CV] (or arXiv:2604.02546v1 [cs.CV] for this ve...
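The abstract describes aligning a 3D scene encoder with CLIP through contrastive pretraining but does not spell out the loss. As a rough illustration only, the sketch below shows a generic CLIP-style symmetric contrastive (InfoNCE) objective over paired scene and text embeddings. It is not UniScene3D's actual training objective: the function names, the temperature of 0.07, and the toy embeddings are all assumptions made for this example, and the paper's cross-view geometric alignment and grounded view alignment terms are not modeled.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    # Numerically stable log-softmax over a list of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(scene_embs, text_embs, temperature=0.07):
    """Generic CLIP-style symmetric InfoNCE loss (illustrative sketch).

    scene_embs[i] and text_embs[i] are assumed to be a matching pair;
    every other combination in the batch is treated as a negative.
    """
    s = [l2_normalize(v) for v in scene_embs]
    t = [l2_normalize(v) for v in text_embs]
    n = len(s)
    # Cosine similarity matrix scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(s[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Scene-to-text direction: each row should pick out its own column.
    loss_s2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-scene direction: each column should pick out its own row.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2s = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_s2t + loss_t2s)


# Toy 2-D embeddings (hypothetical values, not from the paper).
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
mismatched = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                        [[0.0, 1.0], [1.0, 0.0]])
print(aligned < mismatched)
```

In this toy setup the correctly paired batch yields a much lower loss than the swapped one, which is the behavior a contrastive alignment objective is meant to reward.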

Originally published on April 06, 2026. Curated by AI News.
