[2602.19349] UP-Fuse: Uncertainty-guided LiDAR-Camera Fusion for 3D Panoptic Segmentation
Summary
The paper presents UP-Fuse, an uncertainty-guided LiDAR-camera fusion framework for 3D panoptic segmentation that stays reliable when camera input degrades or fails.
Why It Matters
As autonomous systems increasingly rely on sensor fusion for accurate perception, UP-Fuse offers a robust solution to maintain performance under adverse conditions, which is crucial for safety in robotics and autonomous vehicles. This research could significantly improve the reliability of perception systems in critical applications.
Key Takeaways
- UP-Fuse uses an uncertainty-aware fusion framework to enhance 3D panoptic segmentation (see the sketch after this list).
- The method remains effective even when camera sensors degrade or fail.
- It employs a hybrid 2D-3D transformer to mitigate spatial ambiguities.
- Extensive testing on multiple benchmarks demonstrates UP-Fuse's robustness.
- This approach is particularly beneficial for robotic perception in safety-critical environments.
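The fusion idea in the first takeaway, camera features down-weighted by a predicted uncertainty map before being combined with LiDAR range-view features, can be pictured with a minimal PyTorch sketch. This is an illustration of per-pixel uncertainty gating under assumed tensor shapes, not the paper's implementation; `UncertaintyGatedFusion`, its layer choices, and the sigmoid uncertainty head are all hypothetical.

```python
import torch
import torch.nn as nn

class UncertaintyGatedFusion(nn.Module):
    """Hypothetical sketch: fuse range-view LiDAR and camera feature maps,
    suppressing camera features where predicted uncertainty is high."""

    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel uncertainty map from the camera features.
        self.uncertainty_head = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, 1, kernel_size=1),
            nn.Sigmoid(),  # uncertainty in [0, 1]
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, lidar_feat: torch.Tensor, cam_feat: torch.Tensor):
        # High uncertainty -> low confidence -> camera contribution shrinks,
        # so the fused feature degrades gracefully toward LiDAR-only.
        uncertainty = self.uncertainty_head(cam_feat)        # (B, 1, H, W)
        gated_cam = (1.0 - uncertainty) * cam_feat
        fused = self.fuse(torch.cat([lidar_feat, gated_cam], dim=1))
        return fused, uncertainty
```

Soft down-weighting rather than hard masking keeps gradients flowing to the camera branch during training, which is one common motivation for this style of gating.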
Authors: Rohit Mohan, Florian Drews, Yakov Miron, Daniele Cattaneo, Abhinav Valada
Submitted: 22 Feb 2026 (arXiv:2602.19349, cs.CV)
Abstract
LiDAR-camera fusion enhances 3D panoptic segmentation by leveraging camera images to complement sparse LiDAR scans, but it also introduces a critical failure mode: under adverse conditions, degradation or failure of the camera sensor can significantly compromise the reliability of the perception system. To address this problem, we introduce UP-Fuse, a novel uncertainty-aware fusion framework in the 2D range-view that remains robust under camera sensor degradation, calibration drift, and sensor failure. Raw LiDAR data is first projected into the range-view and encoded by a LiDAR encoder, while camera features are simultaneously extracted and projected into the same shared space. At its core, UP-Fuse employs an uncertainty-guided fusion module that dynamically modulates cross-modal interaction using predicted uncertainty maps. These maps are learned by quantifying representational divergence under diverse visual degradations, ensuring that only reliable visual cues influence the fused representation. The fused range-view feature...
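The pipeline in the abstract begins by projecting raw LiDAR into the range-view. For readers unfamiliar with that step, below is a minimal NumPy sketch of the standard spherical projection used by range-image methods in general; the 64×2048 grid and the vertical field-of-view values are typical 64-beam assumptions, not settings taken from the paper.

```python
import numpy as np

def project_to_range_view(points, h=64, w=2048, fov_up=3.0, fov_down=-25.0):
    """Spherical projection of an (N, 3) point cloud to an (h, w) range image.
    fov_up/fov_down are the sensor's vertical field of view in degrees
    (assumed values here, following common 64-beam LiDAR setups)."""
    fov_total = np.radians(fov_up - fov_down)

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)                              # azimuth angle
    pitch = np.arcsin(z / np.maximum(depth, 1e-8))      # elevation angle

    # Normalize angles to [0, 1], then scale to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * w
    v = (1.0 - (pitch + abs(np.radians(fov_down))) / fov_total) * h
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    range_image = np.zeros((h, w), dtype=np.float32)
    # Write farther points first so nearer points overwrite them.
    order = np.argsort(depth)[::-1]
    range_image[v[order], u[order]] = depth[order]
    return range_image
```

Once LiDAR depth (and, analogously, per-point features) live on this 2D grid, camera features projected into the same range-view can be fused with ordinary 2D operations, which is what makes the shared-space design in the abstract practical.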