[2602.23172] Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking
Summary
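The paper presents Latent Gaussian Splatting (LaGS) for 4D panoptic occupancy tracking. LaGS combines camera-based end-to-end tracking with mask-based multi-view panoptic occupancy prediction: multi-view observations are fused into sparse 3D Gaussians, whose aggregated features are then splatted onto a 3D voxel grid.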
Why It Matters
Robots operating in dynamic environments need spatiotemporal scene understanding to navigate and interact safely. Most existing methods cover only one side of the problem: coarse bounding-box tracking, or detailed voxel-based occupancy without explicit temporal association. LaGS targets both at once, which could benefit applications such as autonomous driving and robotic assistance.
Key Takeaways
- LaGS integrates camera-based tracking with multi-view occupancy prediction.
- The method efficiently aggregates multi-view information into 3D voxel grids.
- Achieves state-of-the-art performance on the Occ3D nuScenes and Waymo datasets.
- Introduces a novel latent Gaussian splatting approach for scene representation.
- Code availability promotes further research and application in the field.
arXiv:2602.23172 (cs), Computer Vision and Pattern Recognition
Submitted on 26 Feb 2026
Title: Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking
Authors: Maximilian Luz, Rohit Mohan, Thomas Nürnberg, Yakov Miron, Daniele Cattaneo, Abhinav Valada
Abstract: Capturing 4D spatiotemporal surroundings is crucial for the safe and reliable operation of robots in dynamic environments. However, most existing methods address only one side of the problem: they either provide coarse geometric tracking via bounding boxes, or detailed 3D structures like voxel-based occupancy that lack explicit temporal association. In this work, we present Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking (LaGS) that advances spatiotemporal scene understanding in a holistic direction. Our approach incorporates camera-based end-to-end tracking with mask-based multi-view panoptic occupancy prediction, and addresses the key challenge of efficiently aggregating multi-view information into 3D voxel grids via a novel latent Gaussian splatting approach. Specifically, we first fuse observations into 3D Gaussians that serve as a sparse point-centric latent representation of the 3D scene, and then splat the aggregated features onto a 3D voxel grid that is decoded by a mask-based segmentation head. We evaluate LaGS on the Occ3D nuScenes an...
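The splatting step described in the abstract (features held by sparse 3D Gaussians are spread onto a dense voxel grid) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `splat_gaussians_to_voxels`, the isotropic Gaussians, the opacity weighting, and the additive accumulation are all simplifying assumptions made for illustration, and the multi-view fusion and mask-based decoding stages are omitted entirely.

```python
import numpy as np

def splat_gaussians_to_voxels(means, sigmas, opacities, features,
                              grid_min, voxel_size, grid_shape):
    """Additively splat per-Gaussian latent features onto a dense voxel grid.

    Hypothetical helper; all parameter names are assumptions of this sketch.
    means:     (N, 3) Gaussian centers in world coordinates
    sigmas:    (N,)   isotropic standard deviations (assumed: no full covariance)
    opacities: (N,)   per-Gaussian scalar weights
    features:  (N, C) latent feature vectors
    Returns an (X, Y, Z, C) feature grid.
    """
    X, Y, Z = grid_shape
    # Integer voxel indices, shape (X, Y, Z, 3), in "ij" (matrix) order.
    idx = np.stack(
        np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij"),
        axis=-1,
    )
    # World coordinates of voxel centers, shape (X*Y*Z, 3).
    centers = np.asarray(grid_min, dtype=float) + (idx.reshape(-1, 3) + 0.5) * voxel_size

    # Squared distance from every voxel center to every Gaussian mean: (V, N).
    d2 = ((centers[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    # Isotropic Gaussian falloff scaled by opacity: (V, N).
    weights = opacities[None, :] * np.exp(-0.5 * d2 / sigmas[None, :] ** 2)

    # Each voxel accumulates the weighted sum of all Gaussian features: (V, C).
    grid = weights @ features
    return grid.reshape(X, Y, Z, -1)


# Usage: one Gaussian centered in the middle voxel of a 3x3x3 grid.
means = np.array([[1.5, 1.5, 1.5]])
sigmas = np.array([0.5])
opacities = np.array([1.0])
features = np.array([[2.0, -1.0]])
grid = splat_gaussians_to_voxels(means, sigmas, opacities, features,
                                 grid_min=(0.0, 0.0, 0.0), voxel_size=1.0,
                                 grid_shape=(3, 3, 3))
print(grid.shape)
```

The dense (V, N) weight matrix keeps the sketch short but scales poorly; a practical version would restrict each Gaussian to the voxels within a few standard deviations of its mean.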