[2504.08603] FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment
Summary
The paper presents FindAnything, an open-vocabulary, object-centric mapping framework for robot exploration of unknown environments. It incorporates vision-language features into dense volumetric submaps to give the robot a richer semantic understanding of the scene.
Why It Matters
FindAnything targets a key open problem in robotic mapping: real-time, open-vocabulary semantic understanding of large-scale unknown environments. This capability has so far been limited mainly by computational requirements. It matters for applications such as autonomous exploration and search-and-rescue missions, so the work is relevant both to robotics research and to deployed robotic systems.
Key Takeaways
- FindAnything combines purely geometric information with open-vocabulary semantic information in a single map.
- Aggregating open-vocabulary features at the object level, rather than storing them per pixel, keeps memory usage low and makes the framework suitable for resource-constrained devices.
- It runs in real time, which is a prerequisite for online tasks such as autonomous exploration.
- FindAnything achieves state-of-the-art semantic accuracy while being faster than existing solutions.
- Vision-language features make it possible to query the 3D map with open-vocabulary (free-text) queries.
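To make the open-vocabulary query idea concrete, here is a minimal Python sketch. It assumes that each mapped object stores one aggregated feature vector and that a query is answered by ranking objects by cosine similarity to the query's text embedding. Cosine similarity is a common choice for CLIP-style features, but this summary does not state FindAnything's exact scoring rule. The names `query_objects` and `cosine_similarity`, and the toy 3-D vectors, are hypothetical. A real system would obtain the text embedding from a vision-language model's text encoder.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_objects(object_features: Dict[int, List[float]],
                  text_embedding: List[float],
                  top_k: int = 1) -> List[Tuple[int, float]]:
    """Return the top_k (object_id, score) pairs, highest similarity first."""
    scores = [(obj_id, cosine_similarity(feat, text_embedding))
              for obj_id, feat in object_features.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    # Toy 3-D vectors standing in for real, much higher-dimensional features.
    objects = {
        7: [0.9, 0.1, 0.0],    # hypothetical object, e.g. a chair
        12: [0.0, 0.2, 0.95],  # hypothetical object, e.g. a potted plant
    }
    query = [0.1, 0.0, 1.0]    # stand-in for the text embedding of "plant"
    best_id, best_score = query_objects(objects, query)[0]
    print(best_id, round(best_score, 3))  # object 12 ranks first
```

Ranking whole objects means a query scans one stored vector per object instead of one per pixel or voxel. This is the practical payoff of object-level feature storage.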
arXiv:2504.08603 (Computer Science > Robotics)
Submitted on 11 Apr 2025 (v1), last revised 18 Feb 2026 (this version, v3)
Authors: Sebastián Barbas Laina, Simon Boche, Sotiris Papatheodorou, Simon Schaefer, Jaehyung Jung, Stefan Leutenegger
Abstract: Geometrically accurate and semantically expressive map representations have proven invaluable for robot deployment and task planning in unknown environments. Nevertheless, real-time, open-vocabulary semantic understanding of large-scale unknown environments still presents open challenges, mainly due to computational requirements. In this paper we present FindAnything, an open-world mapping framework that incorporates vision-language information into dense volumetric submaps. Thanks to the use of vision-language features, FindAnything combines pure geometric and open-vocabulary semantic information for a higher level of understanding. It proposes an efficient storage of open-vocabulary information through the aggregation of features at the object level. Pixelwise vision-language features are aggregated based on eSAM segments, which are in turn integrated into object-centric volumetric submaps, providing a mapping from o...
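The abstract's aggregation step pools pixelwise vision-language features over eSAM segments and then integrates them into object-centric submaps. It can be sketched as below. This is a minimal illustration under stated assumptions, not FindAnything's implementation. It assumes mean pooling of per-pixel features inside each segment and a running average to fuse repeated observations of the same object. It also assumes that each segment has already been associated with an object. `pool_segment_features`, `ObjectFeature`, and the `-1` "no segment" label are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def pool_segment_features(pixel_features: List[List[Vector]],
                          segment_ids: List[List[int]]) -> Dict[int, Vector]:
    """Mean-pool per-pixel features inside each segment, then L2-normalize.

    pixel_features[r][c] is the feature at pixel (r, c); segment_ids[r][c] is
    that pixel's segment label, with -1 meaning "no segment" (an assumption).
    """
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for feat_row, seg_row in zip(pixel_features, segment_ids):
        for feat, seg in zip(feat_row, seg_row):
            if seg < 0:
                continue  # pixel not covered by any segment
            acc = sums.setdefault(seg, [0.0] * len(feat))
            for i, x in enumerate(feat):
                acc[i] += x
            counts[seg] = counts.get(seg, 0) + 1
    return {seg: normalize([x / counts[seg] for x in acc])
            for seg, acc in sums.items()}


class ObjectFeature:
    """Running mean of the segment features observed for one object over time."""

    def __init__(self, dim: int) -> None:
        self.mean: Vector = [0.0] * dim
        self.count = 0

    def update(self, segment_feature: Vector) -> None:
        # Incremental mean: m_n = m_{n-1} + (x_n - m_{n-1}) / n
        self.count += 1
        self.mean = [m + (x - m) / self.count
                     for m, x in zip(self.mean, segment_feature)]


if __name__ == "__main__":
    # Toy 2x2 image with 2-D features: segment 0 covers the top row,
    # segment 1 the bottom-left pixel; the bottom-right pixel is unlabeled.
    feats = [[[1.0, 0.0], [1.0, 0.0]],
             [[0.0, 1.0], [0.0, 1.0]]]
    segs = [[0, 0],
            [1, -1]]
    pooled = pool_segment_features(feats, segs)
    print(pooled)  # {0: [1.0, 0.0], 1: [0.0, 1.0]}

    obj = ObjectFeature(dim=2)
    obj.update(pooled[0])   # first view of the object
    obj.update([0.0, 1.0])  # a later, different-looking view
    print(obj.mean)         # [0.5, 0.5]
```

The memory benefit comes from keeping one vector per object. The following numbers are hypothetical and not figures from the paper. A 512-dimensional float32 feature takes 512 × 4 = 2,048 bytes. Storing one in each of 1,000,000 voxels would need about 2.05 GB, while 500 objects need 1,024,000 bytes, about 1 MB.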