Reviews 3D reconstruction, including self-supervised, SLAM, and NeRF methods. Our approach uses open-set 2D instance segmentation and RGB-D back-projection for efficient instance-based 3D mapping.Reviews 3D reconstruction, including self-supervised, SLAM, and NeRF methods. Our approach uses open-set 2D instance segmentation and RGB-D back-projection for efficient instance-based 3D mapping.

Semantic Geometry Completion and SLAM Integration in 3D Mapping

2025/12/11 02:00

Abstract and 1 Introduction

  1. Related Works

    2.1. Vision-and-Language Navigation

    2.2. Semantic Scene Understanding and Instance Segmentation

    2.3. 3D Scene Reconstruction

  2. Methodology

    3.1. Data Collection

    3.2. Open-set Semantic Information from Images

    3.3. Creating the Open-set 3D Representation

    3.4. Language-Guided Navigation

  3. Experiments

    4.1. Quantitative Evaluation

    4.2. Qualitative Results

  4. Conclusion and Future Work, Disclosure statement, and References

2.3. 3D Scene Reconstruction

In recent times, 3D scene reconstruction has seen significant advancements. Some recent works in this field include using a self-supervised approach for Semantic Geometry completion and appearance reconstruction from RGB-D scans such as [26], which uses 3D encoder-decoder architecture for geometry and colour. For these approaches, the focus is on generating semantic reconstruction without ground truth. Another approach is to integrate real-time 3D reconstruction with SLAM. This is done through keyframe-based techniques and has been used in recent autonomous navigation and AR use cases[27]. Another recent method has seen work on Neural Radiance Fields[28] for indoor spaces when utilizing structure-from-motion to understand camera-captured scenes. These NeRF models are trained for each location and are particularly good for spatial understanding. Another method is to build 3D scene graphs using open vocabulary and foundational models like CLIP to capture semantic relationships between objects and their visual representations[4]. During reconstruction, they use the features extracted from the 3D point clouds and project them onto the embedding space learned by CLIP.

\ This work uses an open-set 2D instance segmentation method, as explained in the previous sections. Given an RGB-D image, we get these individual object masks from the RGB image and back-project them to 3D using the Depth image. Here, we have an instance-based approach instead of having a point-by-point computation to reconstruct, which was previously done by Concept-Fusion [29]. This per-object feature mask extraction also helps us compute embeddings, which preserve the open-set nature of this pipeline.

\

:::info Authors:

(1) Laksh Nanwani, International Institute of Information Technology, Hyderabad, India; this author contributed equally to this work;

(2) Kumaraditya Gupta, International Institute of Information Technology, Hyderabad, India;

(3) Aditya Mathur, International Institute of Information Technology, Hyderabad, India; this author contributed equally to this work;

(4) Swayam Agrawal, International Institute of Information Technology, Hyderabad, India;

(5) A.H. Abdul Hafez, Hasan Kalyoncu University, Sahinbey, Gaziantep, Turkey;

(6) K. Madhava Krishna, International Institute of Information Technology, Hyderabad, India.

:::


:::info This paper is available on arxiv under CC by-SA 4.0 Deed (Attribution-Sharealike 4.0 International) license.

:::

\

Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact service@support.mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.