Rethinking Open-Vocabulary Segmentation
of Radiance Fields in 3D Space

Hyunjee Lee*, Youngsik Yun*, Jeongmin Bae, Seoha Kim, Youngjung Uh


* equal contribution

Yonsei University

Previous works understand radiance fields by segmenting 2D masks on rendered images. Instead, we reformulate the task as segmenting 3D volumes. Our approach greatly improves both 3D and 2D understanding of radiance fields.

Abstract

Understanding the 3D semantics of a scene is a fundamental problem for various scenarios such as embodied agents. While NeRFs and 3DGS excel at novel-view synthesis, previous methods for understanding their semantics have been limited to an incomplete 3D understanding: their segmentation results are 2D masks and their supervision is anchored at 2D pixels. This paper revisits the problem setting to pursue a better 3D understanding of a scene modeled by NeRFs and 3DGS, as follows. 1) We directly supervise the 3D points to train the language embedding field. This achieves state-of-the-art accuracy without relying on multi-scale language embeddings. 2) We transfer the pre-trained language field to 3DGS, achieving the first real-time rendering speed without sacrificing training time or accuracy. 3) We introduce a 3D querying and evaluation protocol for assessing the reconstructed geometry and semantics together.


Method Overview

We propose 3D segmentation as a more practical problem setting: segmenting the 3D volume for a given text query. We then propose a point-wise semantic loss to supervise the sampled point embeddings. Furthermore, the learned language field can be transferred to 3DGS for faster rendering. Lastly, our 3D evaluation protocol measures 3D segmentation performance on both reconstructed geometry and semantics.
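The point-wise supervision above can be sketched as follows. This is a minimal illustration, assuming a cosine-distance objective between each sampled 3D point's predicted language embedding and its target language feature; the function name and array shapes are our own illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def pointwise_semantic_loss(point_embeds: np.ndarray,
                            target_embeds: np.ndarray) -> float:
    """Hypothetical sketch: average cosine distance between each sampled
    3D point's language embedding and its target language feature.

    point_embeds:  (N, D) embeddings predicted at sampled 3D points
    target_embeds: (N, D) target language features for those points
    """
    # Normalize both sets of embeddings to unit length
    p = point_embeds / np.linalg.norm(point_embeds, axis=-1, keepdims=True)
    t = target_embeds / np.linalg.norm(target_embeds, axis=-1, keepdims=True)
    # 1 - cosine similarity, averaged over all sampled points
    return float(np.mean(1.0 - np.sum(p * t, axis=-1)))
```

In contrast to supervising rendered 2D feature maps, a loss of this form anchors the gradient directly at 3D sample locations, which is the reformulation the method pursues.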


Quantitative Results

We compare quantitative results of 3D and 2D segmentation on the Replica, LERF, and 3D-OVS datasets. Red and orange highlights indicate the 1st- and 2nd-best models. We achieve state-of-the-art segmentation accuracy in both 3D and 2D.



3D Segmentation Results

Qualitative comparisons of 3D segmentation on the LERF and Replica datasets. We show an exported mesh of the 3D querying results for the given text query. Unlike competitors, our method produces clearer boundaries in its 3D segmentation results.
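The 3D querying behind these results can be sketched roughly as follows, under the assumption that it amounts to thresholding the cosine similarity between each point's language embedding and the text query's embedding; the threshold value, names, and shapes are illustrative, not taken from the paper.

```python
import numpy as np

def query_3d_points(point_embeds: np.ndarray, text_embed: np.ndarray,
                    threshold: float = 0.5) -> np.ndarray:
    """Hypothetical sketch: return a boolean mask over 3D points whose
    language embedding is similar enough to the text query embedding.

    point_embeds: (N, D) language embeddings of 3D points
    text_embed:   (D,)   embedding of the text query
    """
    # Unit-normalize so the dot product is cosine similarity
    p = point_embeds / np.linalg.norm(point_embeds, axis=-1, keepdims=True)
    q = text_embed / np.linalg.norm(text_embed)
    sim = p @ q                # cosine similarity per point
    return sim > threshold     # points selected for the queried object
```

The selected points (and their geometry) can then be exported as a mesh for visualization, as in the figures above.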




2D Segmentation Results

Qualitative comparisons of 2D segmentation on the LERF and 3D-OVS datasets. We show a heatmap of similarity to the given text query. For better visualization, we dim the background except for the target object. Our method achieves more accurate 2D segmentation results than competitors.


Real-time Rendering with Ours-3DGS

Our 3DGS, transferred from the pre-trained language field, achieves the first real-time rendering speed for segmentation,
which is 28x faster than the previous fastest method.

Citation

@misc{lee2024open3drf,
  title={Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space}, 
  author={Hyunjee Lee and Youngsik Yun and Jeongmin Bae and Seoha Kim and Youngjung Uh},
  year={2024},
  eprint={2408.07416},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2408.07416}, 
}