Learning Unified Decompositional and Compositional NeRF for
Editable Novel View Synthesis
ICCV 2023
- Yuxin Wang HKUST
- Wayne Wu Shanghai AI Lab
- Dan Xu HKUST
An overview of the proposed Unified Decompositional and Compositional NeRF (UDC-NeRF) framework for joint novel view synthesis and scene editing. The framework has two stages. In the first (coarse) stage, it learns a guidance radiance field used to guide point sampling. In the second (fine) stage, it learns scene decomposition via learnable object codes together with two novel decomposition schemes: (i) a 3D one-hot object radiance activation regularization and (ii) color inpainting to handle ambiguous generation in occluded background areas. Scene composition is achieved by applying the one-hot activation weights to the object-level radiance fields learned in the decomposition stage. The decomposition enables scene editing, and the composition enables novel view synthesis within a single unified framework. A minimal sketch of the composition step is given below.
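To make the composition step more concrete, here is a minimal PyTorch-style sketch of how per-object radiance fields could be blended with (near) one-hot activation weights. All names (compose_radiance, object_logits, etc.) are hypothetical illustrations, not identifiers from the released code.

```python
import torch
import torch.nn.functional as F

def compose_radiance(object_logits, object_densities, object_colors):
    """Compose K object-level radiance fields into a scene-level one (sketch).

    object_logits:    (N, K) per-point activation logits for K objects/background.
    object_densities: (N, K) per-object volume densities.
    object_colors:    (N, K, 3) per-object RGB predictions.
    """
    # Soft weights during training; the decomposition stage regularizes them
    # toward one-hot so each 3D point is explained by a single object field.
    weights = F.softmax(object_logits, dim=-1)                 # (N, K)
    sigma = (weights * object_densities).sum(dim=-1)           # (N,)
    rgb = (weights.unsqueeze(-1) * object_colors).sum(dim=1)   # (N, 3)
    return sigma, rgb
```

The composed density and color can then be rendered with standard NeRF volume rendering, while editing operates on the individual object-level fields before composition.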
Samples - Object Manipulation Videos
Comparisons with the state-of-the-art method ObjectNeRF.
Samples - Novel View Synthesis
Comparisons with state-of-the-art methods: ObjectNeRF and ObjectSDF.
Samples - Background/Object Decomposition
Comparisons with ObjectNeRF and ground truth on ToyDesk-scene2 and ScanNet-0113.
Samples - Editing
Editing results on both ToyDesk and ScanNet datasets.
Abstract
Implicit neural representations have shown a powerful capacity for modeling real-world 3D scenes, offering superior performance in novel view synthesis. In this paper, we target a more challenging scenario: joint novel view synthesis and editing of scenes based on implicit neural scene representations. State-of-the-art methods in this direction typically build separate networks for these two tasks (i.e., view synthesis and editing). The modeling of interactions and correlations between the two tasks is therefore very limited, which, however, is critical for learning high-quality scene representations. To tackle this problem, we propose a unified Neural Radiance Field (NeRF) framework that effectively performs joint scene decomposition and composition for modeling real-world scenes. The decomposition aims to learn disentangled 3D representations of different objects and the background, allowing for scene editing, while scene composition models an entire scene representation for novel view synthesis. Specifically, with a two-stage NeRF framework, we learn a coarse stage that predicts a global radiance field as guidance for point sampling, and in the second, fine-grained stage, we perform scene decomposition with a novel one-hot object radiance field regularization module and pseudo supervision via inpainting to handle ambiguous background regions occluded by objects. The decomposed object-level radiance fields are further composed using activations from the decomposition module. Extensive quantitative and qualitative results show the effectiveness of our method for scene decomposition and composition, outperforming state-of-the-art methods on both novel view synthesis and editing tasks.
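As a rough illustration of the one-hot regularization idea described above, the sketch below shows an entropy-style penalty that pushes per-point activation weights toward a one-hot distribution. The exact form of the regularizer in the paper may differ; this is only one plausible instantiation, and the function name is hypothetical.

```python
import torch

def one_hot_activation_loss(weights, eps=1e-6):
    """Entropy penalty on per-point activation weights of shape (N, K).

    Minimizing the entropy encourages each point's weights to collapse
    toward one-hot, so a single object radiance field explains the point.
    """
    entropy = -(weights * torch.log(weights + eps)).sum(dim=-1)  # (N,)
    return entropy.mean()
```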
Citation
@inproceedings{wang2023udcnerf,
  title     = {Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis},
  author    = {Wang, Yuxin and Wu, Wayne and Xu, Dan},
  booktitle = {ICCV},
  year      = {2023}
}
Acknowledgements
The website template was borrowed from Jon Barron's Mip-NeRF project page.