CG-NeRF: Conditional Generative Neural Radiance Fields for 3D-aware Image Synthesis

Bibliographic Details
Published in	Proceedings / IEEE Workshop on Applications of Computer Vision, pp. 724-733
Main Authors	Jo, Kyungmin; Shim, Gyumin; Jung, Sanghun; Yang, Soyoung; Choo, Jaegul
Format	Conference Proceeding
Language	English
Published	IEEE 01.01.2023
Subjects	3D computer vision; Algorithms: Computational photography; Codes; Computer architecture; Computer vision; image and video synthesis; Image quality; Image synthesis; Measurement; Shape
ISSN	2642-9381
DOI	10.1109/WACV56688.2023.00079

Summary:	Recent generative models based on neural radiance fields (NeRF) can generate diverse 3D-aware images. Despite this success, their applicability can be expanded further by incorporating various types of user-specified conditions, such as text and images. In this paper, we propose a novel approach called conditional generative neural radiance fields (CG-NeRF), which generates multi-view images that reflect multimodal input conditions such as images or text. However, generating 3D-aware images from multimodal conditions poses several challenges. First, each condition type carries a different amount of information; for example, the amount of information in text differs significantly from that in color images. Furthermore, pose consistency is often violated when diversifying the images generated from input conditions. To address these challenges, we propose 1) a unified architecture that effectively handles multiple types of conditions, and 2) a pose-consistent diversity loss for generating varied images while maintaining view consistency. Experimental results show that the proposed method maintains consistent image quality across various multimodal condition types and achieves superior fidelity and diversity compared to existing NeRF-based generative models.