Controlling BigGAN Image Generation with a Segmentation Network

Bibliographic Details
Published in: Discovery Science, Vol. 12986, pp. 268-281
Main Authors: Jaiswal, Aman; Sodhi, Harpreet Singh; Muzamil H, Mohamed; Chandhok, Rajveen Singh; Oore, Sageev; Sastry, Chandramouli Shama
Format: Book Chapter
Language: English
Published: Switzerland: Springer International Publishing AG, 2021
Series: Lecture Notes in Computer Science
ISBN: 9783030889418; 3030889416
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/978-3-030-88942-5_21

Summary: GANs have been used for a variety of unconditional and conditional generation tasks; while class-conditional generation can be directly integrated into the training process, integrating more sophisticated conditioning signals into training is not as straightforward. In this work, we consider the task of sampling from P(X) such that the silhouette of (the subject of) X matches the silhouette of (the subject of) a given image; that is, we not only specify what to generate, but also control where to put it. More generally, we allow a mask (itself another image) to control the silhouette of the object to be generated. The mask is the result of a segmentation system applied to a user-provided image. To achieve this, we use a pre-trained BigGAN and state-of-the-art segmentation models (e.g., DeepLabV3 and FCN) as follows: we first sample a random latent vector z from the Gaussian prior of BigGAN and then iteratively modify the latent vector until the silhouettes of X = G(z) and the reference image match. While BigGAN is a class-conditional generative model trained on the 1000 classes of ImageNet, the segmentation models are trained on the 20 classes of the PASCAL VOC dataset; we choose the "Dog" and "Cat" classes to demonstrate our controlled generation model.
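
The optimization loop summarized above can be sketched in a few lines of PyTorch. The following is a minimal, illustrative sketch, assuming the community pytorch-pretrained-biggan package and torchvision's DeepLabV3 (which predicts PASCAL VOC labels); the pixel-wise MSE silhouette loss, learning rate, step count, and the VOC "dog" class index are assumptions for illustration, not necessarily the chapter's exact formulation.

    # Sketch: iteratively modify BigGAN's latent z until the segmentation
    # silhouette of X = G(z) matches a reference mask. Hyperparameters and
    # the loss are illustrative, not the authors' exact choices.
    import torch
    import torch.nn.functional as F
    from torchvision.models.segmentation import deeplabv3_resnet50
    from pytorch_pretrained_biggan import (BigGAN, one_hot_from_names,
                                           truncated_noise_sample)

    device = "cuda" if torch.cuda.is_available() else "cpu"

    gan = BigGAN.from_pretrained("biggan-deep-256").to(device).eval()
    seg = deeplabv3_resnet50(pretrained=True).to(device).eval()

    DOG = 12  # PASCAL VOC class index for "dog" (assumption; verify your label map)

    def silhouette(img):
        """Soft per-pixel probability of the target VOC class; img is in [-1, 1]."""
        x = (img + 1) / 2  # map BigGAN output to [0, 1]
        mean = torch.tensor([0.485, 0.456, 0.406], device=x.device).view(1, 3, 1, 1)
        std = torch.tensor([0.229, 0.224, 0.225], device=x.device).view(1, 3, 1, 1)
        logits = seg((x - mean) / std)["out"]
        return logits.softmax(dim=1)[:, DOG]  # (B, H, W) soft mask

    # Reference silhouette from a user-provided image; a placeholder here.
    # In practice, run the same segmentation model on the user's image.
    ref_mask = torch.zeros(1, 256, 256, device=device)

    class_vec = torch.from_numpy(one_hot_from_names(["dog"], batch_size=1)).to(device)
    z = torch.from_numpy(truncated_noise_sample(truncation=0.4, batch_size=1)).to(device)
    z.requires_grad_(True)
    opt = torch.optim.Adam([z], lr=0.05)

    for step in range(200):
        opt.zero_grad()
        img = gan(z, class_vec, 0.4)                  # X = G(z), in [-1, 1]
        loss = F.mse_loss(silhouette(img), ref_mask)  # match the silhouettes
        loss.backward()
        opt.step()

Note that only z is updated; both the generator and the segmentation network stay frozen, so the segmentation loss simply steers the sample through BigGAN's latent space.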