Controlling BigGAN Image Generation with a Segmentation Network

Bibliographic Details
Published in: Discovery Science, Vol. 12986, pp. 268-281
Main Authors: Jaiswal, Aman; Sodhi, Harpreet Singh; Muzamil H, Mohamed; Chandhok, Rajveen Singh; Oore, Sageev; Sastry, Chandramouli Shama
Format: Book Chapter
Language: English
Published: Switzerland: Springer International Publishing AG, 2021
Series: Lecture Notes in Computer Science
ISBN: 9783030889418; 3030889416
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/978-3-030-88942-5_21

Summary: GANs have been used for a variety of unconditional and conditional generation tasks; while class-conditional generation can be directly integrated into the training process, integrating more sophisticated conditioning signals into training is not as straightforward. In this work, we consider the task of sampling from P(X) such that the silhouette of (the subject of) X matches the silhouette of (the subject of) a given image; that is, we not only specify what to generate, but also control where to put it. More generally, we allow a mask (itself another image) to control the silhouette of the object to be generated. The mask is the result of a segmentation system applied to a user-provided image. To achieve this, we use a pre-trained BigGAN and state-of-the-art segmentation models (e.g., DeepLabV3 and FCN) as follows: we first sample a random latent vector z from the Gaussian prior of BigGAN and then iteratively modify the latent vector until the silhouettes of X = G(z) and the reference image match. While BigGAN is a class-conditional generative model trained on the 1000 classes of ImageNet, the segmentation models are trained on the 20 classes of the PASCAL VOC dataset; we choose the "Dog" and "Cat" classes to demonstrate our controlled generation model.
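
The optimization loop summarized above can be sketched in a few lines of PyTorch. The following is a minimal, illustrative sketch, assuming the community pytorch-pretrained-biggan package and torchvision's DeepLabV3 (which predicts PASCAL VOC labels); the pixel-wise MSE silhouette loss, learning rate, step count, and the VOC "dog" class index are assumptions for illustration, not necessarily the chapter's exact formulation.

    # Sketch: iteratively modify BigGAN's latent z until the segmentation
    # silhouette of X = G(z) matches a reference mask. Hyperparameters and
    # the loss are illustrative, not the authors' exact choices.
    import torch
    import torch.nn.functional as F
    from torchvision.models.segmentation import deeplabv3_resnet50
    from pytorch_pretrained_biggan import (BigGAN, one_hot_from_names,
                                           truncated_noise_sample)

    device = "cuda" if torch.cuda.is_available() else "cpu"

    gan = BigGAN.from_pretrained("biggan-deep-256").to(device).eval()
    seg = deeplabv3_resnet50(pretrained=True).to(device).eval()

    DOG = 12  # PASCAL VOC class index for "dog" (assumption; verify your label map)

    def silhouette(img):
        """Soft per-pixel probability of the target VOC class; img is in [-1, 1]."""
        x = (img + 1) / 2  # map BigGAN output to [0, 1]
        mean = torch.tensor([0.485, 0.456, 0.406], device=x.device).view(1, 3, 1, 1)
        std = torch.tensor([0.229, 0.224, 0.225], device=x.device).view(1, 3, 1, 1)
        logits = seg((x - mean) / std)["out"]
        return logits.softmax(dim=1)[:, DOG]  # (B, H, W) soft mask

    # Reference silhouette from a user-provided image; a placeholder here.
    # In practice, run the same segmentation model on the user's image.
    ref_mask = torch.zeros(1, 256, 256, device=device)

    class_vec = torch.from_numpy(one_hot_from_names(["dog"], batch_size=1)).to(device)
    z = torch.from_numpy(truncated_noise_sample(truncation=0.4, batch_size=1)).to(device)
    z.requires_grad_(True)
    opt = torch.optim.Adam([z], lr=0.05)

    for step in range(200):
        opt.zero_grad()
        img = gan(z, class_vec, 0.4)                  # X = G(z), in [-1, 1]
        loss = F.mse_loss(silhouette(img), ref_mask)  # match the silhouettes
        loss.backward()
        opt.step()

Note that only z is updated; both the generator and the segmentation network stay frozen, so the segmentation loss simply steers the sample through BigGAN's latent space.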