DreamBlend: Advancing Personalized Fine-Tuning of Text-to-Image Diffusion Models


Bibliographic Details
Published in Proceedings / IEEE Workshop on Applications of Computer Vision, pp. 3614 - 3623
Main Authors Ram, Shwetha, Neiman, Tal, Feng, Qianli, Stuart, Andrew, Tran, Son, Chilimbi, Trishul
Format Conference Proceeding
Language English
Published IEEE 26.02.2025
ISSN 2642-9381
DOI 10.1109/WACV61041.2025.00356


Summary: Given a small number of images of a subject, personalized image generation techniques can fine-tune large pretrained text-to-image diffusion models to generate images of the subject in novel contexts, conditioned on text prompts. In doing so, a tradeoff is made between prompt fidelity, subject fidelity and diversity. As the pretrained model is fine-tuned, earlier checkpoints synthesize images with low subject fidelity but high prompt fidelity and diversity. In contrast, later checkpoints generate images with low prompt fidelity and diversity but high subject fidelity. This inherent tradeoff limits the prompt fidelity, subject fidelity and diversity of generated images. In this work, we propose DreamBlend to combine the prompt fidelity from earlier checkpoints and the subject fidelity from later checkpoints during inference. We perform a cross-attention-guided image synthesis from a later checkpoint, guided by an image
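The core idea sketched in the abstract — combining guidance from an early checkpoint (high prompt fidelity) with synthesis from a late checkpoint (high subject fidelity) during inference — can be illustrated with a toy sampling loop. This is a hypothetical sketch, not the authors' implementation: the paper describes cross-attention-guided synthesis, which we approximate here with a simple per-step blend of noise predictions from two stand-in denoisers (`denoise_early` and `denoise_late` are placeholder functions, not real model calls).

```python
# Hypothetical illustration of inference-time checkpoint blending.
# The real DreamBlend method uses cross-attention maps from an earlier
# checkpoint to guide generation from a later one; here we approximate
# that with a weighted blend of the two checkpoints' predictions.

def denoise_early(x, t):
    # Stand-in for the early checkpoint's noise prediction
    # (high prompt fidelity, low subject fidelity).
    return 0.9 * x

def denoise_late(x, t):
    # Stand-in for the late checkpoint's noise prediction
    # (high subject fidelity, low prompt fidelity).
    return 0.8 * x

def blended_sampling(x, steps=10, guidance=0.3):
    """Toy sampling loop: nudge the late checkpoint's denoising
    trajectory toward the early checkpoint's prediction with
    weight `guidance`."""
    for t in range(steps, 0, -1):
        eps_late = denoise_late(x, t)
        eps_early = denoise_early(x, t)
        # Convex combination of the two predictions.
        eps = (1 - guidance) * eps_late + guidance * eps_early
        x = x - 0.1 * eps  # toy update step
    return x

print(blended_sampling(1.0))
```

With `guidance=0` this reduces to sampling purely from the late checkpoint; raising `guidance` trades subject fidelity for prompt fidelity, which is the tradeoff the abstract describes.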