Enhancing Control in Stable Diffusion Through Example-based Fine-Tuning and Prompt Engineering

Bibliographic Details
Published in 2024 5th International Conference on Image Processing and Capsule Networks (ICIPCN), pp. 887-894
Main Authors Mallikharjuna Rao, K; Patel, Tanu
Format Conference Proceeding
Language English
Published IEEE 03.07.2024
DOI 10.1109/ICIPCN63822.2024.00153

Abstract Recent advancements in text-to-image generation allow the creation of diverse images from textual descriptions. However, personalizing these models for specific subjects remains challenging. Existing techniques such as DreamBooth partly address this, but they lack fine-grained control over the generated image. This work proposes a novel approach that combines DreamBooth fine-tuning with prompt engineering for controllable, subject-specific image generation with Stable Diffusion. We leverage DreamBooth to embed a unique identifier for a subject. By pairing carefully crafted text prompts with the DreamBooth-tuned model, users can guide the image generation process toward specific details such as pose, environment, and lighting. This allows for highly customized image generation featuring the subject in diverse contexts, even if those elements were not present in the reference images used for DreamBooth training. We leverage an autogenous class-specific prior preservation loss function to ensure the subject's key characteristics are retained throughout the generation process. We demonstrate the effectiveness of our method on various tasks, including subject recontextualization, text-guided view synthesis, and artistic rendering. CLIP Score is used as the evaluation metric. This work also establishes a new benchmark dataset specifically designed for subject-driven image generation using Stable Diffusion and prompt engineering, facilitating further research in this area.
Author Mallikharjuna Rao, K
Patel, Tanu
Author_xml – sequence: 1
  givenname: K
  surname: Mallikharjuna Rao
  fullname: Mallikharjuna Rao, K
  email: rao.mkrao@gmail.com
  organization: International Institute of Information Technology, LMCSI Data Science and Artificial Intelligence, Naya Raipur
– sequence: 2
  givenname: Tanu
  surname: Patel
  fullname: Patel, Tanu
  email: tanu21102@iiitnr.edu.in
  organization: International Institute of Information Technology, LMCSI Data Science and Artificial Intelligence, Naya Raipur
CODEN IEEPAD
ContentType Conference Proceeding
EISBN 9798350367171
EndPage 894
ExternalDocumentID 10660873
Genre orig-research
IsPeerReviewed false
IsScholarly false
Language English
PageCount 8
PublicationCentury 2000
PublicationDate 2024-July-3
PublicationDateYYYYMMDD 2024-07-03
PublicationDate_xml – month: 07
  year: 2024
  text: 2024-July-3
  day: 03
PublicationDecade 2020
PublicationTitle 2024 5th International Conference on Image Processing and Capsule Networks (ICIPCN)
PublicationTitleAbbrev ICIPCN
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 887
SubjectTerms Artistic Rendering
Conditional Image Generation
DreamBooth
Image Customization
Image quality
Image synthesis
Measurement
Personalized Image Generation
Prompt Engineering
Roads
Scalability
Stable Diffusion
Subject Recontextualization
Subject-Specific Image Generation
Text to image
Text-Guided View Synthesis
Text-to-image generation
Training
Title Enhancing Control in Stable Diffusion Through Example-based Fine-Tuning and Prompt Engineering
URI https://ieeexplore.ieee.org/document/10660873