Enhancing Control in Stable Diffusion Through Example-based Fine-Tuning and Prompt Engineering

Bibliographic Details
Published in	2024 5th International Conference on Image Processing and Capsule Networks (ICIPCN), pp. 887-894
Main Authors	Mallikharjuna Rao, K; Patel, Tanu
Format	Conference Proceeding
Language	English
Published	IEEE, 3 July 2024
Subjects	Artistic Rendering; Conditional Image Generation; DreamBooth; Image Customization; Image quality; Image synthesis; Measurement; Personalized Image Generation; Prompt Engineering; Roads; Scalability; Stable Diffusion; Subject Recontextualization; Subject-Specific Image Generation; Text to image; Text-Guided View Synthesis; Text-to-image generation; Training
DOI	10.1109/ICIPCN63822.2024.00153

Abstract	Recent advances in text-to-image generation allow diverse images to be created from textual descriptions. However, personalizing these models for specific subjects remains challenging. Existing techniques such as DreamBooth partially address this, but they lack fine-grained control over the generated image. This work proposes a novel approach that combines DreamBooth fine-tuning with prompt engineering for controllable, subject-specific image generation with Stable Diffusion. We use DreamBooth to bind a unique identifier to a subject. By adding carefully crafted text prompts, users can steer generation toward specific details such as pose, environment, and lighting. This enables highly customized images that show the subject in diverse contexts, even when those elements were not present in the reference images used for DreamBooth training. An autogenous class-specific prior preservation loss ensures that the subject's key characteristics are retained throughout the generation process. We demonstrate the method on several tasks, including subject recontextualization, text-guided view synthesis, and artistic rendering, using CLIP Score as the evaluation metric. This work also establishes a new benchmark dataset specifically designed for subject-driven image generation with Stable Diffusion and prompt engineering, to facilitate further research in this area.
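
The abstract names the autogenous class-specific prior preservation loss without spelling it out. A minimal sketch follows, assuming the standard DreamBooth objective in the noise-prediction form commonly used for Stable Diffusion training; it is not necessarily the authors' exact implementation. The loss is the mean squared error on the subject ("instance") images plus a weighted mean squared error on class images that the frozen model generated itself. The function name `prior_preservation_loss`, the `prior_loss_weight` parameter, the example prompts, and the toy array shapes are illustrative assumptions.

```python
import numpy as np


def prior_preservation_loss(
    pred_instance: np.ndarray,
    target_instance: np.ndarray,
    pred_prior: np.ndarray,
    target_prior: np.ndarray,
    prior_loss_weight: float = 1.0,
) -> float:
    """Hypothetical DreamBooth-style objective: instance MSE + weighted prior MSE.

    pred_* are the denoiser's noise predictions; target_* are the noises that
    were actually added to the latents of each batch.
    """
    if pred_instance.shape != target_instance.shape:
        raise ValueError("instance prediction and target shapes differ")
    if pred_prior.shape != target_prior.shape:
        raise ValueError("prior prediction and target shapes differ")
    # Fits the subject: reference images captioned with the unique identifier
    # (e.g. a hypothetical prompt "a photo of sks dog").
    instance_loss = np.mean((pred_instance - target_instance) ** 2)
    # Preserves the class prior: images the frozen model generated itself
    # ("autogenous") from the plain class prompt (e.g. "a photo of a dog").
    prior_loss = np.mean((pred_prior - target_prior) ** 2)
    return float(instance_loss + prior_loss_weight * prior_loss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (2, 4, 8, 8)  # toy (batch, channels, height, width) latent shape
    noise_instance = rng.standard_normal(shape)
    noise_prior = rng.standard_normal(shape)
    # Instance predictions are off by 0.1 everywhere and prior predictions are
    # exact, so the printed loss is approximately 0.01.
    print(prior_preservation_loss(noise_instance + 0.1, noise_instance,
                                  noise_prior, noise_prior))
```

The two terms are kept as separate arguments for readability. Open-source DreamBooth training scripts often concatenate instance and class images into one batch and split the prediction afterwards; the resulting loss is the same.
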
Author_xml	1. Mallikharjuna Rao, K (rao.mkrao@gmail.com), International Institute of Information Technology, LMCSI Data Science and Artificial Intelligence, Naya Raipur; 2. Patel, Tanu (tanu21102@iiitnr.edu.in), International Institute of Information Technology, LMCSI Data Science and Artificial Intelligence, Naya Raipur
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present
EISBN	9798350367171
EndPage	894
ExternalDocumentID	10660873
Genre	orig-research
IngestDate	Wed Sep 18 05:50:16 EDT 2024
IsPeerReviewed	false
IsScholarly	false
PageCount	8
ParticipantIDs	ieee_primary_10660873
PublicationCentury	2000
PublicationDate	2024-July-3
PublicationDecade	2020
PublicationTitle	2024 5th International Conference on Image Processing and Capsule Networks (ICIPCN)
PublicationTitleAbbrev	ICIPCN
PublicationYear	2024
Publisher	IEEE
SourceID	ieee
SourceType	Publisher
StartPage	887
URI	https://ieeexplore.ieee.org/document/10660873
hasFullText	1
inHoldings	1