Enhancing Control in Stable Diffusion Through Example-based Fine-Tuning and Prompt Engineering
Published in | 2024 5th International Conference on Image Processing and Capsule Networks (ICIPCN) pp. 887 - 894 |
---|---|
Main Authors | Mallikharjuna Rao, K; Patel, Tanu |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 03.07.2024 |
Subjects | |
DOI | 10.1109/ICIPCN63822.2024.00153 |
Abstract | Recent advances in text-to-image generation allow diverse images to be created from textual descriptions. However, personalizing these models for specific subjects remains challenging. Existing techniques such as DreamBooth partly address this, but they lack fine-grained control over the generated image. This work proposes a novel approach that combines DreamBooth fine-tuning with prompt engineering for controllable, subject-specific image generation with Stable Diffusion. We use DreamBooth to bind a unique identifier to a subject. By pairing carefully crafted text prompts with the DreamBooth-tuned model, users can guide image generation toward specific details such as pose, environment, and lighting. This allows highly customized images featuring the subject in diverse contexts, even when those elements were not present in the reference images used for DreamBooth training. We use an autogenous class-specific prior preservation loss to ensure that the subject's key characteristics are retained throughout generation. We demonstrate the effectiveness of our method on several tasks, including subject recontextualization, text-guided view synthesis, and artistic rendering. CLIP Score is used as the evaluation metric. This work also establishes a new benchmark dataset designed specifically for subject-driven image generation using Stable Diffusion and prompt engineering, facilitating further research in this area. |
---|---|
Author | Mallikharjuna Rao, K; Patel, Tanu |
Author_xml | – sequence: 1 givenname: K surname: Mallikharjuna Rao fullname: Mallikharjuna Rao, K email: rao.mkrao@gmail.com organization: International Institute of Information Technology, LMCSI Data Science and Artificial Intelligence, Naya Raipur – sequence: 2 givenname: Tanu surname: Patel fullname: Patel, Tanu email: tanu21102@iiitnr.edu.in organization: International Institute of Information Technology, LMCSI Data Science and Artificial Intelligence, Naya Raipur |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
EISBN | 9798350367171 |
EndPage | 894 |
ExternalDocumentID | 10660873 |
Genre | orig-research |
GroupedDBID | 6IE 6IL CBEJK RIE RIL |
IEDL.DBID | RIE |
IngestDate | Wed Sep 18 05:50:16 EDT 2024 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
PageCount | 8 |
ParticipantIDs | ieee_primary_10660873 |
PublicationCentury | 2000 |
PublicationDate | 2024-July-3 |
PublicationDateYYYYMMDD | 2024-07-03 |
PublicationDate_xml | – month: 07 year: 2024 text: 2024-July-3 day: 03 |
PublicationDecade | 2020 |
PublicationTitle | 2024 5th International Conference on Image Processing and Capsule Networks (ICIPCN) |
PublicationTitleAbbrev | ICIPCN |
PublicationYear | 2024 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
Score | 1.8799759 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 887 |
SubjectTerms | Artistic Rendering; Conditional Image Generation; DreamBooth; Image Customization; Image quality; Image synthesis; Measurement; Personalized Image Generation; Prompt Engineering; Roads; Scalability; Stable Diffusion; Subject Recontextualization; Subject-Specific Image Generation; Text to image; Text-Guided View Synthesis; Text-to-image generation; Training |
Title | Enhancing Control in Stable Diffusion Through Example-based Fine-Tuning and Prompt Engineering |
URI | https://ieeexplore.ieee.org/document/10660873 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | IEEE |
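
The abstract names three ingredients: a unique subject identifier combined with control phrases in the prompt, a class-specific prior preservation loss, and CLIP Score as the metric. The sketch below illustrates each in plain Python under stated assumptions. It is not the authors' implementation, which this record does not include. The function names, the `sks` identifier, the prompt template, and the `prior_weight` default are all hypothetical. The loss follows the general form published for DreamBooth (a subject reconstruction term plus a weighted class-prior term), and the score follows the common convention of scaling the cosine similarity by 100 and clipping at zero. The paper may use different variants of both.

```python
import math


def build_prompt(identifier, class_noun, pose=None, environment=None, lighting=None):
    """Compose a subject prompt from an identifier, a class noun, and optional controls.

    The template "a photo of <identifier> <class_noun>" is an assumption.
    """
    parts = [f"a photo of {identifier} {class_noun}"]
    # Append only the control phrases that were actually supplied.
    parts += [phrase for phrase in (pose, environment, lighting) if phrase]
    return ", ".join(parts)


def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(subject_pred, subject_target,
                            prior_pred, prior_target, prior_weight=1.0):
    """Subject reconstruction loss plus a weighted class-prior loss.

    The prior term is computed on samples the base model generated for the
    generic class prompt, which discourages drift away from the class.
    """
    return mse(subject_pred, subject_target) + prior_weight * mse(prior_pred, prior_target)


def clip_style_score(image_embedding, text_embedding):
    """Cosine similarity of two embeddings, scaled by 100 and clipped at 0.

    Real CLIP Score needs embeddings from a CLIP model; plain lists stand in here.
    """
    dot = sum(a * b for a, b in zip(image_embedding, text_embedding))
    norm = (math.sqrt(sum(a * a for a in image_embedding))
            * math.sqrt(sum(b * b for b in text_embedding)))
    return max(100.0 * dot / norm, 0.0)


if __name__ == "__main__":
    print(build_prompt("sks", "dog", pose="sitting",
                       environment="on a beach", lighting="golden hour light"))
    print(prior_preservation_loss([1.0, 2.0], [1.0, 2.0],
                                  [0.0, 0.0], [1.0, 1.0], prior_weight=0.5))
    print(clip_style_score([1.0, 0.0], [1.0, 0.0]))
```

In an actual pipeline the lists passed to `prior_preservation_loss` would be noise or image predictions from the diffusion model, and the embeddings passed to `clip_style_score` would come from a CLIP image encoder and text encoder.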