A Method for Semantic Expansion and Generation of Stable Diffusion User Prompts Using a Thesaurus and TTA


Bibliographic Details
Published in	정보과학회 컴퓨팅의 실제 논문지 (KIISE Transactions on Computing Practices) Vol. 30, no. 10, pp. 513-518
Main Authors	이정(Jung Lee), 최영(Young Choi), 송진하(Jinha Song), 낭종호(Jongho Nang)
Format	Journal Article
Language	Korean
Published	Korean Institute of Information Scientists and Engineers (한국정보과학회), 01.10.2024
Subjects	computer science; prompt engineering; test time augmentation; deep learning; multimodal; generative model; Stable Diffusion
ISSN	2383-6318 2383-6326
DOI	10.5626/KTCP.2024.30.10.513

Summary:	In text-to-image generation models, the user prompt is a key factor in determining the quality of the resulting image. However, current research on image generation models focuses mostly on image generation itself, leaving users struggling to write prompts with the appropriate vocabulary to obtain the results they want. This paper proposes a new methodology that improves the usability of generative models by narrowing the gap between the model's final output and the user's intention. A thesaurus-based Test Time Augmentation (TTA) technique expands the user's prompt into a variety of semantically related augmented prompts, after which user feedback is incorporated. Qualitative evaluation confirmed that the method generates diverse images that differ from those produced by the user's original prompt, and quantitative evaluation using BERT Score confirmed that the augmented prompts are semantically related to the user's prompt.
KCI Citation Count: 0
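
The prompt expansion step described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which thesaurus is used or how candidate prompts are selected. The dictionary `TOY_THESAURUS`, the function `expand_prompt`, and its `limit` parameter are hypothetical names introduced here only for illustration.

```python
from itertools import product

# Hypothetical toy thesaurus (an assumption for this sketch); a real system
# would query an actual lexical resource instead of a hard-coded dict.
TOY_THESAURUS = {
    "big": ["large", "huge"],
    "house": ["cottage", "mansion"],
}


def expand_prompt(prompt, thesaurus, limit=10):
    """Return up to `limit` augmented prompts made by swapping words for synonyms."""
    # For each word, the candidates are the word itself plus its listed synonyms.
    options = [[word] + thesaurus.get(word.lower(), []) for word in prompt.split()]
    augmented = []
    for combo in product(*options):
        candidate = " ".join(combo)
        if candidate == prompt:
            continue  # skip the unmodified user prompt
        augmented.append(candidate)
        if len(augmented) >= limit:
            break
    return augmented


variants = expand_prompt("a big house by the sea", TOY_THESAURUS)
for variant in variants:
    print(variant)
```

In a pipeline like the one the summary outlines, each augmented prompt would then be passed to the image generator and the user would pick among the results. That step, and the BERT Score check of semantic relatedness, are omitted here because they depend on external models.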