A Generation Algorithm for "Text to Image" Based on Multi-Channel Attention

Bibliographic Details
Published in	IEEE Access, Vol. 13, pp. 144878-144886
Main Authors	Yang, Yang; Wahab, Ainuddin Wahid Bin Abdul; Binti Idris, Norisma; Yu, Dingguo; Liu, Chang
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025
Subjects	AI-generated images; Algorithms; Attention mechanisms; Computational modeling; Convolution; Feature extraction; Generators; High resolution image feature fusion; Image processing; Image quality; Image resolution; Image synthesis; Learning long-range semantic dependencies; Mathematical models; Modules; Semantics; Text to image
ISSN	2169-3536
DOI	10.1109/ACCESS.2025.3596894

More Information
Summary:	Text-to-image generation has attracted significant research attention. However, existing methods rely primarily on upsampling convolution operations for feature extraction during the initial image-generation stage. This approach has inherent limitations: it often loses global information and cannot capture long-range semantic dependencies. To address these issues, this study proposes a generation algorithm for "text to image" based on multi-channel attention (TTI-MCA). The method integrates a self-supervised module into the initial image-generation phase, using attention mechanisms to let the network autonomously learn mappings between image features. This enables a deep integration of contextual understanding and self-attention learning. In addition, a feature fusion enhancement module combines low-resolution features from the previous stage with high-resolution features from the current stage, allowing the generation network to exploit both the rich semantic information of the low-level features and the high-resolution details of the high-level features to produce high-quality, realistic images. Experimental results on the CUB and COCO datasets show that TTI-MCA outperforms the baseline algorithm on both Inception Score (IS, higher is better) and Fréchet Inception Distance (FID, lower is better). This research provides a novel approach to generating high-quality images from text.
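The summary says the initial generation stage uses attention so that every spatial position can draw on every other, rather than only on a local convolution window. The abstract gives no equations, so the following is only a minimal sketch of one common way to do this, assuming a SAGAN-style query/key/value self-attention over a C x H x W feature map with a learnable residual scale `gamma`. The function names and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical stand-ins for learned weights, not the authors' module.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(feat, w_q, w_k, w_v, gamma=0.0):
    """Assumed SAGAN-style self-attention over a (C, H, W) feature map.

    w_q, w_k: hypothetical query/key projections, shape (C_qk, C)
    w_v:      hypothetical value projection, shape (C, C)
    gamma:    residual scale; with gamma = 0 the input passes through unchanged
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)        # (C, N): one column per spatial position
    q = w_q @ x                       # (C_qk, N)
    k = w_k @ x                       # (C_qk, N)
    v = w_v @ x                       # (C, N)
    # attn[i, j] = weight position i gives to position j; each row sums to 1,
    # so every output position can depend on the whole image (long-range).
    attn = softmax(q.T @ k, axis=-1)  # (N, N)
    out = v @ attn.T                  # out[:, i] = sum_j attn[i, j] * v[:, j]
    return (gamma * out + x).reshape(c, h, w), attn

# Usage with random stand-in weights.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w_q = rng.standard_normal((2, 8))
w_k = rng.standard_normal((2, 8))
w_v = rng.standard_normal((8, 8))
y, attn = spatial_self_attention(feat, w_q, w_k, w_v, gamma=0.5)
print(y.shape, attn.shape)
```

The N x N attention matrix is what distinguishes this from an upsampling convolution: its cost grows quadratically with the number of positions, which is one reason such modules are typically placed at the low-resolution initial stage.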
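The feature fusion enhancement module is described only as combining low-resolution features from the previous stage with high-resolution features from the current stage. One plausible reading, sketched below purely as an assumption, is: upsample the previous-stage map to the current resolution (nearest neighbour here), concatenate along the channel axis, and mix channels with a 1x1 convolution followed by ReLU. `upsample_nearest`, `fuse_features`, and `w_mix` are hypothetical names; the paper may use a different upsampling or fusion operator.

```python
import numpy as np

def upsample_nearest(x, factor=2):
    # Repeat each pixel `factor` times along H and W: (C, H, W) -> (C, H*f, W*f).
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_features(low_res, high_res, w_mix):
    """Assumed fusion of previous-stage and current-stage feature maps.

    low_res:  (C1, H, W) features from the previous (low-resolution) stage
    high_res: (C2, f*H, f*W) features from the current stage
    w_mix:    (C_out, C1 + C2) hypothetical 1x1-convolution weights
    """
    factor = high_res.shape[1] // low_res.shape[1]
    up = upsample_nearest(low_res, factor)
    stacked = np.concatenate([up, high_res], axis=0)  # (C1 + C2, f*H, f*W)
    # A 1x1 convolution is a per-pixel matrix multiply over the channel axis.
    fused = np.einsum("oc,chw->ohw", w_mix, stacked)
    return np.maximum(fused, 0.0)  # ReLU

# Usage: fuse an 8x8 previous-stage map into a 16x16 current-stage map.
rng = np.random.default_rng(1)
fused = fuse_features(rng.standard_normal((4, 8, 8)),
                      rng.standard_normal((3, 16, 16)),
                      rng.standard_normal((5, 7)))
print(fused.shape)
```

Concatenation followed by a learned channel mix lets the network decide, per output channel, how much to rely on the coarse previous-stage features versus the current-stage detail, rather than fixing the ratio as a simple sum would.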
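The results are reported as Inception Score and Fréchet Inception Distance. For reference, the standard definitions are IS = exp(E_x[KL(p(y|x) || p(y))]) over classifier probabilities, and FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2(S1 S2)^(1/2)) between Gaussians fitted to feature activations of real and generated images. The sketch below computes both from arrays that are assumed to be already extracted (in practice, Inception-v3 softmax outputs and pool features); it does not reproduce the paper's evaluation pipeline, and the function names are illustrative.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """probs: (N, K) class probabilities p(y|x), one row per generated image."""
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

def _psd_sqrt(s):
    # Square root of a symmetric positive semi-definite matrix via eigh.
    vals, vecs = np.linalg.eigh(s)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def frechet_distance(mu1, s1, mu2, s2):
    """FID between N(mu1, s1) and N(mu2, s2) fitted to feature activations."""
    a = _psd_sqrt(s1)
    # s1 @ s2 and a @ s2 @ a share eigenvalues, and the latter is symmetric PSD,
    # so Tr((s1 s2)^(1/2)) is the sum of the square roots of its eigenvalues.
    lam = np.linalg.eigvalsh(a @ s2 @ a)
    tr_sqrt = np.sqrt(np.clip(lam, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1) + np.trace(s2) - 2.0 * tr_sqrt)

# Confident, evenly spread predictions over 4 classes give IS = 4 (up to rounding).
print(inception_score(np.eye(4)))
# 1-D Gaussians N(0, 1) and N(3, 4): 9 + 1 + 4 - 2 * sqrt(4) = 10.
print(frechet_distance(np.array([0.0]), np.array([[1.0]]),
                       np.array([3.0]), np.array([[4.0]])))
```

These toy cases show the direction of each metric: IS grows as predictions become confident and diverse, while FID shrinks to 0 as the two feature distributions coincide, which is why higher IS and lower FID both indicate better generated images.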