CLARTEMIS: CLIP-Based Attention-Based Retrieval with Text-Explicit Matching and Implicit Similarity

Bibliographic Details
Published in: International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP) (Online), pp. 01-05
Main Authors: Tran, Hoang-Anh; Le, Hoanh-Su; Nguyen, Phuc
Format: Conference Proceeding
Language: English
Published: IEEE, 14.12.2024
ISBN: 9798331519230
ISSN: 2576-8964
DOI: 10.1109/ICCWAMTIP64812.2024.10873653


Summary: This study tackles the challenge of image retrieval using text feedback, a crucial task in e-commerce. To address this, we introduce an improved framework called CLARTEMIS, built upon the foundation of ARTEMIS [4]. Our proposed approach incorporates a pre-trained vision-language model to align image and text features within a unified semantic space. This integration not only streamlines multimodal representation learning but also enhances the consistency of the feature space for correlation modeling. The CLARTEMIS framework utilizes a robust joint embedding strategy, effectively aligning reference images, modification texts, and target images. Comprehensive experiments on a benchmark dataset validate the effectiveness of our method, demonstrating notable improvements in retrieval accuracy and underscoring the potential of pre-trained models in advancing multimodal retrieval systems.
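The retrieval setting described above (score candidate target images against a reference image plus a modification text embedded in one shared space) can be illustrated with a minimal sketch. This is not the paper's scoring function: CLARTEMIS/ARTEMIS use attention-based explicit-matching and implicit-similarity terms, whereas this sketch assumes pre-computed CLIP-style embeddings and a simple additive fusion baseline; all function names and the toy vectors are hypothetical.

```python
import numpy as np

def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Project vectors onto the unit sphere so dot product = cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def rank_targets(ref_img_emb: np.ndarray,
                 mod_txt_emb: np.ndarray,
                 target_embs: np.ndarray):
    """Rank candidate targets for (reference image, modification text).

    Assumes all embeddings already live in one joint image-text space
    (e.g. produced by a pre-trained vision-language encoder). Fusion by
    addition is a common baseline, not the paper's attention mechanism.
    """
    query = l2_normalize(ref_img_emb + mod_txt_emb)      # composed query
    targets = l2_normalize(target_embs)                  # (n, d) candidates
    scores = targets @ query                             # cosine similarities
    order = np.argsort(-scores)                          # best match first
    return order, scores

# Toy 2-D example: the modification text "rotates" the query toward target 0.
ref = np.array([1.0, 0.0])
mod = np.array([0.0, 1.0])
candidates = np.array([[1.0, 1.0],    # matches image + text jointly
                       [1.0, 0.0],    # matches the reference image only
                       [0.0, -1.0]])  # contradicts the modification
order, scores = rank_targets(ref, mod, candidates)
print(order[0])  # → 0
```

In a real pipeline the three embedding sets would come from the frozen or fine-tuned vision-language encoders, and ranking would run over the whole gallery rather than three toy vectors.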