ETRI Knowledge Sharing Platform

Bridging the Lexical Gap: Generative Text-to-Image Retrieval for Parts-of-Speech Imbalance in Vision-Language Models
Authors
Hyesu Hwang, Daeun Kim, Jaehui Park, Yongjin Kwon
Issue Date
2024-10
Citation
International Conference on Multimedia (MM) 2024, pp.26-34
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1145/3689091.3690089
Abstract
Retrieving relevant images from text is challenging because aligning vision and language representations is non-trivial. Large-scale vision-language models such as CLIP are widely used in recent studies to leverage pre-trained knowledge of this alignment. However, our observations reveal a 60.8% performance decrease for verb, adjective, and adverb queries compared with noun queries. In preliminary studies, we found that popular vision-language models align images and text insufficiently for certain parts of speech, and that nouns strongly influence their text-to-image retrieval results. Based on these observations, this paper proposes a method that rewrites queries into noun-based queries. First, a large language model extracts nouns relevant to the initial query and generates a hypothetical query that best matches the parts-of-speech alignment in the vision-language model. We then verify whether the hypothetical query preserves the original intent and rewrite it iteratively. Our experiments show that the method significantly enhances text-to-image retrieval performance and sheds light on how vision-language models handle lexical knowledge.
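The pipeline the abstract describes (extract relevant nouns with an LLM, generate a noun-based hypothetical query, verify that it preserves the original intent, then retrieve with a vision-language model) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `llm.extract_nouns`, `llm.hypothetical_query`, and `llm.preserves_intent` helpers are hypothetical placeholders for prompted LLM calls, and the image scoring uses an off-the-shelf CLIP checkpoint from Hugging Face transformers.

```python
# Minimal sketch (not the authors' code) of the noun-based query rewriting loop.
# Assumptions: `llm` exposes three prompted helpers with hypothetical names
# (extract_nouns, hypothetical_query, preserves_intent); retrieval scoring uses
# the public openai/clip-vit-base-patch32 checkpoint.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def clip_scores(query, images):
    """Similarity logits between one text query and a list of PIL images."""
    inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    return out.logits_per_text.squeeze(0)  # shape: (num_images,)


def rewrite_and_retrieve(query, images, llm, max_rounds=3):
    """Rewrite `query` into a noun-based form, check intent, then rank images."""
    rewritten = query
    for _ in range(max_rounds):
        nouns = llm.extract_nouns(rewritten)                   # nouns relevant to the query
        candidate = llm.hypothetical_query(rewritten, nouns)   # noun-based rewrite
        if llm.preserves_intent(query, candidate):             # accept only intent-preserving rewrites
            rewritten = candidate
            break
        # otherwise try another rewrite in the next round
    scores = clip_scores(rewritten, images)
    return scores.argsort(descending=True)  # image indices ranked by similarity
```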
KSP Keywords
Hypothetical query, Image retrieval, Language Model, Part of Speech (POS), Retrieval performance, large-scale
This work is distributed under the terms of the Creative Commons License (CC BY).