ETRI Knowledge Sharing Platform

LLM-powered scene graph representation learning for image retrieval via visual triplet-based graph transformation
Authors
Soohwan Jeong, Jongmin Park, Mingyu Choi, Yongjin Kwon, Sungsu Lim
Issue Date
2025-08
Citation
Expert Systems with Applications, v.286, pp.1-13
ISSN
0957-4174
Publisher
Elsevier Ltd.
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1016/j.eswa.2025.127926
Abstract
A scene graph represents the relational information between objects within an image, conveying its inherent semantic content. Current image retrieval methods, which use images as queries to find similar ones, typically rely on visual content or basic structural similarities in scene graphs. However, these methods use only basic, surface-level information, overlooking the high-level semantic information embedded in the scene graph. In this study, we leverage visual triplet units, consisting of subject-relation-object pairs in the scene graph, to capture high-level semantics more effectively. To enhance the triplets, we incorporate extensive knowledge from large language models (LLMs). We propose Visual Triplet-based Graph Transformation (VTGT), a framework that transforms the scene graph into a visual triplet-based graph in which the triplets serve as the nodes. This transformed graph is then processed by a graph neural network (GNN) to learn an optimal scene graph representation. Experimental results on image retrieval demonstrate the superior performance of our approach, driven by the LLM-powered visual triplet-based graph representation.
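
The abstract outlines a pipeline: extract subject-relation-object triplets from the scene graph, enrich them with LLM knowledge, rebuild the graph with triplets as nodes, and pool a GNN over the result to obtain a retrieval embedding. The sketch below illustrates one plausible reading of that pipeline in plain Python/NumPy. It is not the paper's actual design: the edge rule (linking triplets that share an entity), the hashed token vectors standing in for LLM-enriched triplet features, and the parameter-free mean-aggregation message passing are all assumptions made for illustration.

```python
# Minimal sketch of the VTGT idea as read from the abstract. Everything
# beyond "triplets become nodes" is an assumption for illustration.
import zlib
from itertools import combinations

import numpy as np


def token_vec(token, dim=16):
    """Deterministic stand-in for an LLM-derived token embedding."""
    rng = np.random.default_rng(zlib.crc32(token.encode()))
    return rng.normal(size=dim)


def triplet_graph(triplets):
    """Transformation step: triplets are the nodes; the assumed edge
    rule links two triplets that share a subject or object entity."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(triplets), 2)
        if {a[0], a[2]} & {b[0], b[2]}
    ]


def scene_embedding(triplets, rounds=2):
    """Mean-aggregation message passing over the triplet graph,
    then a mean-pool readout to one vector per scene graph."""
    feats = np.stack(
        [(token_vec(s) + token_vec(r) + token_vec(o)) / 3 for s, r, o in triplets]
    )
    nbrs = [[] for _ in triplets]
    for i, j in triplet_graph(triplets):
        nbrs[i].append(j)
        nbrs[j].append(i)
    h = feats
    for _ in range(rounds):
        h = np.stack(
            [
                (h[i] + sum((h[j] for j in nbrs[i]), np.zeros_like(h[i])))
                / (1 + len(nbrs[i]))
                for i in range(len(triplets))
            ]
        )
    return h.mean(axis=0)


# Toy retrieval: rank database scene graphs by cosine similarity to a query.
scenes = {
    "img1": [("man", "riding", "horse"), ("horse", "on", "beach")],
    "img2": [("woman", "riding", "bike"), ("bike", "on", "road")],
    "img3": [("man", "riding", "bike"), ("dog", "beside", "bike")],
}
embs = {k: scene_embedding(v) for k, v in scenes.items()}
q = embs["img2"]
rank = sorted(
    embs,
    key=lambda k: q @ embs[k] / (np.linalg.norm(q) * np.linalg.norm(embs[k])),
    reverse=True,
)
print(rank)  # "img2" ranks first (self-match); the rest order by shared entities
```

In the paper the triplet features come from LLM knowledge and the GNN weights are learned end to end; this sketch only mirrors the graph-transformation and readout structure.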
KSP Keywords
Current image, Graph Transformation, Graph representation, Image retrieval, Language Model, Representation learning, Scene graph, Semantic content, Structural Similarity Index Measure (SSIM), high-level semantics, neural network (NN)