ETRI Knowledge Sharing Platform: A Taxonomy of Knowledge Bases for Retrieval-Augmented Methods in Vision: A Comprehensive Survey

Journal Article: A Taxonomy of Knowledge Bases for Retrieval-Augmented Methods in Vision: A Comprehensive Survey

Abstract: Recent vision–language models (VLMs) demonstrate impressive performance but often hallucinate when essential knowledge is not encoded in their parameters. Retrieval-augmented generation (RAG), which integrates external knowledge bases (KBs), has thus been widely adopted in computer vision. Beyond VLMs, retrieval augmentation has also been applied independently to improve task performance. However, no systematic analysis has investigated which types of KBs are most suitable for particular vision tasks. In this study, we present the first comprehensive survey of more than sixty studies (2021–2025) with a novel focus on KB types. We propose a taxonomy consisting of six categories: unstructured text (V-UT), ontology (V-OT), image (V-IM), image–text pairs (V-IT), structured graphs (V-SG), and domain-specific data (V-DM). For each category, we review retrieval pipelines, downstream tasks, representative datasets, and indexing strategies under consistent criteria. Our analysis shows that, regardless of KB type, most systems converge on dense encoders with vector databases, reflecting a mature technical stack. Nevertheless, this convergence often underutilizes KB-specific structures, highlighting significant opportunities for future studies. Finally, we provide practical guidelines for KB selection and retrieval design in vision and emphasize the need for KB-specific retrieval methods and standardized benchmarks.
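
Illustrative sketch (not taken from the survey): the abstract observes that most systems pair a dense encoder with a vector database. The toy Python example below shows only the shape of that pipeline. The class name ToyVectorIndex, the item identifiers, and the three-dimensional embeddings are all hypothetical placeholders; a real system would obtain embeddings from a trained encoder and would typically use an approximate nearest-neighbor index rather than brute-force search.

```python
import math


def normalize(vec):
    """Return an L2-normalized copy of vec (or an unchanged copy if its norm is zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


class ToyVectorIndex:
    """Hypothetical in-memory stand-in for a vector database (brute-force cosine search)."""

    def __init__(self):
        self.entries = []  # list of (item_id, normalized embedding)

    def add(self, item_id, embedding):
        self.entries.append((item_id, normalize(embedding)))

    def search(self, query_embedding, k=1):
        q = normalize(query_embedding)
        # The dot product of unit vectors equals their cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, vec)), item_id)
            for item_id, vec in self.entries
        ]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Made-up embeddings standing in for the output of a dense encoder.
index = ToyVectorIndex()
index.add("caption:red_fox", [0.9, 0.1, 0.0])
index.add("caption:chest_xray", [0.0, 0.2, 0.9])
index.add("caption:city_street", [0.1, 0.9, 0.1])

# A query embedding close to the first entry.
results = index.search([0.8, 0.2, 0.1], k=2)
ranked_ids = [item_id for _, item_id in results]
```

Because every KB entry is reduced to a flat vector here, any structure the KB carries (graph edges, ontology relations) is discarded at indexing time, which is the kind of underutilization the abstract points to.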

Keywords: Computer vision, knowledge base, vision-language models (VLMs), visual RAG

KSP Keywords: Computer Vision(CV), Domain-specific, External knowledge, Knowledge Base, Knowledge bases, Practical Guidelines, Systematic analysis, Unstructured text, language models, need for, task performance

This work is distributed under the terms of a Creative Commons License (CC BY-ND).
