ETRI-Knowledge Sharing Platform

A Taxonomy of Knowledge Bases for Retrieval-Augmented Methods in Vision: A Comprehensive Survey
Authors
Geonwoo Kim, Dong-Hwan Lee, Jang-Hee Yoo
Issue Date
2026-02
Citation
IEEE Access, v.14, pp.32736-37254
ISSN
2169-3536
Publisher
IEEE
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1109/ACCESS.2026.3668187
Abstract
Recent vision–language models (VLMs) demonstrate impressive performance but often hallucinate when essential knowledge is not encoded in their parameters. Retrieval-augmented generation (RAG), which integrates external knowledge bases (KBs), has thus been widely adopted in computer vision. Beyond VLMs, retrieval augmentation has also been applied independently to improve task performance. However, no systematic analysis has investigated which types of KBs are most suitable for particular vision tasks. In this study, we present the first comprehensive survey of more than sixty studies (2021–2025) with a novel focus on KB types. We propose a taxonomy consisting of six categories: unstructured text (V-UT), ontology (V-OT), image (V-IM), image–text pairs (V-IT), structured graphs (V-SG), and domain-specific data (V-DM). For each category, we review retrieval pipelines, downstream tasks, representative datasets, and indexing strategies under consistent criteria. Our analysis shows that, regardless of KB type, most systems converge on dense encoders with vector databases, reflecting a mature technical stack. Nevertheless, this convergence often underutilizes KB-specific structures, highlighting significant opportunities for future studies. Finally, we provide practical guidelines for KB selection and retrieval design in vision and emphasize the need for KB-specific retrieval methods and standardized benchmarks.
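The abstract reports that most surveyed systems, whatever their KB type, retrieve with dense encoders backed by vector databases. The following minimal sketch illustrates that generic stack in its simplest form: exact (brute-force) cosine-similarity top-k search over pre-computed embeddings. It is not taken from the survey. The class FlatVectorIndex, the helper l2_normalize, the KB entry names, and the 3-dimensional embeddings are all hypothetical stand-ins for a real encoder's outputs and a real vector database.

```python
import math
from typing import List, Tuple


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class FlatVectorIndex:
    """Hypothetical exact inner-product index over L2-normalized embeddings."""

    def __init__(self) -> None:
        self._ids: List[str] = []
        self._vecs: List[List[float]] = []

    def add(self, item_id: str, embedding: List[float]) -> None:
        # Store normalized copies so search scores are cosine similarities.
        self._ids.append(item_id)
        self._vecs.append(l2_normalize(embedding))

    def search(self, query: List[float], k: int = 3) -> List[Tuple[str, float]]:
        # Score every stored entry against the normalized query (linear scan).
        q = l2_normalize(query)
        scored = [
            (item_id, sum(a * b for a, b in zip(q, vec)))
            for item_id, vec in zip(self._ids, self._vecs)
        ]
        # Highest cosine similarity first; return the top-k (id, score) pairs.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Toy KB: hand-made embeddings standing in for a dense encoder's output.
# The prefixes reuse the survey's category labels purely for illustration.
toy_kb = {
    "V-UT:wiki_passage_bird": [0.9, 0.1, 0.0],
    "V-IT:caption_pair_dog": [0.1, 0.9, 0.1],
    "V-IM:image_car": [0.0, 0.2, 0.95],
}

index = FlatVectorIndex()
for item_id, emb in toy_kb.items():
    index.add(item_id, emb)

# Hypothetical query embedding (e.g. of an image region or a question).
results = index.search([0.8, 0.2, 0.0], k=2)
print(results)
```

A production vector database would typically replace the linear scan with an approximate nearest-neighbour index; the exact scan here simply keeps the scoring logic visible.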
Keyword
Computer vision, knowledge base, vision-language models (VLMs), visual RAG
KSP Keywords
Computer Vision(CV), Domain-specific, External knowledge, Knowledge Base, Knowledge bases, Practical Guidelines, Systematic analysis, Unstructured text, language models, need for, task performance
This work is distributed under the terms of a Creative Commons License (CC BY-ND).