ETRI-Knowledge Sharing Platform



Conference Paper: Analysis of the Visual Competency and Robustness of VLMs in Korean Scene Text Recognition (한국어 장면 텍스트 인식에서 VLM의 시각적 역량 및 강건성 분석)
Authors
김소정, 배유석, 윤기민
Issue Date
2026-04
Citation
Intelligent Information and Control Conference (IICC, 지능정보 및 제어 학술대회) 2026, pp. 42-43
Publisher
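The Institute of Electronics and Information Engineers (대한전자공학회) / The Korean Institute of Electrical Engineers (대한전기학회)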
Language
Korean
Type
Conference Paper
Abstract
Scene Text Recognition (STR) recognizes text in images captured in natural environments and is a fundamental component of many downstream applications. Although the rapid advancement of large Vision-Language Models (VLMs) has enabled unified visual understanding and text recognition, VLM-based STR performance on non-Latin scripts, particularly Korean, has not yet been analyzed systematically. In this study, we evaluate the visual competency and robustness of several VLMs by establishing a Korean scene text benchmark dataset that features diverse visual perturbations. Our experiments assess model performance under several degradations, including blur, occlusion, rotation, and perspective distortion. The results show that general-purpose VLMs are generally more robust to visual variations than traditional OCR approaches, and they also reveal model-specific error patterns.
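
As a rough illustration of the kinds of degradations the abstract names, the sketch below applies blur, occlusion, and rotation to a grayscale image stored as nested Python lists. This is not the authors' pipeline: the function names (box_blur, occlude, rotate), the box-filter blur, the rectangular occluder, and the nearest-neighbor rotation are all assumptions chosen to keep the example dependency-free. Perspective distortion is left out because it needs a full homography warp, which an image library would normally provide.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image: rows of pixel intensities


def box_blur(img: Image, radius: int = 1) -> Image:
    """Blur by averaging each pixel with its neighbors (window clipped at edges)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def occlude(img: Image, top: int, left: int, height: int, width: int,
            fill: float = 0.0) -> Image:
    """Cover a rectangle with a constant value (hypothetical occluder shape)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(max(0, top), min(h, top + height)):
        for x in range(max(0, left), min(w, left + width)):
            out[y][x] = fill
    return out


def rotate(img: Image, degrees: float, fill: float = 0.0) -> Image:
    """Rotate about the image center using nearest-neighbor inverse mapping."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # For each output pixel, look up the source pixel it came from.
            dx, dy = x - cx, y - cy
            sx = round(cx + cos_t * dx + sin_t * dy)
            sy = round(cy - sin_t * dx + cos_t * dy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


if __name__ == "__main__":
    # Toy 3x3 "image" with a single bright pixel in the center.
    toy = [[0.0, 0.0, 0.0],
           [0.0, 9.0, 0.0],
           [0.0, 0.0, 0.0]]
    print(box_blur(toy))
    print(occlude(toy, 0, 0, 2, 2))
    print(rotate(toy, 15))
```

In a robustness study of this kind, each perturbed image would be passed to the model under test and its output compared with the clean-image output at increasing perturbation strength. The specific strengths and scoring used in the paper are not stated in the abstract.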