ETRI Knowledge Sharing Platform: Analysis of the Visual Competency and Robustness of VLMs in Korean Scene Text Recognition

Conference Paper: Analysis of the Visual Competency and Robustness of VLMs in Korean Scene Text Recognition

Abstract: Scene Text Recognition (STR) aims to recognize text in images captured in natural environments and serves as a fundamental component of various downstream applications. While the rapid advancement of Large Vision-Language Models (VLMs) has enabled unified visual understanding and text recognition, VLM-based STR performance for non-Latin scripts, particularly Korean, has not been analyzed systematically. In this study, we evaluate the visual competency and robustness of various VLMs by establishing a Korean scene text benchmark dataset featuring diverse visual perturbations. Our experiments assess model performance under several degradations, including blur, occlusion, rotation, and perspective distortion. The results show that general-purpose VLMs are generally more robust to visual variations than traditional OCR approaches, and they also reveal model-specific error patterns.