ETRI Knowledge Sharing Platform: ENHANCING MULTISCALE FEATURE REPRESENTATION FOR OBJECT-LEVEL RECOGNITION IN MASKED IMAGE MODELING

Conference Paper: ENHANCING MULTISCALE FEATURE REPRESENTATION FOR OBJECT-LEVEL RECOGNITION IN MASKED IMAGE MODELING

Cited 0 times in Scopus

Citation: International Conference on Image Processing (ICIP) 2025, pp. 677-682

Abstract: Masked image modeling (MIM), a self-supervised learning method in computer vision, excels at image- and video-level recognition tasks by providing robust and generalizable feature representations. However, most MIM methods use plain Vision Transformers (ViTs), which cannot produce multiscale features; this limits their effectiveness on more complex object-level recognition tasks. Applying the MIM framework to object-level recognition therefore requires extracting multiscale hierarchical features with a convolutional stem and fully fusing local and global information within every feature representation. To address this issue, we propose an effective multiscale feature extraction mechanism that integrates local dependencies from the convolutional stem with global dependencies from the ViT within the MIM framework. We evaluated our method on object detection and instance segmentation on the MS COCO dataset. By effectively fusing local and global information across all feature scales, it achieves superior performance, with results comparable to those of state-of-the-art methods while using 25% fewer training samples.
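The abstract gives no implementation details, so the following is only a minimal, hypothetical sketch of the general idea: a plain ViT yields a single stride-16 map, while a convolutional stem can supply finer stride-4 and stride-8 maps that are fused with the ViT's global features. It assumes a 224×224 input, 16×16 ViT patches (a 14×14 token grid), conv-stem stages at strides 4 and 8, an MAE-style 75% mask ratio, single-channel maps stored as nested lists, and nearest-neighbor upsampling plus element-wise addition as the fusion step (the FPN-style top-down choice). This fusion rule is an illustrative stand-in, not the paper's mechanism, and the function names `random_patch_mask`, `upsample_nearest`, `fuse_local_global`, and `build_fused_pyramid` are hypothetical.

```python
import random


def random_patch_mask(grid, mask_ratio, seed=0):
    """Return a grid x grid mask (True = masked) with round(mask_ratio * grid**2) masked patches.

    Assumption: uniform random patch masking, as in MAE; the paper's masking strategy is not stated.
    """
    n = grid * grid
    num_masked = round(mask_ratio * n)
    rng = random.Random(seed)
    masked = set(rng.sample(range(n), num_masked))
    return [[(r * grid + c) in masked for c in range(grid)] for r in range(grid)]


def upsample_nearest(mat, factor):
    """Nearest-neighbor upsampling of a 2D list by an integer factor."""
    return [
        [mat[r // factor][c // factor] for c in range(len(mat[0]) * factor)]
        for r in range(len(mat) * factor)
    ]


def fuse_local_global(local, global_coarse):
    """Hypothetical fusion: upsample the coarse global (ViT) map and add it to a finer local map."""
    if len(local) % len(global_coarse) != 0:
        raise ValueError("local map size must be an integer multiple of the global map size")
    factor = len(local) // len(global_coarse)
    up = upsample_nearest(global_coarse, factor)
    return [[l + g for l, g in zip(lrow, grow)] for lrow, grow in zip(local, up)]


def build_fused_pyramid(stem_maps, vit_map):
    """stem_maps: dict stride -> local map from a conv stem; vit_map: stride-16 global map.

    Returns dict stride -> map: the ViT map at stride 16 plus one fused map per stem stride.
    """
    pyramid = {16: vit_map}
    for stride, local in stem_maps.items():
        pyramid[stride] = fuse_local_global(local, vit_map)
    return pyramid


if __name__ == "__main__":
    grid = 14  # 224 / 16 ViT tokens per side (assumed input size and patch size)
    mask16 = random_patch_mask(grid, 0.75, seed=0)
    mask4 = upsample_nearest(mask16, 4)  # 56 x 56 mask for the stride-4 stem stage
    print("masked ViT patches:", sum(map(sum, mask16)))  # 147 of 196

    vit = [[float(r * grid + c) for c in range(grid)] for r in range(grid)]
    stem = {4: [[1.0] * 56 for _ in range(56)], 8: [[1.0] * 28 for _ in range(28)]}
    pyramid = build_fused_pyramid(stem, vit)
    print("pyramid strides:", sorted(pyramid))  # [4, 8, 16]
```

Upsampling the stride-16 mask to the finer stages, rather than sampling an independent mask per stage, keeps the hidden regions aligned across scales so the convolutional stem does not see pixels that are masked from the ViT; this is one common approach for hierarchical MIM and is assumed here rather than taken from the paper.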

KSP Keywords: Complex object, Computer Vision(CV), Extraction mechanism, Feature Representation, Image modeling, Object-level, Supervised learning method, Training samples, feature extraction, global information, object detection

218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, KOREA, Contact: sh.kim@etri.re.kr