ETRI-Knowledge Sharing Platform

Enhancing Multiscale Feature Representation for Object-Level Recognition in Masked Image Modeling
Cited 0 times in Scopus
Authors
Tsatsral Amarbayasgalan, Sungjun Wang, Mooseop Kim, Chi Yoon Jeong
Issue Date
2025-09
Citation
International Conference on Image Processing (ICIP) 2025, pp. 677-682
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ICIP55913.2025.11084319
Abstract
Masked image modeling (MIM), a self-supervised learning method in computer vision, excels in image- and video-level recognition tasks by providing robust and generalized feature representations. However, most MIM methods are built on plain Vision Transformers (ViTs), which cannot produce multiscale features; this limits their effectiveness in more complex object-level recognition tasks. Extracting multiscale hierarchical features with a convolutional stem, and fully fusing local and global information within all feature representations, are crucial for applying the MIM framework to object-level recognition. To address this issue, we propose an effective multiscale feature extraction mechanism that integrates local and global dependencies from the convolutional stem and the ViT within the MIM framework. We evaluated our method on object detection and instance segmentation using the MS COCO dataset. It exhibits superior performance by effectively fusing local and global information across all feature scales, achieving results comparable to those of state-of-the-art methods while using 25% fewer training samples.
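The abstract does not say how masking is applied once a convolutional stem produces feature maps at several resolutions. As a hedged illustration only, and not the authors' mechanism, the sketch below shows one common way hybrid convolution/ViT MIM encoders keep masking consistent across scales. A random mask is drawn on the coarsest (ViT) token grid and then nearest-neighbour upsampled to the higher-resolution stem stages. The 224x224 input, 16-pixel patches, stride-4 and stride-8 stem stages, the 0.75 masking ratio, and the helper names `random_patch_mask` and `upsample_mask` are all hypothetical choices made for this example.

```python
import random
from typing import List, Optional

# 1 = masked patch, 0 = visible patch
Mask = List[List[int]]


def random_patch_mask(grid_h: int, grid_w: int, mask_ratio: float,
                      seed: Optional[int] = None) -> Mask:
    """Mask a random fraction of tokens on a grid_h x grid_w patch grid (hypothetical helper)."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_tokens = grid_h * grid_w
    num_masked = int(num_tokens * mask_ratio)
    # Sample token indices without replacement, then lay them out row-major.
    masked = set(rng.sample(range(num_tokens), num_masked))
    return [[1 if r * grid_w + c in masked else 0 for c in range(grid_w)]
            for r in range(grid_h)]


def upsample_mask(mask: Mask, factor: int) -> Mask:
    """Nearest-neighbour upsampling: each coarse cell becomes a factor x factor block."""
    return [[cell for cell in row for _w in range(factor)]
            for row in mask for _h in range(factor)]


if __name__ == "__main__":
    # Assumed setup: 224x224 input, 16-pixel ViT patches -> 14x14 token grid.
    m16 = random_patch_mask(14, 14, mask_ratio=0.75, seed=0)
    m8 = upsample_mask(m16, 2)  # hypothetical stride-8 stem stage: 28x28
    m4 = upsample_mask(m16, 4)  # hypothetical stride-4 stem stage: 56x56
    print(sum(map(sum, m16)), len(m8), len(m4))  # 147 28 56
```

Drawing the mask once at the coarsest scale and upsampling it, rather than masking each stage independently, keeps a hidden region hidden at every resolution. As a result, the higher-resolution stem features cannot reveal content that the model is asked to reconstruct.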
KSP Keywords
Complex object, Computer Vision (CV), Extraction mechanism, Feature Representation, Feature extraction, Image modeling, Object-level, Supervised learning method, Training samples, Global information, Object detection