ETRI-Knowledge Sharing Platform





Journal article: Siamese Feedback Network for Visual Object Tracking
권미경, 김진희, 엄기문, 이희경, 서정일, 임성용, 양승준, 김원준
IEIE Transactions on Smart Processing and Computing, v.11 no.1, pp.24-33
The Institute of Electronics and Information Engineers (IEIE)
21HH4900, [Specialized Research Lab] Immersive Media Research Laboratory, 서정일
Visual object tracking, one of the main topics in computer vision, aims to follow a target object through every frame of a video sequence. In particular, Siamese-based network architectures have been widely adopted for visual object tracking due to their correlation-based nature. However, the features encoded from the target template and the search image in the Siamese branches still suffer from ambiguities caused by complicated real-world conditions, e.g., occlusions and rotations. This paper proposes the Siamese feedback network for robust object tracking. The key idea of the proposed method is to encode target-relevant features accurately via the feedback block, which is defined as a combination of attention and refinement modules. Specifically, interdependent features are extracted through self- and cross-attention operations. Subsequently, these re-calibrated features are refined in both a spatial and a channel-wise manner. The refined features are then fed back to the input of the feedback block via the feedback loop. This is desirable because the high-level semantic information guides the feedback block to learn more meaningful properties of the target object and its surroundings. The experimental results show that the proposed method outperforms the state-of-the-art Siamese-based methods with gains of 0.72% and 1.69% in expected average overlap on the VOT2016 and VOT2018 datasets, respectively. Overall, the proposed method is effective for visual object tracking, even in complicated real-world scenarios.
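The data flow described in the abstract (self-attention, cross-attention, spatial and channel-wise refinement, then a feedback loop) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's modules are learned, whereas the version below uses parameter-free stand-ins, and the names `attend`, `refine`, and `feedback_block`, the residual connection, the sigmoid gating, and the `steps` parameter are all assumptions made for illustration. Features are stored as lists of rows, one row per spatial position and one column per channel.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, context):
    """Scaled dot-product attention of query (N x C) over context (M x C).

    Passing the same features twice gives self-attention; passing the other
    branch as context gives cross-attention. A residual connection is added
    (an assumption, not stated in the abstract).
    """
    scale = math.sqrt(len(query[0]))
    scores = matmul(query, transpose(context))  # N x M similarity scores
    weights = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(weights, context)  # N x C
    return [[q + a for q, a in zip(q_row, a_row)]
            for q_row, a_row in zip(query, attended)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def refine(features):
    """Gate features channel-wise, then spatially (parameter-free stand-in)."""
    n, c = len(features), len(features[0])
    # Channel gate: one weight per channel from its mean over all positions.
    channel_gate = [sigmoid(sum(row[j] for row in features) / n) for j in range(c)]
    gated = [[v * g for v, g in zip(row, channel_gate)] for row in features]
    # Spatial gate: one weight per position from its mean over all channels.
    spatial_gate = [sigmoid(sum(row) / c) for row in gated]
    return [[v * g for v in row] for row, g in zip(gated, spatial_gate)]


def feedback_block(template, search, steps=2):
    """Run attention and refinement, feeding the outputs back as inputs."""
    for _ in range(steps):
        t_self = attend(template, template)  # self-attention per branch
        s_self = attend(search, search)
        t_cross = attend(t_self, s_self)     # cross-attention between branches
        s_cross = attend(s_self, t_self)
        template, search = refine(t_cross), refine(s_cross)  # feedback loop
    return template, search
```

Calling `feedback_block(template, search, steps=2)` with a 2 x 3 template and a 4 x 3 search feature matrix returns two matrices of the same shapes. Each pass re-encodes both branches using features that have already been conditioned on the other branch, which is the intuition behind feeding high-level information back to the block input.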
KSP Suggested Keywords
Computer Vision(CV), Correlation-based, Feedback Loop, Network Architecture, Real-world, Robust object tracking, Target template, Visual Object tracking, feedback network, search image, self-