ETRI Knowledge Sharing Platform


BMRN: Boundary Matching and Refinement Network for Temporal Moment Localization with Natural Language
Cited 3 times in Scopus · Downloaded 148 times
Authors: Muah Seol, Jonghee Kim, Jinyoung Moon
Issue Date: 2023-06
Citation: Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2023, pp. 5570-5578
Language: English
Type: Conference Paper
DOI: https://dx.doi.org/10.1109/CVPRW59228.2023.00589
Abstract
Temporal moment localization (TML) aims to retrieve the best moment in a video that matches a given sentence query. This task is challenging as it requires understanding the relationship between a video and a sentence, as well as the semantic meaning of both. TML methods using 2D temporal maps, which represent proposal features or scores on all moment proposals with the boundary of start and end times on the m and n axes, have shown performance improvements by modeling moment proposals in relation to each other. The methods, however, are limited by the coarsely pre-defined fixed boundaries of target moments, which depend on the length of training videos and the amount of memory available. To overcome this limitation, we propose a boundary matching and refinement network (BMRN) that generates 2D boundary matching and refinement maps along with a proposal feature map to obtain the final proposal score map. Our BMRN adjusts the fixed boundaries of moment proposals with predicted center and length offsets from boundary refinement maps. In addition, we introduce a length-aware proposal feature map that combines a cross-modal feature map and a similarity map between the predicted duration of the target moment and moment proposals. Our approach leads to improved TML performance on Charades-STA and ActivityNet Captions datasets, outperforming state-of-the-art methods by a large margin.
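The abstract's core idea, a 2D temporal map whose cell (i, j) is the proposal spanning clips i through j, refined by predicted center and length offsets, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the offset parameterization (center shift scaled by proposal length, length scaled exponentially) and the array shapes are assumptions for the sake of the example.

```python
import numpy as np

def refine_proposals(num_clips, center_offsets, length_offsets):
    """Adjust the fixed boundaries of 2D-map moment proposals with offsets.

    Each valid cell (i, j) with i <= j represents a proposal covering
    clips i..j, i.e. the continuous span [i, j + 1) in clip units.
    center_offsets and length_offsets are (num_clips, num_clips) maps,
    standing in for BMRN's predicted boundary refinement maps.
    """
    starts, ends = np.meshgrid(np.arange(num_clips), np.arange(num_clips),
                               indexing="ij")
    valid = starts <= ends
    center = (starts + ends + 1) / 2.0           # proposal center, clip units
    length = (ends - starts + 1).astype(float)   # proposal length, clip units

    # Refinement: shift the center proportionally to the proposal length,
    # and rescale the length (exponential keeps it positive).
    new_center = center + center_offsets * length
    new_length = length * np.exp(length_offsets)

    new_start = np.clip(new_center - new_length / 2.0, 0, num_clips)
    new_end = np.clip(new_center + new_length / 2.0, 0, num_clips)
    # Invalid cells (start index above end index) are marked with -1.
    return np.where(valid, new_start, -1.0), np.where(valid, new_end, -1.0)
```

With zero offsets the proposals keep their fixed grid boundaries; nonzero offsets move each proposal's start and end off the coarse grid, which is the limitation of fixed 2D maps that the refinement step addresses.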
KSP Keywords: Feature map, Large margin, Natural language, Cross-modal, State-of-the-art