ETRI Knowledge Sharing Platform


Conference Paper: SWAG-Net: Semantic Word-Aware Graph Network for Temporal Video Grounding
Cited 1 time in Scopus
Authors
김선오, 하태길, 윤기민, 최진영
Publication Date
October 2022
Source
International Conference on Information and Knowledge Management (CIKM) 2022, pp.982-992
DOI
https://dx.doi.org/10.1145/3511808.3557463
Research Project
21HS4600, (DeepView, Sub-project 1) Development of a High-Performance Visual Discovery Platform for Real-Time Understanding and Prediction of Large-Scale Video Data, 배유석
Abstract
In this paper, to effectively capture non-sequential dependencies among semantic words for temporal video grounding, we propose a novel framework called the Semantic Word-Aware Graph Network (SWAG-Net), which adopts graph-guided semantic word embedding in an end-to-end manner. Specifically, we define semantic word features as node features of semantic word-aware graphs and word-to-word correlations as three edge types (i.e., intrinsic, extrinsic, and relative edges) to obtain diverse graph structures. We then apply Semantic Word-aware Graph Convolutional Networks (SW-GCNs) to these graphs for semantic word embedding. For modality fusion and context modeling, the embedded features and video segment features are merged into bi-modal features, which are then aggregated by incorporating local and global contextual information. Leveraging the aggregated features, the proposed method effectively finds a temporal boundary that semantically corresponds to a sentence query in an untrimmed video. We verify that SWAG-Net outperforms state-of-the-art methods on the Charades-STA and ActivityNet Captions datasets.
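The abstract describes graph-guided semantic word embedding: word features sit on graph nodes, three edge types define three adjacency structures, and a GCN aggregates over each. The sketch below shows one such graph-convolution step in NumPy under toy assumptions; the node count, feature dimension, adjacency patterns, and averaging fusion are all illustrative placeholders, not the paper's actual edge definitions or SW-GCN architecture.

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, standard for GCNs."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def gcn_layer(X, A, W):
    """One graph-convolution step: aggregate neighbors, project, ReLU."""
    return np.maximum(normalize_adj(A) @ X @ W, 0.0)

# Toy setup: 4 semantic-word nodes with 8-dim features (sizes are illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 8)) * 0.1

# Three edge types named in the abstract (intrinsic, extrinsic, relative);
# these adjacency patterns are stand-ins, not the paper's definitions.
A_intrinsic = np.eye(4)                                          # self-connections
A_extrinsic = np.ones((4, 4)) - np.eye(4)                        # all word pairs
A_relative = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)    # adjacent words

# One GCN step per edge type, fused by averaging (a simplification).
H = np.mean([gcn_layer(X, A, W) for A in (A_intrinsic, A_extrinsic, A_relative)], axis=0)
print(H.shape)  # (4, 8)
```

The per-edge-type convolutions keep the three correlation structures separate until fusion, which is the property the abstract attributes to the semantic word-aware graphs; a real implementation would use learned weights per edge type and a trained fusion step.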
KSP Suggested Keywords
Bi-modal, Context modeling, Contextual information, Convolutional networks, End-to-end (E2E), Graph networks, Graph-guided, Semantic word embedding, Temporal boundary, State-of-the-art, Video segment