ETRI Knowledge Sharing Platform


Conference Paper: SWAG-Net: Semantic Word-Aware Graph Network for Temporal Video Grounding
Cited 1 time in Scopus
Authors
김선오, 하태길, 윤기민, 최진영
Publication Date
October 2022
Source
International Conference on Information and Knowledge Management (CIKM) 2022, pp.982-992
DOI
https://dx.doi.org/10.1145/3511808.3557463
Research Project
21HS4600, (DeepView, Sub-project 1) Development of a High-Performance Visual Discovery Platform for Real-Time Understanding and Prediction of Large-Scale Video Data, 배유석
Abstract
In this paper, to effectively capture non-sequential dependencies among semantic words for temporal video grounding, we propose a novel framework called the Semantic Word-Aware Graph Network (SWAG-Net), which adopts graph-guided semantic word embedding in an end-to-end manner. Specifically, we define semantic word features as node features of semantic word-aware graphs and word-to-word correlations as three edge types (i.e., intrinsic, extrinsic, and relative edges) to obtain diverse graph structures. We then apply Semantic Word-aware Graph Convolutional Networks (SW-GCNs) to these graphs for semantic word embedding. For modality fusion and context modeling, the embedded features and video segment features are merged into bi-modal features, which are then aggregated by incorporating local and global contextual information. Leveraging the aggregated features, the proposed method effectively finds a temporal boundary that semantically corresponds to a sentence query in an untrimmed video. We verify that SWAG-Net outperforms state-of-the-art methods on the Charades-STA and ActivityNet Captions datasets.
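The abstract describes graph-guided semantic word embedding: word features sit on graph nodes, three edge types define three adjacency structures, and a GCN aggregates over each. The sketch below shows one such graph-convolution step in NumPy under toy assumptions; the node count, feature dimension, adjacency patterns, and averaging fusion are all illustrative placeholders, not the paper's actual edge definitions or SW-GCN architecture.

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, standard for GCNs."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def gcn_layer(X, A, W):
    """One graph-convolution step: aggregate neighbors, project, ReLU."""
    return np.maximum(normalize_adj(A) @ X @ W, 0.0)

# Toy setup: 4 semantic-word nodes with 8-dim features (sizes are illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 8)) * 0.1

# Three edge types named in the abstract (intrinsic, extrinsic, relative);
# these adjacency patterns are stand-ins, not the paper's definitions.
A_intrinsic = np.eye(4)                                          # self-connections
A_extrinsic = np.ones((4, 4)) - np.eye(4)                        # all word pairs
A_relative = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)    # adjacent words

# One GCN step per edge type, fused by averaging (a simplification).
H = np.mean([gcn_layer(X, A, W) for A in (A_intrinsic, A_extrinsic, A_relative)], axis=0)
print(H.shape)  # (4, 8)
```

The per-edge-type convolutions keep the three correlation structures separate until fusion, which is the property the abstract attributes to the semantic word-aware graphs; a real implementation would use learned weights per edge type and a trained fusion step.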
KSP Suggested Keywords
Bi-modal, Context modeling, Contextual information, Convolutional networks, End-to-end (E2E), Graph networks, Graph-guided, Semantic word embedding, Temporal boundary, State-of-the-art, Video segment