ETRI-Knowledge Sharing Platform

GranAlign: Granularity-Aware Alignment Framework for Zero-Shot Video Moment Retrieval
Authors
Mingyu Jeon, Sunjae Yoon, Jonghee Kim, Junyeoung Kim
Issue Date
2026-01
Citation
The Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence (AAAI) 2026, pp.1-9
Publisher
Association for the Advancement of Artificial Intelligence
Language
English
Type
Conference Paper
Abstract
Zero-shot video moment retrieval (ZVMR) is the task of localizing a temporal moment within an untrimmed video using a natural language query without relying on task-specific training data. The primary challenge in this setting lies in the mismatch in semantic granularity between textual queries and visual content. Previous studies in ZVMR have attempted to achieve alignment by leveraging high-quality pre-trained knowledge that represents video and language in a joint space. However, these approaches failed to balance the semantic granularity between the pre-trained knowledge provided by each modality for a given scene. As a result, despite the high quality of each modality’s representations, the mismatch in granularity led to inaccurate retrieval. In this paper, we propose a training-free framework, called Granularity-Aware Alignment (GranAlign), that bridges this gap between coarse and fine semantic representations. Our approach introduces two complementary techniques: granularity-based query rewriting to generate varied semantic granularities, and query-aware caption generation to embed query intent into video content. By pairing multi-level queries with both query-agnostic and query-aware captions, we effectively resolve semantic mismatches. As a result, our method sets a new state-of-the-art across all three major benchmarks (QVHighlights, Charades-STA, ActivityNet-Captions), with a notable 3.23% mAP@avg improvement on the QVHighlights dataset.
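The pairing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how queries and captions are scored or fused, so everything below is an assumption made for illustration. The token-overlap similarity `jaccard` stands in for a pre-trained text encoder, simple averaging stands in for score fusion, and the moment is chosen as the contiguous run of clips with the largest mean-centered score. The function names `jaccard`, `clip_scores`, and `best_moment` are hypothetical.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Toy text similarity (assumption): token-set overlap, standing in for a pre-trained encoder."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def clip_scores(queries: List[str],
                agnostic_caps: List[str],
                aware_caps: List[str]) -> List[float]:
    """Score each clip by averaging similarity over every (query, caption) pairing.

    queries:       the original query plus rewrites at other granularities
    agnostic_caps: one query-agnostic caption per clip
    aware_caps:    one query-aware caption per clip
    Averaging is an assumed fusion rule, not one taken from the paper.
    """
    scores = []
    for cap_agnostic, cap_aware in zip(agnostic_caps, aware_caps):
        sims = [jaccard(q, c) for q in queries for c in (cap_agnostic, cap_aware)]
        scores.append(sum(sims) / len(sims))
    return scores


def best_moment(scores: List[float]) -> Tuple[int, int]:
    """Return inclusive (start, end) clip indices of the best contiguous span.

    Scores are mean-centered, then the maximum-sum subarray is found
    with Kadane's algorithm (an assumed selection rule).
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    mean = sum(scores) / len(scores)
    best_sum, best_span = float("-inf"), (0, 0)
    cur_sum, cur_start = 0.0, 0
    for i, s in enumerate(scores):
        centered = s - mean
        if cur_sum <= 0:
            # Start a new span here; the running span no longer helps.
            cur_sum, cur_start = centered, i
        else:
            cur_sum += centered
        if cur_sum > best_sum:
            best_sum, best_span = cur_sum, (cur_start, i)
    return best_span
```

In a realistic setting, the captions would come from a video captioning model and the similarity from a pre-trained language or vision-language encoder. The sketch only shows how multi-level queries and two caption types per clip can be combined into one per-clip score before a moment is selected.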
KSP Keywords
Caption generation, Complementary techniques, High-quality, Multi-level, Natural language queries, Query intent, Query rewriting, Semantic representations, Task-specific, Video content, Zero-shot