ETRI Knowledge Sharing Platform : See More, Store Less: Memory-Efficient Resolution for Video Moment Retrieval

Type: Conference Paper

Title: See More, Store Less: Memory-Efficient Resolution for Video Moment Retrieval

Authors: Mingyu Jeon, Sungjin Han, Jinkwon Hwang, Minchol Kwon, Jonghee Kim, Junyeong Kim

Citation: Findings of the Association for Computational Linguistics: EACL 2026, pp.1726-1736

Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have improved image recognition and reasoning, but video-related tasks remain challenging due to memory constraints from dense frame processing. Existing Video Moment Retrieval (VMR) methods rely on sparse frame sampling, which risks information loss, especially in lengthy videos. We propose SMORE (See MORE, store less), a framework that enhances memory efficiency while maintaining high information resolution. SMORE (1) uses query-guided captions to encode semantics aligned with user intent, (2) applies query-aware importance modulation to highlight relevant segments, and (3) adaptively compresses frames to preserve key content while reducing redundancy. This enables efficient video understanding without exceeding memory budgets. Experiments show that SMORE achieves state-of-the-art performance on the QVHighlights, Charades-STA, and ActivityNet-Captions benchmarks.
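
Illustrative sketch (not from the paper): this record gives no implementation details, so the code below is only a toy reading of steps (2) and (3) of the abstract. Everything in it is an assumption made for illustration: the function names, cosine similarity as the relevance score, the softmax temperature, and the equal-importance-mass grouping rule. Step (1), query-guided captioning, needs an MLLM and is omitted. The idea shown is that frames relevant to the query keep their own slot, while runs of less relevant frames are pooled into a single vector, so the output never exceeds a fixed budget.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def importance_weights(frame_feats, query_feat, temperature=0.1):
    """Hypothetical query-aware importance: softmax over query-frame similarity."""
    sims = [cosine(f, query_feat) / temperature for f in frame_feats]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def compress_frames(frame_feats, weights, budget):
    """Hypothetical adaptive compression into at most `budget` vectors.

    Frames are split, in temporal order, into contiguous groups of roughly
    equal importance mass. Important frames end up in small groups (high
    resolution); unimportant runs are pooled by a weighted average.
    Assumes `weights` are positive and sum to 1.
    """
    groups = {}
    running = 0.0
    for feat, w in zip(frame_feats, weights):
        mid = running + w / 2.0  # centre of this frame's importance mass
        running += w
        g = min(budget - 1, int(mid * budget))
        groups.setdefault(g, []).append((feat, w))

    compressed = []
    for g in sorted(groups):  # keep temporal order
        members = groups[g]
        wsum = sum(w for _, w in members)
        dim = len(members[0][0])
        compressed.append(
            [sum(f[d] * w for f, w in members) / wsum for d in range(dim)]
        )
    return compressed


# Toy usage: 8 frames, of which frames 3 and 4 match the query direction.
frames = [[0.0, 1.0]] * 3 + [[1.0, 0.0]] * 2 + [[0.0, 1.0]] * 3
query = [1.0, 0.0]
w = importance_weights(frames, query)
kept = compress_frames(frames, w, budget=4)
```

In this toy input the two query-aligned frames each keep a slot of their own, and the three frames before and the three after are pooled into one vector each, so eight frames are stored in four slots.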

KSP Keywords: Art performance, Experimental validation, Frame processing, Information loss, Language Models, Memory Efficiency, Potential information, User Intent, Video understanding, image recognition, memory-efficient

This work is distributed under the terms of a Creative Commons License (CC BY).
