ETRI-Knowledge Sharing Platform



Recursive inference for individual’s identification in video data based on the description
Cited 0 times in Scopus
Authors
Dearo Kim, Jiyoun Lim, Jeongwoo Son, Namkyung Lee
Issue Date
2023-10
Citation
International Conference on Information and Communication Technology Convergence (ICTC) 2023, pp.1330-1333
Publisher
IEEE
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ICTC58733.2023.10392608
Abstract
This study aims to enhance textual and visual representations for identifying the same individuals within video descriptions. Character inference and ID generation for identifying individuals are based on a Transformer. A scene of a video visually represents ‘who’ does ‘what’ and ‘how’, ‘where’ and ‘when’. Among the entities composing a scene, a person plays a crucial role in representing the context of the scene. Thus, in the video description problem, various ways to identify the persons in a scene have been studied. This paper deals with the ‘fill-in-the characters’ problem, which aims to predict the local IDs of characters that appear across several consecutive scenes. The task requires only local IDs: each character need not be recognized globally (across an entire movie), only locally (within a set of 5 clips). Because of this restriction in the problem definition, contemporary methods cannot produce global identifications of characters, even though services and applications often require them. To resolve this problem, we propose a method for recursive inference of local IDs. Additionally, we propose optimizing BERT embeddings for mask tokens from video descriptions to infer characters’ local IDs. According to the experimental results, the recursive process yields coherent representations of unique individuals.
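The abstract does not spell out how local IDs are carried beyond a 5-clip set, so the following is only an illustrative sketch, not the authors' method. It assumes, hypothetically, that consecutive clip sets overlap and that each person mention has a stable key such as "clip3:m1". Under those assumptions, a mention shared by two sets links a local ID in the later set to a global ID already assigned in the earlier one. The function name `chain_local_ids` and the data layout are invented for this example, and the loop carries labels forward iteratively rather than reproducing the paper's recursive model.

```python
from typing import Dict, List

# One window = local IDs predicted inside one set of clips.
# Keys are hypothetical mention identifiers, values are local IDs.
Window = Dict[str, int]


def chain_local_ids(windows: List[Window]) -> Dict[str, int]:
    """Fold overlapping windows of local IDs into one global labelling."""
    global_ids: Dict[str, int] = {}
    next_id = 0
    for window in windows:
        # Link this window's local IDs to global IDs through mentions
        # that an earlier window has already labelled.
        local_to_global: Dict[int, int] = {}
        for mention, local_id in window.items():
            if mention in global_ids:
                local_to_global.setdefault(local_id, global_ids[mention])
        # Label the remaining mentions; an unlinked local ID gets a
        # fresh global ID.
        for mention, local_id in window.items():
            if mention in global_ids:
                continue
            if local_id not in local_to_global:
                local_to_global[local_id] = next_id
                next_id += 1
            global_ids[mention] = local_to_global[local_id]
    return global_ids


if __name__ == "__main__":
    first = {"clip1:m1": 0, "clip2:m1": 1, "clip3:m1": 0}
    # "clip3:m1" is shared, so local ID 1 here maps to the same person.
    second = {"clip3:m1": 1, "clip4:m1": 0, "clip5:m1": 1}
    print(chain_local_ids([first, second]))
```

In this toy input the shared mention "clip3:m1" ties the second set's local ID 1 to the person first seen in clip 1, while the unlinked local ID 0 in the second set is treated as a new individual. A real system would have to handle conflicting links and sets that share no mentions, which this sketch ignores.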
KSP Keywords
Fill-in, Problem Definition, Recursive process, Video data, Visual Representation, video description