ETRI Knowledge Sharing Platform: A Study of Evaluation Metrics and Datasets for Video Captioning

Conference Paper: A Study of Evaluation Metrics and Datasets for Video Captioning

Cited 13 times in Scopus

Citation: International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS) 2017, pp.172-175

Abstract: With the fast-growing interest in deep learning, various applications and machine learning tasks have emerged in recent years. Video captioning in particular is gaining a lot of attention from both the computer vision and natural language processing fields. Captions are usually generated by jointly learning from different types of data modalities that share common themes in the video. Learning joint representations of different modalities is very challenging because of the inherent heterogeneity of the mixed information: visual scenes, speech dialogs, music, sounds, and so on. Consequently, it is hard to evaluate the quality of video captioning results. In this paper, we introduce well-known metrics and datasets for the evaluation of video captioning. We compare the existing metrics and datasets to derive a new research proposal for the evaluation of video descriptions.

KSP Keywords: Computer Vision (CV), Inherent heterogeneity, Natural Language Processing (NLP), Video captioning, Visual scenes, Deep learning (DL), Evaluation metrics, Machine learning, Research proposal

218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, KOREA, Contact: sh.kim@etri.re.kr
