ETRI-Knowledge Sharing Plaform

KOREAN
논문 검색
Type SCI
Year ~ Keyword

Detail

Journal Article Image classification and captioning model considering a CAM‐based disagreement loss
Cited 8 time in scopus Download 223 time Share share facebook twitter linkedin kakaostory
Authors
Yeo Chan Yoon, So Young Park, Soo Myoung Park, Heuiseok Lim
Issue Date
2020-02
Citation
ETRI Journal, v.42, no.1, pp.67-77
ISSN
1225-6463
Publisher
한국전자통신연구원 (ETRI)
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.4218/etrij.2018-0621
Abstract
Image captioning has received significant interest in recent years, and notable results have been achieved. Most previous approaches have focused on generating visual descriptions from images, whereas a few approaches have exploited visual descriptions for image classification. This study demonstrates that a good performance can be achieved for both description generation and image classification through an end-to-end joint learning approach with a loss function, which encourages each task to reach a consensus. When given images and visual descriptions, the proposed model learns a multimodal intermediate embedding, which can represent both the textual and visual characteristics of an object. The performance can be improved for both tasks by sharing the multimodal embedding. Through a novel loss function based on class activation mapping, which localizes the discriminative image region of a model, we achieve a higher score when the captioning and classification model reaches a consensus on the key parts of the object. Using the proposed model, we established a substantially improved performance for each task on the UCSD Birds and Oxford Flowers datasets.
This work is distributed under the term of Korea Open Government License (KOGL)
(Type 4: : Type 1 + Commercial Use Prohibition+Change Prohibition)
Type 4: