ETRI Knowledge Sharing Platform

Learning to Embed Multi-Modal Contexts for Situated Conversational Agents
Cited 8 times in Scopus; downloaded 177 times
Authors
Haeju Lee, Oh Joon Kwon, Yunseon Choi, Minho Park, Ran Han, Yoonhyung Kim, Jinhyeon Kim, Youngjune Lee, Haebin Shin, Kangwook Lee, Kee-Eung Kim
Issue Date
2022-07
Citation
Findings of the Association for Computational Linguistics: NAACL 2022, pp.813-830
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.18653/v1/2022.findings-naacl.61
Abstract
The Situated Interactive Multi-Modal Conversations (SIMMC) 2.0 challenge aims to create virtual shopping assistants that can accept complex multi-modal inputs, i.e., visual appearances of objects and user utterances. It consists of four subtasks: multi-modal disambiguation (MM-Disamb), multi-modal coreference resolution (MM-Coref), multi-modal dialog state tracking (MM-DST), and response retrieval and generation. While many task-oriented dialog systems tackle each subtask separately, we propose a jointly learned multi-modal encoder-decoder that incorporates visual inputs and performs all four subtasks at once for efficiency. This approach won the MM-Coref and response retrieval subtasks and was nominated runner-up for the remaining subtasks using a single unified model at the 10th Dialog Systems Technology Challenge (DSTC10), setting a high bar for the novel task of multi-modal task-oriented dialog systems.
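The abstract describes casting all four SIMMC 2.0 subtasks as a single joint sequence-to-sequence problem. Below is a minimal sketch of that framing, assuming a BART-style model from HuggingFace Transformers and a hypothetical task-prefix output format; the model choice, separator tokens, and object-descriptor encoding are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch: one encoder-decoder handles disambiguation, coreference,
# dialog state tracking, and response generation in a single pass.
# The prompt/output format below is an assumption for illustration.
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Hypothetical input encoding: dialog history plus flattened descriptors
# of objects visible in the scene (standing in for visual inputs).
dialog_history = "User: Do you have that jacket in a smaller size?"
scene_objects = "<OBJ_12> black jacket price 49.99 <OBJ_31> red coat price 89.99"
source = f"{dialog_history} <SEP> {scene_objects}"

inputs = tokenizer(source, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_length=128)

# Before fine-tuning the output is meaningless; after joint training it
# would contain structured spans for all four subtasks, e.g.:
#   "disambiguate: no <SEP> coref: <OBJ_12> <SEP> belief: REQUEST:GET ...
#    <SEP> response: Yes, we have it in size S."
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
```

Training a single model on such concatenated targets is one way a unified encoder-decoder can share parameters across subtasks instead of maintaining separate task-specific models.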
KSP Keywords
Conversational Agents, Dialog systems, Multi-modal, Response retrieval, Task-oriented, Unified model, coreference resolution, dialog state tracking
This work is distributed under the terms of the Creative Commons License (CC BY).