ETRI-Knowledge Sharing Platform

Details

Journal article: Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets
Cited 12 times in Scopus. Downloaded 107 times.
Authors
노경주, 정치윤, 임지연, 정승은, 김가규, 임정묵, 정현태
Publication date
2021-03
Source
Sensors, v.21 no.5, pp.1-18
ISSN
1424-8220
Publisher
MDPI
DOI
https://dx.doi.org/10.3390/s21051579
Funding project
20ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System, 송화전
Abstract
Speech emotion recognition (SER) is a natural method of recognizing individual emotions in everyday life. To distribute SER models to real-world applications, some key challenges must be overcome, such as the lack of datasets tagged with emotion labels and the weak generalization of the SER model for an unseen target domain. This study proposes a multi-path and group-loss-based network (MPGLN) for SER to support multi-domain adaptation. The proposed model includes a bidirectional long short-term memory-based temporal feature generator and a transferred feature extractor from the pre-trained VGG-like audio classification model (VGGish), and it learns simultaneously based on multiple losses according to the association of emotion labels in the discrete and dimensional models. For the evaluation of the MPGLN SER as applied to multi-cultural domain datasets, the Korean Emotional Speech Database (KESD), including KESDy18 and KESDy19, is constructed, and the English-speaking Interactive Emotional Dyadic Motion Capture database (IEMOCAP) is used. The evaluation of multi-domain adaptation and domain generalization showed 3.7% and 3.5% improvements, respectively, of the F1 score when comparing the performance of MPGLN SER with a baseline SER model that uses a temporal feature generator. We show that the MPGLN SER efficiently supports multi-domain adaptation and reinforces model generalization.
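The abstract says the model learns from multiple losses tied to how discrete emotion labels relate to dimensional ones. The following is only an illustrative sketch of that general idea, not the authors' implementation: the label set, the arousal and valence groupings, the loss weights, and the names `multi_loss` and `group_probability` are all assumptions made for this example. It adds a cross-entropy term on the discrete label to cross-entropy terms on the arousal group and valence group that the label belongs to.

```python
import math

# Hypothetical label set and groupings (assumptions, not taken from the paper).
EMOTIONS = ["angry", "happy", "neutral", "sad"]
AROUSAL_GROUP = {"angry": "high", "happy": "high", "neutral": "low", "sad": "low"}
VALENCE_GROUP = {"angry": "negative", "happy": "non_negative",
                 "neutral": "non_negative", "sad": "negative"}

EPS = 1e-12  # guards against log(0) if a probability underflows


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def group_probability(probs, grouping, target_group):
    """Sum the class probabilities of all emotions that fall in target_group."""
    return sum(p for emotion, p in zip(EMOTIONS, probs)
               if grouping[emotion] == target_group)


def multi_loss(logits, true_emotion, weights=(1.0, 0.5, 0.5)):
    """Weighted sum of class-level and group-level negative log-likelihoods.

    The weights are illustrative values, not values reported in the paper.
    """
    probs = softmax(logits)
    p_class = probs[EMOTIONS.index(true_emotion)]
    p_arousal = group_probability(probs, AROUSAL_GROUP, AROUSAL_GROUP[true_emotion])
    p_valence = group_probability(probs, VALENCE_GROUP, VALENCE_GROUP[true_emotion])
    w_class, w_arousal, w_valence = weights
    return -(w_class * math.log(max(p_class, EPS))
             + w_arousal * math.log(max(p_arousal, EPS))
             + w_valence * math.log(max(p_valence, EPS)))


# True label "angry": compare a correct guess, a guess in the same arousal
# group ("happy"), and a guess that shares neither group ("neutral").
loss_correct = multi_loss([3.0, 0.0, 0.0, 0.0], "angry")
loss_same_arousal = multi_loss([0.0, 3.0, 0.0, 0.0], "angry")
loss_no_shared_group = multi_loss([0.0, 0.0, 3.0, 0.0], "angry")
```

Under these assumed groupings, a wrong prediction that still lands in the correct arousal group is penalized less than one that shares no group with the true label, which is the kind of label association the group-level terms are meant to capture.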
KSP suggested keywords
Audio classification, Bidirectional Long Short-Term Memory, Classification models, Long short-term memory (LSTM), Loss-based, Memory-based, Model generalization, Motion Capture Database, Multi-Domain, Multi-path, Proposed model
This work may be used under the terms of the Creative Commons Attribution (CC BY) license.