ETRI Knowledge Sharing Platform


Emotion-Aware Speaker Identification With Transfer Learning
Cited 6 times in Scopus | Downloaded 109 times
Authors
Kyoungju Noh, Hyuntae Jeong
Issue Date
2023-07
Citation
IEEE Access, v.11, pp.77292-77306
ISSN
2169-3536
Publisher
Institute of Electrical and Electronics Engineers Inc.
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1109/ACCESS.2023.3297715
Abstract
Speech is a natural communication method used by humans. Speaker identification (SI) technology based on human speech has been used as an entry point for many human–computer-interaction applications. The performance of SI models can degrade when dealing with expressive speech uttered in emotional situations because emotion databases do not have sufficient data on expressive speech to train SI models for various emotional states. Generally, SI models are trained using relatively more samples of “neutral” speech than samples of other emotion classes. In this study, we propose an emotion-aware SI (em-SI) method that uses an emotion-embedding vector generated from a pre-trained speech emotion recognition (SER) model along with the acoustic features of speech data. We assess the performance of this method using individual English and Korean corpora and confirm that the proposed method provides an improved performance on multilingual corpora. The evaluation results show that the SI accuracy of em-SI on the Korean Emotion Multimodal Database (KEMDy19) improved by 3.2%, and the average speaker verification (SV) performance in terms of the equal error rate (EER) was improved by 1.3% compared to that of the baseline SI model. The visualization of the embedding vector of em-SI shows that em-SI maps speech data to an embedding space where both SI and emotional information are simultaneously represented. Through the experiments conducted in this study, we confirmed that the em-SI model, which learns by integrating emotion and speaker embedding information, improved the performance of SI for expressive speech.
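The abstract describes combining an emotion embedding from a pre-trained speech emotion recognition (SER) model with acoustic features for speaker identification. The paper does not publish its exact architecture here, so the following is only a minimal PyTorch sketch of that idea under assumed dimensions: a frozen placeholder SER encoder and a trainable speaker encoder share the same acoustic input, and their embeddings are concatenated before speaker classification. All module choices (GRU encoders, layer sizes, class counts) are illustrative assumptions, not the authors' em-SI model.

```python
import torch
import torch.nn as nn

class EmotionAwareSI(nn.Module):
    """Sketch of an emotion-aware speaker-identification (em-SI) model.

    A frozen, pre-trained SER encoder supplies an emotion embedding that
    is concatenated with a speaker embedding before classification. The
    encoders and all sizes below are illustrative placeholders, not the
    architecture from the paper.
    """

    def __init__(self, n_feats=40, emo_dim=64, spk_dim=128, n_speakers=20):
        super().__init__()
        # Stand-in for a pre-trained SER encoder; frozen during SI training.
        self.ser_encoder = nn.GRU(n_feats, emo_dim, batch_first=True)
        for p in self.ser_encoder.parameters():
            p.requires_grad = False
        # Trainable speaker encoder over the same acoustic features.
        self.spk_encoder = nn.GRU(n_feats, spk_dim, batch_first=True)
        # Classifier over the joint speaker + emotion embedding.
        self.classifier = nn.Linear(spk_dim + emo_dim, n_speakers)

    def forward(self, feats):
        # feats: (batch, time, n_feats) acoustic features, e.g. MFCCs.
        _, emo_h = self.ser_encoder(feats)  # final hidden: (1, batch, emo_dim)
        _, spk_h = self.spk_encoder(feats)  # final hidden: (1, batch, spk_dim)
        joint = torch.cat([spk_h[-1], emo_h[-1]], dim=-1)
        return self.classifier(joint)       # speaker logits

model = EmotionAwareSI()
logits = model(torch.randn(2, 100, 40))  # 2 utterances, 100 frames each
print(tuple(logits.shape))
```

In a real transfer-learning setup the SER encoder's weights would be loaded from a model pre-trained on an emotion corpus; freezing it lets the emotion embedding inject emotional-state information without being overwritten by the speaker-identification objective.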
KSP Keywords
Embedding space, Emotion-aware, Emotional states, Equal error rate, Improved performance, Multimodal database, SI model, Speaker Identification(SI), Speaker verification, Speech Emotion recognition, Transfer learning
This work is distributed under the terms of a Creative Commons License (CC BY-NC-ND).