ETRI-Knowledge Sharing Platform

Design of a Convolutional Neural Network for Speech Emotion Recognition
Cited 20 times in Scopus
Authors
Kyong Hee Lee, Do Hyun Kim
Issue Date
2020-10
Citation
International Conference on Information and Communication Technology Convergence (ICTC) 2020, pp.1332-1335
Publisher
IEEE
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ICTC49870.2020.9289227
Abstract
In speech emotion recognition (SER), recognition accuracy increases as more data are employed. In particular, deep learning requires a large amount of data. However, an existing data set is limited in size, and the recordings it contains can vary in length. The data set used in this paper consists of audio files of utterances of various lengths. In this paper, one-dimensional data and two-dimensional mel-spectrogram images were extracted from speech files and used to train deep learning models such as a multi-layer perceptron (MLP) and a convolutional neural network (CNN). In addition, to improve the test accuracy, audio files were reduced to less than two seconds and preprocessed. Using the CNN, we obtained a test accuracy of approximately 60%.
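The abstract does not say how the audio files were shortened, so the following is only a minimal sketch of one common way to turn utterances of different lengths into equal-length input. It assumes truncation from the start of the clip plus zero-padding, which is not confirmed as the authors' method. The function name `fix_length`, the 16 kHz sample rate, and the two-second cap passed as `max_seconds` are all illustrative assumptions.

```python
from typing import List, Sequence


def fix_length(samples: Sequence[float], sample_rate: int,
               max_seconds: float = 2.0) -> List[float]:
    """Truncate or zero-pad a mono waveform to a fixed duration.

    Hypothetical helper: the cap and the padding strategy are
    assumptions made for illustration, not taken from the paper.
    """
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    target = int(sample_rate * max_seconds)   # number of samples to keep
    clipped = list(samples[:target])          # drop anything past the cap
    clipped.extend([0.0] * (target - len(clipped)))  # pad short clips with silence
    return clipped


if __name__ == "__main__":
    sr = 16000                       # assumed sample rate
    short_clip = [0.1] * 8000        # 0.5 s utterance
    long_clip = [0.2] * 48000        # 3.0 s utterance
    print(len(fix_length(short_clip, sr)), len(fix_length(long_clip, sr)))
```

With equal-length waveforms, every mel-spectrogram computed from them has the same number of time frames, which is what a CNN with a fixed input shape needs.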
KSP Keywords
Convolution neural network (CNN), Data sets, Speech emotion recognition, Deep learning (DL), Multilayer perceptron, Neural network (NN), One-dimensional, Recognition accuracy, Two-dimensional (2D)