ETRI Knowledge Sharing Platform : Design of a Convolutional Neural Network for Speech Emotion Recognition

Conference Paper: Design of a Convolutional Neural Network for Speech Emotion Recognition

Cited 20 times in Scopus

Citation: International Conference on Information and Communication Technology Convergence (ICTC) 2020, pp.1332-1335

Abstract: In speech emotion recognition (SER), recognition accuracy increases as more data are used, and deep learning in particular requires a large amount of data. Existing data sets, however, are limited in size, and the recordings they contain can vary in length. The data set used in this paper consists of audio files of utterances of various lengths. One-dimensional data and two-dimensional mel-spectrogram images were extracted from the speech files and used to train deep learning models, namely a multi-layer perceptron (MLP) and a convolutional neural network (CNN). In addition, to improve test accuracy, the audio files were shortened to less than two seconds and preprocessed. Using the CNN, we obtained a test accuracy of approximately 60%.
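
The abstract does not give the preprocessing details, so the following is only a minimal sketch of the two ideas it names: bringing clips of varying length to a fixed duration of at most two seconds, and placing frequency bands on the mel scale used by a mel-spectrogram. The 16 kHz sample rate, cropping from the start of the clip, zero-padding of short clips, the HTK-style mel formula, and all function names are assumptions made for illustration, not the authors' method.

```python
import math

SAMPLE_RATE = 16_000  # assumed; the abstract does not state a sample rate
MAX_SECONDS = 2.0     # the abstract mentions clips shorter than two seconds


def fix_length(samples, sample_rate=SAMPLE_RATE, max_seconds=MAX_SECONDS):
    """Crop a clip to max_seconds from its start, zero-padding shorter clips.

    Zero-padding is an assumption: it gives every clip the same number of
    samples, which a CNN with a fixed input size needs.
    """
    target = int(sample_rate * max_seconds)
    clipped = list(samples[:target])
    return clipped + [0.0] * (target - len(clipped))


def hz_to_mel(freq_hz):
    """Convert Hz to mel with the HTK-style formula 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """Return n_mels + 2 band-edge frequencies in Hz, equally spaced in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]
```

With these assumed defaults, every clip passed through `fix_length` has 32,000 samples, so a mel-spectrogram computed from it has the same width for every utterance. In practice a library such as librosa would compute the spectrogram itself; the band edges above only show how the mel axis is laid out.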

KSP Keywords: Convolution neural network(CNN), Data sets, Speech Emotion recognition, deep learning(DL), multilayer perceptron, neural network(NN), one-dimensional, recognition accuracy, two-dimensional(2D)

