ETRI-Knowledge Sharing Platform





Conference Paper: Design of a Convolutional Neural Network for Speech Emotion Recognition
Cited 5 times in Scopus. Downloaded 4 times.
이경희, 김도현
International Conference on Information and Communication Technology Convergence (ICTC) 2020, pp.1332-1335
20ZS1200, Core Technology Research for Human-Centered Autonomous Intelligence Systems, 김도현
In speech emotion recognition (SER), recognition accuracy increases as more data are used, and deep learning in particular requires a large amount of data. Existing data sets, however, are limited in size, and the utterances they contain can vary in length. The data set used in this paper consists of audio files of utterances of various lengths. In this paper, one-dimensional data and two-dimensional mel-spectrogram images were extracted from the speech files and used to train deep learning models such as a multi-layer perceptron (MLP) and a convolutional neural network (CNN). In addition, to improve the test accuracy, the audio files were shortened to less than two seconds and preprocessed. Using the CNN, we obtained a test accuracy of approximately 60%.
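The preprocessing described in the abstract (shortening each utterance and extracting a two-dimensional mel-spectrogram) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the 16 kHz sample rate, the 512-sample FFT, the 256-sample hop, the 40 mel bands, and the zero-padding of short clips are all assumptions chosen for illustration; the abstract states only that files were reduced to less than two seconds.

```python
import numpy as np

def fix_length(signal, sample_rate, max_seconds=2.0):
    """Crop a 1-D signal to max_seconds; zero-pad shorter clips (padding is an assumption)."""
    target = int(sample_rate * max_seconds)
    signal = np.asarray(signal, dtype=np.float64)[:target]
    return np.pad(signal, (0, target - len(signal)))

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(sample_rate, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(signal, sample_rate, n_fft=512, hop=256, n_mels=40):
    """Return a log-power mel-spectrogram of shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2            # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sample_rate, n_fft, n_mels).T  # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T                       # small offset avoids log(0)

# Usage with a synthetic 3-second tone standing in for a speech file.
sample_rate = 16000
t = np.arange(3 * sample_rate) / sample_rate
clip = fix_length(np.sin(2 * np.pi * 440.0 * t), sample_rate)
image = log_mel_spectrogram(clip, sample_rate)
```

Fixing every clip to the same length gives every spectrogram the same shape, which is what allows the images to be batched as input to a CNN.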
convolutional neural network (CNN), speech emotion recognition (SER), utterances
KSP Suggested Keywords
Convolutional neural network (CNN), Data sets, One-dimensional, Recognition accuracy, Speech emotion recognition, Deep learning (DL), Multilayer perceptron, Two-dimensional (2D)