ETRI Knowledge Sharing Platform : Multi-Scale Multi-Band Dilated DenseLSTM for Robust Recognition of Speech with Background Music

Conference Paper Multi-Scale Multi-Band Dilated DenseLSTM for Robust Recognition of Speech with Background Music

Cited 0 times in Scopus

Citation: International Conference on Information and Communication Technology Convergence (ICTC) 2020, pp.1238-1241

Abstract: We propose a multi-scale multi-band dilated time-frequency DenseNet with LSTM for speech enhancement and speech recognition. In a convolutional neural network (CNN)-based architecture, it is important to increase the receptive field effectively so that context information is sufficiently considered. In our previous study, we designed a dilated dense block that reflects acoustic characteristics by applying dilated convolutions to a densely connected convolutional network (DenseNet) in order to effectively increase the receptive field. In this study, for speech enhancement, we apply the dilated dense blocks to MMDenseLSTM, a convolutional recurrent neural network (CRNN)-based deep learning architecture that has shown good performance in recent studies. We conduct speech enhancement and speech recognition experiments using the proposed architecture and several existing deep learning architectures: gated residual network (GRN), MMDenseLSTM, and DilDenseNet. Overall, the proposed architecture shows the best performance among the compared deep learning architectures.
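
Note: The abstract's point that dilated convolutions enlarge the receptive field can be illustrated with a small calculation. The sketch below is not taken from the paper; the kernel size (3) and the dilation schedules are assumed values chosen only for illustration, and the formula applies to a plain stack of stride-1 convolutions along one axis (time or frequency).

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Assumed, illustrative settings (not the paper's configuration):
plain = receptive_field(3, [1, 1, 1, 1])    # four ordinary 3-tap layers -> 9
dilated = receptive_field(3, [1, 2, 4, 8])  # same depth, growing dilation -> 31

print(plain, dilated)
```

With the same number of layers and parameters per layer, the assumed dilated schedule covers 31 input positions instead of 9, which is the kind of context gain the abstract refers to.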

KSP Keywords: Acoustic characteristics, Background music, Best performance, Context information, Convolution neural network (CNN), Convolutional networks, Deep learning architectures, Dense block, Multi-scale, Receptive field, Residual network
