ETRI-Knowledge Sharing Platform

Multi-Scale Multi-Band Dilated DenseLSTM for Robust Recognition of Speech with Background Music
Authors
Woon-Haeng Heo, Hyemi Kim, Oh-Wook Kwon
Issue Date
2020-10
Citation
International Conference on Information and Communication Technology Convergence (ICTC) 2020, pp.1238-1241
Publisher
IEEE
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ICTC49870.2020.9289445
Abstract
We propose a multi-scale multi-band dilated time-frequency DenseNet with LSTM for speech enhancement and speech recognition. In a convolutional neural network (CNN)-based architecture, it is important to increase the receptive field effectively so that enough context information is taken into account. In our previous study, we designed a dilated dense block that reflects acoustic characteristics by applying dilated convolutions to a densely connected convolutional network (DenseNet) in order to effectively increase the receptive field. In this study, for speech enhancement, we apply the dilated dense blocks to MMDenseLSTM, a convolutional recurrent neural network (CRNN)-based deep learning architecture that has shown good performance in recent studies. We conduct speech enhancement and speech recognition experiments using the proposed architecture and several existing deep learning architectures: gated residual network (GRN), MMDenseLSTM, and DilDenseNet. Overall, the proposed architecture shows the best performance among the compared deep learning architectures.
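
The abstract's point about dilated convolutions enlarging the receptive field can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the helper name `receptive_field`, the kernel size of 3, and the dilation rates 1, 2, 4 are assumptions chosen only for illustration. It uses the standard relation for stacked stride-1 convolutions, where each layer with kernel size k and dilation d adds (k - 1) * d to the receptive field along one axis.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions.

    Hypothetical helper for illustration; not part of the paper's code.
    Each layer with kernel size k and dilation d widens the field by (k - 1) * d.
    """
    if len(kernel_sizes) != len(dilations):
        raise ValueError("kernel_sizes and dilations must have the same length")
    field = 1  # a single output position sees itself before any convolution
    for k, d in zip(kernel_sizes, dilations):
        field += (k - 1) * d
    return field


# Assumed example: three layers with kernel size 3.
plain = receptive_field([3, 3, 3], [1, 1, 1])    # no dilation
dilated = receptive_field([3, 3, 3], [1, 2, 4])  # assumed dilation rates
print(plain, dilated)  # prints "7 15"
```

With the same number of layers and weights, the dilated stack in this assumed setup covers 15 positions instead of 7, which is the kind of effect the abstract refers to when it says dilation increases the receptive field effectively.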