ETRI-Knowledge Sharing Platform

Multi-Scale Multi-Band Dilated DenseLSTM for Robust Recognition of Speech with Background Music
Cited 0 times in Scopus
Authors
Woon-Haeng Heo, Hyemi Kim, Oh-Wook Kwon
Issue Date
2020-10
Citation
International Conference on Information and Communication Technology Convergence (ICTC) 2020, pp.1238-1241
Publisher
IEEE
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ICTC49870.2020.9289445
Abstract
We propose a multi-scale multi-band dilated time-frequency DenseNet with LSTM for speech enhancement and speech recognition. In a convolutional neural network (CNN)-based architecture, it is important to increase the receptive field effectively so that context information is sufficiently considered. In our previous study, we designed a dilated dense block that reflects acoustic characteristics by applying dilated convolutions to a densely connected convolutional network (DenseNet) in order to effectively increase the receptive field. In this study, for speech enhancement, we apply the dilated dense blocks to MMDenseLSTM, which is based on a convolutional recurrent neural network (CRNN) and has shown good performance in recent studies using deep learning architectures. We conduct speech enhancement and speech recognition experiments using the proposed architecture and several existing deep learning architectures: gated residual network (GRN), MMDenseLSTM, and DilDenseNet. Overall, the proposed architecture shows the best performance among the compared deep learning architectures.
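The abstract's point about dilation and receptive field can be illustrated with a small calculation. The following is a minimal sketch, not the authors' implementation: it assumes stride-1 convolutions with a single kernel size, treats the layers as a plain sequential stack (which corresponds to the longest path through a dense block), and uses hypothetical dilation rates (1, 2, 4) that the abstract does not specify. The function name `receptive_field` is likewise introduced only for illustration.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field along one axis of stacked stride-1 convolutions.

    Assumption: every layer uses the same kernel size and stride 1.
    """
    rf = 1
    for d in dilations:
        # A kernel of size k with dilation d spans (k - 1) * d + 1 inputs,
        # so each layer widens the receptive field by (k - 1) * d.
        rf += (kernel_size - 1) * d
    return rf


# Three 3-tap layers without dilation versus with hypothetical
# dilation rates 1, 2, 4 (illustrative values, not from the paper).
plain = receptive_field(3, [1, 1, 1])
dilated = receptive_field(3, [1, 2, 4])
print(plain, dilated)  # 7 15
```

With the same number of layers and parameters per layer, the dilated stack in this example covers roughly twice as many input frames or frequency bins, which is the effect the abstract refers to when it speaks of increasing the receptive field effectively.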
KSP Keywords
Acoustic characteristics, Background music, Best performance, Context Information, Convolution neural network (CNN), Convolutional networks, Deep Learning Architectures, Dense block, Multi-scale, Receptive field, Residual Network