ETRI-Knowledge Sharing Platform





Conference Paper: Multi-Scale Multi-Band Dilated DenseLSTM for Robust Recognition of Speech with Background Music
허운행, 김혜미, 권오욱
International Conference on Information and Communication Technology Convergence (ICTC) 2020, pp.1238-1241
20IH2400, Development of Intelligent Micro-Identification Technology for Music and Video Monitoring, 박지현
We propose a multi-scale multi-band dilated time-frequency DenseNet with LSTM for speech enhancement and speech recognition. In a convolutional neural network (CNN)-based architecture, it is important to increase the receptive field effectively so that context information is sufficiently considered. In our previous study, we designed a dilated dense block that reflects acoustic characteristics by applying dilated convolutions to a densely connected convolutional network (DenseNet), in order to increase the receptive field effectively. In this study, for speech enhancement, we apply the dilated dense blocks to MMDenseLSTM, a convolutional recurrent neural network (CRNN)-based deep learning architecture that has shown good performance in recent studies. We conduct speech enhancement and speech recognition experiments using the proposed architecture and several existing deep learning architectures: gated residual network (GRN), MMDenseLSTM, and DilDenseNet. Overall, the proposed architecture shows the best performance among the compared deep learning architectures.
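The abstract's point about dilated convolutions enlarging the receptive field can be illustrated with a small calculation. The following is a minimal sketch, not the authors' architecture: it assumes a one-dimensional stack of stride-1 convolutions, and the function name `receptive_field`, the kernel size, and the dilation rates are hypothetical values chosen only for illustration.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in input samples) of stacked stride-1 convolutions.

    Each layer with dilation d and kernel size k extends the receptive
    field by (k - 1) * d samples. Stride 1 is assumed throughout.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical settings (not taken from the paper): four 3-tap layers.
plain = receptive_field(3, [1, 1, 1, 1])    # no dilation
dilated = receptive_field(3, [1, 2, 4, 8])  # exponentially growing dilation

print("plain:", plain)
print("dilated:", dilated)
```

With the same number of layers and parameters, the dilated stack in this sketch covers 31 input samples against 9 for the undilated one, which is the kind of context gain the abstract refers to.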
DenseNet, dilated convolution, receptive field, speech enhancement
KSP Suggested Keywords
Acoustic characteristics, Background music, Best performance, Context Information, Convolution neural network(CNN), Convolutional networks, Deep Learning Architectures, Dense block, Dilated Convolution, Multi-scale, Receptive field