ETRI Knowledge Sharing Platform

Details

Journal article: Integrating Dilated Convolution into DenseLSTM for Audio Source Separation
Cited 5 times in Scopus. Downloaded 113 times.
Authors
허운행, 김혜미, 권오욱
Publication date
2021-01
Source
Applied Sciences, v.11 no.2, pp.1-19
ISSN
2076-3417
Publisher
MDPI
DOI
https://dx.doi.org/10.3390/app11020789
Funded project
20IH2400, Development of Intelligent Micro-Identification Technology for Music and Video Monitoring, 박지현
Abstract
We propose a multi-scale, multi-band dilated time-frequency densely connected convolutional network (DenseNet) with long short-term memory (LSTM) for audio source separation. Because the spectrogram of an acoustic signal can be treated both as an image and as time-series data, it is well suited to a convolutional recurrent neural network (CRNN) architecture. We improved audio source separation performance by adding a dilated block, built on dilated convolution, to the CRNN architecture. The dilated block effectively enlarges the receptive field over the spectrogram. It was also designed around the acoustic property that the frequency axis and the time axis of a spectrogram vary under independent influences such as speech rate and pitch. In the speech enhancement experiments, we estimated the speech signal from a mixture of music, noise, and speech using various deep learning architectures. We conducted a subjective evaluation of the estimated speech signal and also measured speech quality, intelligibility, separation, and speech recognition performance. In the music signal separation experiments, we estimated the music signal from a mixture of music and speech using several deep learning architectures, then measured separation performance and music identification accuracy on the estimated music signal. Overall, the proposed architecture performed best among the compared deep learning architectures in both the speech and the music experiments.
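The abstract's point that a dilated convolution enlarges the receptive field, and that the frequency and time axes can be handled separately, can be illustrated with a small calculation. This is a minimal sketch and not the authors' implementation: the kernel size, the dilation schedules, and the function names `receptive_field` and `dilated_conv1d` are all assumptions chosen for illustration.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D dilated cross-correlation over a list of numbers."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


# Hypothetical dilation schedules, one per spectrogram axis (assumed values).
freq_dilations = [1, 2, 4]      # along the frequency axis
time_dilations = [1, 2, 4, 8]   # along the time axis

plain_rf = receptive_field(3, [1, 1, 1])     # three ordinary 3-tap layers
freq_rf = receptive_field(3, freq_dilations)
time_rf = receptive_field(3, time_dilations)

print("plain:", plain_rf, "freq:", freq_rf, "time:", time_rf)

# A 3-tap kernel with dilation 2 reads samples 0, 2, and 4 of the input.
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 0, 1], dilation=2))
```

With the same number of 3-tap layers, the dilated stack covers more frequency bins or time frames than the plain stack without adding parameters, and choosing a different schedule per axis is one simple way to reflect that the two axes vary independently.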
KSP suggested keywords
Acoustic characteristics, Acoustic signal, Audio source separation, Best performance, Convolutional networks, Deep learning architectures, Dilated convolution, Long short-term memory (LSTM), Multi-scale, Music identification, Receptive field
This work may be used under the terms of the Creative Commons Attribution (CC BY) license.