ETRI Knowledge Sharing Platform: Combining Multi-scale Features using Sample-level Deep Convolutional Neural Networks for Weakly Supervised Sound Event Detection

Type: Conference Paper

Citation: Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Workshop, pp. 69-73

Abstract: This paper describes our method submitted to the large-scale weakly supervised sound event detection for smart cars task (Task 4) of the DCASE 2017 Challenge. It builds on two deep neural network methods previously proposed for music auto-tagging. The first trains sample-level deep convolutional neural networks (DCNNs) directly on raw waveforms and uses them as feature extractors. The second aggregates features from multi-scale DCNN models and makes the final predictions from the aggregated features. With this approach, our best results in the evaluation were an F-score of 47.3% on subtask A (audio tagging) and an error rate of 0.75 on subtask B (sound event detection). Compared with the other DCASE 2017 Task 4 submissions, these results show that waveform-based models can perform comparably to spectrogram-based models. Finally, we visualize the filters learned hierarchically in each layer of the waveform-based model on the challenge dataset to explain how they discriminate between events.
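The two ideas named in the abstract, a sample-level CNN that reads raw waveform samples and the aggregation of features from several DCNNs into one vector used for the final prediction, can be illustrated with a toy pure-Python sketch. This is not the authors' implementation. The single-channel layers, convolutions whose stride equals the filter length, global max pooling over time, taking features from the last `last_n` layers of each model, the two hypothetical models `model_fine` and `model_coarse` (standing in for different scales), and the classifier weights are all assumptions chosen for readability; real models use many learned filters per layer and a deep-learning framework.

```python
import math
from typing import List, Sequence

Signal = List[float]


def strided_conv1d(x: Sequence[float], kernel: Sequence[float]) -> Signal:
    """Valid 1-D cross-correlation with stride equal to the kernel length (assumption)."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(0, len(x) - k + 1, k)]


def relu(x: Sequence[float]) -> Signal:
    return [max(0.0, v) for v in x]


def layer_outputs(wave: Sequence[float],
                  kernels: Sequence[Sequence[float]]) -> List[Signal]:
    """Feed raw samples through stacked layers and keep every layer's output."""
    outputs: List[Signal] = []
    h: Signal = list(wave)
    for kernel in kernels:
        h = relu(strided_conv1d(h, kernel))
        outputs.append(h)
    return outputs


def aggregate(wave: Sequence[float],
              models: Sequence[Sequence[Sequence[float]]],
              last_n: int) -> Signal:
    """Max-pool the last `last_n` layer outputs of each model over time, then concatenate."""
    features: Signal = []
    for kernels in models:
        for out in layer_outputs(wave, kernels)[-last_n:]:
            features.append(max(out))
    return features


def predict(features: Sequence[float],
            weights: Sequence[Sequence[float]],
            biases: Sequence[float]) -> Signal:
    """One independent sigmoid output per event class (multi-label tagging)."""
    return [1.0 / (1.0 + math.exp(-(sum(w * f for w, f in zip(row, features)) + b)))
            for row, b in zip(weights, biases)]


# Hypothetical toy input: 27 samples (3^3), so three stride-3 layers reduce it to one value.
wave = [float(i % 4) - 1.0 for i in range(27)]
model_fine = [[1.0, 1.0, 1.0]] * 3            # three layers of length-3 filters
model_coarse = [[1.0] * 9, [1.0, 0.0, 0.0]]   # coarser first layer (length-9 filter)

feats = aggregate(wave, [model_fine, model_coarse], last_n=2)
print(feats)  # [5.0, 12.0, 5.0, 3.0]

# Hypothetical weights for two event classes.
probs = predict(feats, [[0.1, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, -0.1]], [0.0, 0.0])
print(probs)
```

Each model halves nothing by hand: every layer shrinks the time axis by its filter length, so `model_fine` maps 27 samples to 9, 3, and then 1 values, while `model_coarse` reaches a coarse time resolution in fewer layers. Concatenating pooled features from both gives the classifier a view of the signal at more than one scale.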

KSP Keywords: Convolutional neural network (CNN), Deep convolutional neural networks, Deep neural network (DNN), F-score, Multi-scale, Network method, Sound event detection (SED), Training samples, Weakly supervised, Audio tagging, Error rate