ETRI Knowledge Sharing Platform

Combining Multi-scale Features using Sample-level Deep Convolutional Neural Networks for Weakly Supervised Sound Event Detection
Authors
Jongpil Lee, Jiyoung Park, Sangeun Kum, Youngho Jeong, Juhan Nam
Issue Date
2017-11
Citation
Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Workshop, pp. 69-73
Language
English
Type
Conference Paper
Abstract
This paper describes our method submitted to large-scale weakly supervised sound event detection for smart cars in the DCASE Challenge 2017. It is based on two deep neural network methods suggested for music auto-tagging. One is training sample-level Deep Convolutional Neural Networks (DCNN) using raw waveforms as a feature extractor. The other is aggregating features on multi-scaled models of the DCNNs and making final predictions from them. With this approach, we achieved the best results, 47.3% in F-score on subtask A (audio tagging) and 0.75 in error rate on subtask B (sound event detection) in the evaluation. These results show that waveform-based models can be comparable to spectrogram-based models among the DCASE Task 4 submissions. Finally, we visualize hierarchically learned filters from the challenge dataset in each layer of the waveform-based model to explain how they discriminate the events.
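For illustration, the sample-level DCNN feature extractor described in the abstract can be sketched in a few lines of PyTorch. This is a minimal single-scale sketch under assumed hyperparameters (five blocks of width-3, stride-3 convolutions, 64 channels, mean-pooled into a 17-way sigmoid head for the 17 DCASE 2017 Task 4 event tags); it is not the authors' exact architecture and it omits the multi-scale feature aggregation step.

```python
import torch
import torch.nn as nn

class SampleLevelCNN(nn.Module):
    """Minimal sketch of a sample-level DCNN on raw waveforms.

    All hyperparameters (block count, filter width 3, stride 3,
    channel width) are illustrative assumptions, not the exact
    settings of the DCASE 2017 submission.
    """

    def __init__(self, n_blocks: int = 5, channels: int = 64, n_classes: int = 17):
        super().__init__()
        layers = []
        in_ch = 1  # mono raw waveform
        for _ in range(n_blocks):
            layers += [
                nn.Conv1d(in_ch, channels, kernel_size=3, stride=3),
                nn.BatchNorm1d(channels),
                nn.ReLU(),
            ]
            in_ch = channels
        self.features = nn.Sequential(*layers)  # each block shrinks time 3x
        self.head = nn.Linear(channels, n_classes)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, 1, samples) raw audio
        h = self.features(wav)
        h = h.mean(dim=-1)                    # average-pool over time
        return torch.sigmoid(self.head(h))    # multi-label tag probabilities

model = SampleLevelCNN()
wav = torch.randn(2, 1, 3 ** 5 * 100)  # dummy batch; length divisible by 3^5
print(model(wav).shape)                # torch.Size([2, 17])
```

In the paper's multi-scale variant, several such extractors operating at different resolutions would be run over the same audio and their features aggregated before the final prediction.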
KSP Keywords
Convolutional neural network (CNN), Deep convolutional neural networks, Deep neural network (DNN), F-score, Multi-scale, Network method, Sound event detection (SED), Training samples, Weakly supervised, audio tagging, error rate