ETRI-Knowledge Sharing Platform


Enhanced Feature Extraction for Speech Detection in Media Audio
Cited 4 times in Scopus
Authors
Inseon Jang, ChungHyun Ahn, Jeongil Seo, Younseon Jang
Issue Date
2017-08
Citation
International Speech Communication Association (INTERSPEECH) 2017, pp.479-483
Publisher
ISCA
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.21437/Interspeech.2017-792
Abstract
Speech detection is an important first step for audio analysis of media content; its goal is to discriminate speech from non-speech. It remains a challenge owing to the variety of sound sources present in media audio. In this work, we present a novel audio feature extraction method that reflects the acoustic characteristics of media audio in the time-frequency domain. Since the degree to which harmonic and percussive components are combined varies with the type of sound source, audio features that better distinguish speech from non-speech can be obtained by decomposing the signal into both components. For the evaluation, we use over 20 hours of drama that was manually annotated for speech detection, as well as 4 full-length movies, totaling over 8 hours, with annotations released for the research community. Experimental results with a deep neural network show superior performance of the proposed method under media audio conditions.
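The harmonic/percussive decomposition mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the decomposition algorithm or the exact features, so the code below swaps in a generic median-filtering separation on a magnitude spectrogram and is not the authors' implementation. The function names (`median_filter_1d`, `hpss`, `harmonic_ratio_per_frame`), the filter width, and the per-frame harmonic ratio are all hypothetical choices made for illustration only.

```python
from statistics import median


def median_filter_1d(values, width):
    """Sliding median; the window is truncated at the sequence borders."""
    half = width // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def hpss(spec, width=5, eps=1e-10):
    """Split a magnitude spectrogram spec[f][t] into harmonic and percussive parts.

    Assumption: generic median-filtering separation, not the paper's method.
    """
    n_freq, n_time = len(spec), len(spec[0])
    # Harmonic sounds are steady over time: median along each frequency row.
    harm = [median_filter_1d(row, width) for row in spec]
    # Percussive sounds are broadband: median along each time column.
    cols = [median_filter_1d([spec[f][t] for f in range(n_freq)], width)
            for t in range(n_time)]
    harmonic = [[0.0] * n_time for _ in range(n_freq)]
    percussive = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            h2 = harm[f][t] ** 2
            p2 = cols[t][f] ** 2
            # Soft mask; if both medians are zero the energy goes to percussive.
            mask_h = h2 / (h2 + p2 + eps)
            harmonic[f][t] = mask_h * spec[f][t]
            percussive[f][t] = (1.0 - mask_h) * spec[f][t]
    return harmonic, percussive


def harmonic_ratio_per_frame(harmonic, percussive, eps=1e-10):
    """Hypothetical feature: share of harmonic magnitude in each time frame."""
    ratios = []
    for t in range(len(harmonic[0])):
        h = sum(row[t] for row in harmonic)
        p = sum(row[t] for row in percussive)
        ratios.append(h / (h + p + eps))
    return ratios


# Toy spectrogram: one sustained tone (row 3) and one click (column 3).
spec = [[1.0 if (f == 3 or t == 3) else 0.0 for t in range(7)] for f in range(7)]
harmonic, percussive = hpss(spec)
ratios = harmonic_ratio_per_frame(harmonic, percussive)
```

In this toy input the sustained tone ends up in the harmonic part and the click in the percussive part, so the ratio is high in tone-only frames and low in the click frame. A real system would compute the spectrogram from audio and pass features derived from both components to a classifier such as the deep neural network mentioned in the abstract.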
KSP Keywords
Acoustic characteristics, Audio feature extraction, Deep neural network (DNN), Non-speech, Sound source, Speech detection, audio analysis, feature extraction method, superior performance