ETRI-Knowledge Sharing Platform





Conference Paper: Enhanced Feature Extraction for Speech Detection in Media Audio
장인선, 안충현, 서정일, 장윤선
International Speech Communication Association (INTERSPEECH) 2017, pp.479-483
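17HR3700, Development of Digital Captioning and Audio Description Service Technology to Improve Broadcast Accessibility for People with Visual and Hearing Impairments, 안충현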
Speech detection is an important first step in audio analysis of media content; its goal is to discriminate speech from non-speech. It remains a challenge owing to the variety of sound sources present in media audio. In this work, we present a novel audio feature extraction method that reflects the acoustic characteristics of media audio in the time-frequency domain. Because the mix of harmonic and percussive components varies with the type of sound source, decomposing the signal into these two components yields audio features that better distinguish speech from non-speech. For the evaluation, we use over 20 hours of drama that was manually annotated for speech detection, as well as 4 full-length movies, totaling over 8 hours, with annotations released to the research community. Experimental results with a deep neural network show superior performance of the proposed method under media audio conditions.
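The abstract does not say how the harmonic and percussive components are obtained, so the following is only a minimal sketch of the general idea, assuming a median-filtering style of harmonic-percussive separation applied to a magnitude spectrogram. The function names (`median_filter_1d`, `hpss`, `harmonic_ratio_per_frame`), the kernel size, and the toy spectrogram are all hypothetical and are not taken from the paper; the per-frame harmonic energy ratio is just one illustrative feature, not the feature set the authors propose.

```python
from statistics import median


def median_filter_1d(values, kernel):
    """Median-filter a sequence; windows are truncated at the edges."""
    half = kernel // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def hpss(spec, kernel=3):
    """Split a magnitude spectrogram spec[freq][time] into harmonic and
    percussive parts using soft masks (assumed median-filtering approach)."""
    n_freq, n_time = len(spec), len(spec[0])
    # Harmonic estimate: smooth each frequency bin along the time axis.
    harm_est = [median_filter_1d(row, kernel) for row in spec]
    # Percussive estimate: smooth each time frame along the frequency axis.
    cols = [median_filter_1d([spec[f][t] for f in range(n_freq)], kernel)
            for t in range(n_time)]
    harmonic = [[0.0] * n_time for _ in range(n_freq)]
    percussive = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            h, p = harm_est[f][t], cols[t][f]
            total = h * h + p * p
            # Soft mask: share of the bin attributed to the harmonic part.
            mask_h = (h * h) / total if total > 0 else 0.5
            harmonic[f][t] = spec[f][t] * mask_h
            percussive[f][t] = spec[f][t] * (1.0 - mask_h)
    return harmonic, percussive


def harmonic_ratio_per_frame(harmonic, percussive):
    """Per-frame harmonic energy divided by total energy (0.5 if silent)."""
    ratios = []
    for t in range(len(harmonic[0])):
        eh = sum(row[t] ** 2 for row in harmonic)
        ep = sum(row[t] ** 2 for row in percussive)
        ratios.append(eh / (eh + ep) if (eh + ep) > 0 else 0.5)
    return ratios


# Toy spectrogram (5 bins x 7 frames): a steady tone in bin 2 (horizontal
# line) plus a click at frame 3 (vertical line).
spec = [[1.0 if (f == 2 or t == 3) else 0.0 for t in range(7)]
        for f in range(5)]
harmonic, percussive = hpss(spec)
ratios = harmonic_ratio_per_frame(harmonic, percussive)
print([round(r, 3) for r in ratios])
```

In this toy case the frames that contain only the steady tone come out as almost entirely harmonic, while the click frame is dominated by the percussive part. In a real pipeline, features computed from the two components would be fed to a classifier such as the deep neural network mentioned in the abstract.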
KSP Suggested Keywords
Acoustic characteristics, Audio feature extraction, Deep neural network (DNN), Non-speech, Sound source, Speech detection, Audio analysis, Feature extraction method, Superior performance