ETRI-Knowledge Sharing Platform

Details

Conference Paper: Enhanced Feature Extraction for Speech Detection in Media Audio
Cited 4 times in Scopus
Authors
장인선, 안충현, 서정일, 장윤선
Publication Date
2017-08
Source
International Speech Communication Association (INTERSPEECH) 2017, pp.479-483
DOI
https://dx.doi.org/10.21437/Interspeech.2017-792
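Project
17HR3700, Development of Digital Captioning and Audio Description Service Technology to Improve Broadcast Accessibility for People with Visual and Hearing Impairments, 안충현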
Abstract
Speech detection is an important first step for audio analysis of media content, whose goal is to discriminate speech from non-speech. It remains a challenge owing to the variety of sound sources included in media audio. In this work, we present a novel audio feature extraction method that reflects the acoustic characteristics of media audio in the time-frequency domain. Since the degree of combination of harmonic and percussive components varies depending on the type of sound source, audio features that further distinguish between speech and non-speech can be obtained by decomposing the signal into both components. For the evaluation, we use over 20 hours of drama that was manually annotated for speech detection, as well as 4 full-length movies with annotations released to the research community, whose total length is over 8 hours. Experimental results with a deep neural network show superior performance of the proposed method under media audio conditions.
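The abstract does not say how the signal is split into harmonic and percussive components. As a rough illustration of the general idea only, the sketch below uses median-filtering harmonic/percussive separation, a common technique that is assumed here and is not confirmed to be the authors' method. The function names (`hpss_masks`, `frame_energies`), the window width, and the per-frame energy features are all hypothetical. It operates on a small magnitude spectrogram given as nested lists indexed `spec[freq][time]`.

```python
from statistics import median


def median_filter_1d(values, width):
    """Median-filter a sequence, shrinking the window at the edges."""
    half = width // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def hpss_masks(spec, width=3, eps=1e-10):
    """Split a magnitude spectrogram into harmonic and percussive parts.

    Assumption: median-filtering separation with soft masks; this is an
    illustrative stand-in, not the decomposition used in the paper.
    """
    n_freq, n_time = len(spec), len(spec[0])
    # Harmonic sounds are steady over time: smooth each frequency row.
    harm = [median_filter_1d(row, width) for row in spec]
    # Percussive sounds are broadband: smooth each time column.
    cols = [median_filter_1d([spec[f][t] for f in range(n_freq)], width)
            for t in range(n_time)]
    harmonic, percussive = [], []
    for f in range(n_freq):
        h_row, p_row = [], []
        for t in range(n_time):
            h2, p2 = harm[f][t] ** 2, cols[t][f] ** 2
            mask_h = h2 / (h2 + p2 + eps)  # soft mask in [0, 1)
            h_row.append(spec[f][t] * mask_h)
            p_row.append(spec[f][t] * (1.0 - mask_h))
        harmonic.append(h_row)
        percussive.append(p_row)
    return harmonic, percussive


def frame_energies(harmonic, percussive):
    """Hypothetical per-frame features: (harmonic, percussive) energy."""
    n_freq, n_time = len(harmonic), len(harmonic[0])
    return [(sum(harmonic[f][t] ** 2 for f in range(n_freq)),
             sum(percussive[f][t] ** 2 for f in range(n_freq)))
            for t in range(n_time)]


# Toy spectrogram: a steady tone (row 2) crossed by a click (column 2).
spec = [[1.0 if (f == 2 or t == 2) else 0.0 for t in range(5)]
        for f in range(5)]
harmonic, percussive = hpss_masks(spec)
features = frame_energies(harmonic, percussive)
```

In this toy case the steady tone lands mostly in the harmonic part and the click mostly in the percussive part, so the frame containing the click has more percussive than harmonic energy. A real system would compute the spectrogram from audio and feed such features to a classifier; those steps are outside this sketch.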
Keywords
Speech detection, Voice activity detection
KSP Suggested Keywords
Acoustic characteristics, Audio feature extraction, Deep neural network (DNN), Non-speech, Sound source, Speech detection, Voice Activity Detection (VAD), audio analysis, feature extraction method, superior performance