ETRI Knowledge Sharing Platform : Noise Robust Feature for Automatic Speech Recognition based on Mel-spectrogram Gradient Histogram

Conference Paper: Noise Robust Feature for Automatic Speech Recognition based on Mel-spectrogram Gradient Histogram

Citation: Workshop on Speech, Language and Audio in Multimedia (SLAM) 2014, pp.1-5

Abstract: This paper proposes an alternative scheme for extracting speech features in an automatic speech recognition (ASR) system. When an ASR system is trained on clean speech, a noisy environment can cause a mismatch between the features extracted from the recognition data and those extracted from the training data, and this mismatch degrades recognition accuracy. An approach that, unlike existing speech features, minimizes the mismatch between clean and noisy speech features is therefore needed. We propose a feature extraction technique that is robust to noisy environments. The scheme is based on a weighted histogram of the time-frequency gradient in a Mel-spectrogram image. Unlike previous approaches, which use the magnitude of the Mel-spectrogram, it uses both the angle and the magnitude of the local gradient by means of a weighted histogram. As a result, the proposed feature shows a lower mean square error (MSE) between clean-condition and noisy-condition features than other well-known speech features. In addition, the proposed scheme improves word recognition performance in a noisy environment while using relatively fewer coefficients than similar studies.
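
The idea in the abstract, a magnitude-weighted histogram of local gradient angles over a Mel-spectrogram, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the number of orientation bins, the grouping of Mel bands into blocks, the use of unsigned angles, and the per-frame L2 normalization are all assumptions made for illustration, and the input is assumed to be a precomputed (log-)Mel-spectrogram array of shape (n_mels, n_frames).

```python
import numpy as np


def gradient_histogram_features(mel_spec, n_bins=8, block_bands=4):
    """Hypothetical sketch: per-frame weighted histograms of gradient angle.

    mel_spec    : 2D array (n_mels, n_frames), e.g. log-Mel energies (assumed).
    n_bins      : number of orientation bins (assumed value).
    block_bands : Mel bands grouped into one histogram (assumed value).
    Returns an array of shape (n_frames, (n_mels // block_bands) * n_bins).
    """
    mel_spec = np.asarray(mel_spec, dtype=float)
    # Local gradient along frequency (axis 0) and time (axis 1).
    d_freq, d_time = np.gradient(mel_spec)
    magnitude = np.hypot(d_time, d_freq)
    # Unsigned orientation folded into [0, pi); an assumption of this sketch.
    angle = np.mod(np.arctan2(d_freq, d_time), np.pi)
    # Map each angle to a bin index, clipping the rare rounding case at pi.
    bin_idx = np.minimum((angle / np.pi * n_bins).astype(int), n_bins - 1)

    n_mels, n_frames = mel_spec.shape
    n_blocks = n_mels // block_bands
    feats = np.zeros((n_frames, n_blocks * n_bins))
    for t in range(n_frames):
        for b in range(n_blocks):
            rows = slice(b * block_bands, (b + 1) * block_bands)
            # Each time-frequency cell votes for its angle bin,
            # weighted by its gradient magnitude.
            hist = np.bincount(bin_idx[rows, t],
                               weights=magnitude[rows, t],
                               minlength=n_bins)
            feats[t, b * n_bins:(b + 1) * n_bins] = hist

    # Per-frame L2 normalization (assumed); all-zero frames stay zero.
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / np.maximum(norms, 1e-12)


def feature_mse(clean_feats, noisy_feats):
    """Mean square error between two feature arrays of equal shape."""
    diff = np.asarray(clean_feats, dtype=float) - np.asarray(noisy_feats, dtype=float)
    return float(np.mean(diff ** 2))
```

With features computed from a clean recording and from a noise-corrupted copy of the same recording, `feature_mse` gives the kind of clean-versus-noisy comparison the abstract refers to; a lower value indicates a smaller mismatch between the two conditions.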

KSP Keywords: Clean speech, Extraction technique, Feature extraction, Local gradient (LG), Magnitude information, Noisy conditions, Speech features, Speech source, Word recognition, Automatic speech recognition (ASR), Gradient histogram
