ETRI-Knowledge Sharing Platform

Noise Robust Feature for Automatic Speech Recognition based on Mel-spectrogram Gradient Histogram
Authors
Taejin Park, Seungkwon Beack, Taejin Lee
Issue Date
2014-09
Citation
Workshop on Speech, Language and Audio in Multimedia (SLAM) 2014, pp.1-5
Language
English
Type
Conference Paper
Abstract
This paper proposes an alternative scheme for extracting speech features in an automatic speech recognition (ASR) system. If an ASR system is trained on a clean speech source, a noisy environment may cause a mismatch between the features extracted from the recognition data and those extracted from the training data, and this mismatch degrades recognition accuracy. An approach different from existing speech features is therefore needed to minimize the mismatch between clean and noisy speech features. In this paper, we propose a feature extraction technique that is robust to noisy environments. The proposed scheme is based on a weighted histogram of the time-frequency gradient in a Mel-spectrogram image. Unlike previous approaches that use the magnitude of a Mel-spectrogram, we use both the angle and the magnitude of the local gradient by employing a weighted histogram. As a result, the proposed speech feature shows a lower mean square error (MSE) between clean-condition and noisy-condition features than other well-known speech features. In addition, the proposed scheme improves word recognition results in a noisy environment while using relatively fewer coefficients than similar studies.
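The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of 8 angle bins, the per-frame pooling over all Mel bands, and the sum-to-one normalisation are assumptions made for illustration only. The sketch computes the local time-frequency gradient of a Mel-spectrogram, builds a histogram of gradient angles in which each point votes with its gradient magnitude, and measures the MSE between two feature matrices.

```python
import numpy as np

def gradient_histogram_features(mel_spec, n_bins=8, eps=1e-10):
    """Per-frame, magnitude-weighted histogram of local gradient angles.

    mel_spec: 2-D array of shape (n_mel_bands, n_frames), e.g. log-Mel energies.
    n_bins:   number of angle bins (illustrative choice, not from the paper).
    Returns an array of shape (n_frames, n_bins).
    """
    mel_spec = np.asarray(mel_spec, dtype=float)

    # Local gradient along frequency (axis 0) and time (axis 1).
    d_freq, d_time = np.gradient(mel_spec)
    magnitude = np.hypot(d_time, d_freq)
    angle = np.arctan2(d_freq, d_time)  # values in [-pi, pi]

    # Map each angle to one of n_bins equal-width bins.
    bin_idx = np.floor((angle + np.pi) / (2 * np.pi) * n_bins).astype(int)
    bin_idx = np.clip(bin_idx, 0, n_bins - 1)  # angle == pi goes to the last bin

    n_frames = mel_spec.shape[1]
    features = np.zeros((n_frames, n_bins))
    for t in range(n_frames):
        # Each time-frequency point votes for its angle bin with its magnitude.
        features[t] = np.bincount(
            bin_idx[:, t], weights=magnitude[:, t], minlength=n_bins
        )

    # Normalise each frame's histogram to sum to 1 (all-zero frames stay zero).
    features /= features.sum(axis=1, keepdims=True) + eps
    return features

def feature_mse(clean_feat, noisy_feat):
    """Mean square error between two feature matrices of equal shape."""
    return float(np.mean((np.asarray(clean_feat) - np.asarray(noisy_feat)) ** 2))
```

For a Mel-spectrogram with 40 bands and 20 frames, `gradient_histogram_features` returns a 20 x 8 matrix; calling `feature_mse` on the features of a clean input and of a noise-corrupted copy gives the kind of clean-versus-noisy mismatch measure the abstract refers to.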
KSP Keywords
Clean speech, Extraction technique, Feature extraction, Local Gradient (LG), Magnitude information, Noisy Conditions, Speech features, Speech source, Word Recognition, automatic speech recognition (ASR), gradient histogram