ETRI Knowledge Sharing Platform
Development of a Psychoacoustic Loss Function for the Deep Neural Network (DNN)-Based Speech Coder
Cited 9 times in Scopus
Authors
Joon Byun, Seungmin Shin, Youngcheol Park, Jongmo Sung, Seungkwon Beack
Issue Date
2021-09
Citation
International Speech Communication Association (INTERSPEECH) 2021, pp.1694-1698
Publisher
ISCA
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.21437/Interspeech.2021-2151
Abstract
This paper presents a loss function that compensates for the perceptual loss of a deep neural network (DNN)-based speech coder. Using a psychoacoustic model (PAM), we design a loss function that maximizes the mask-to-noise ratio (MNR) on multi-resolution Mel-frequency scales. In addition, a perceptual entropy (PE)-based weighting scheme is incorporated into the MNR loss so that the DNN model focuses more on perceptually important Mel-frequency bands. The proposed loss function was tested on a CNN-based autoencoder implementing softmax quantization and entropy-based bitrate control. Objective and subjective tests conducted on speech signals showed that the proposed loss function produced higher perceptual quality than previous perceptual loss functions.
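To make the abstract's idea concrete, the sketch below shows one plausible form of a mask-to-noise-ratio loss with perceptual-entropy weighting. It is a minimal illustration, not the paper's actual implementation: the masking thresholds are assumed to be precomputed by a PAM, the `pe_weights` argument and the band-pooling trick for multiple Mel resolutions are hypothetical simplifications, and a real training setup would compute these quantities inside a differentiable framework rather than NumPy.

```python
import numpy as np

def nmr_loss(ref_mel, est_mel, mask_mel, pe_weights=None, eps=1e-8):
    """Noise-to-mask ratio loss over Mel bands (minimizing NMR maximizes MNR).

    ref_mel, est_mel : (frames, bands) Mel-band powers of reference/decoded speech
    mask_mel         : (frames, bands) per-band masking thresholds, assumed
                       precomputed by a psychoacoustic model (hypothetical input)
    pe_weights       : optional (bands,) perceptual-entropy weights that
                       emphasize perceptually important bands (hypothetical)
    """
    noise = (ref_mel - est_mel) ** 2          # coding-noise power per band
    nmr = noise / (mask_mel + eps)            # noise relative to what is masked
    if pe_weights is not None:
        nmr = nmr * pe_weights                # weight important bands more
    return float(np.mean(nmr))

def multires_nmr_loss(ref, est, mask, resolutions=(32, 64)):
    """Average the NMR loss over several Mel resolutions.

    Coarser resolutions are simulated here by averaging groups of adjacent
    bands of the (frames, max_bands) inputs -- a simplification of building
    separate Mel filterbanks per resolution.
    """
    def pool(x, n_bands):
        group = x.shape[1] // n_bands
        return x[:, : group * n_bands].reshape(x.shape[0], n_bands, group).mean(axis=2)

    total = 0.0
    for n_bands in resolutions:
        total += nmr_loss(pool(ref, n_bands), pool(est, n_bands), pool(mask, n_bands))
    return total / len(resolutions)
```

Under this formulation, coding noise that stays below the masking threshold contributes little to the loss, so the model spends its capacity where distortion would actually be audible.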