ETRI Knowledge Sharing Platform

Development of a Psychoacoustic Loss Function for the Deep Neural Network (DNN)-Based Speech Coder
Cited 8 times in Scopus
Authors
Joon Byun, Seungmin Shin, Youngcheol Park, Jongmo Sung, Seungkwon Beack
Issue Date
2021-09
Citation
International Speech Communication Association (INTERSPEECH) 2021, pp.1694-1698
Publisher
ISCA
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.21437/Interspeech.2021-2151
Abstract
This paper presents a loss function that compensates for the perceptual loss of a deep neural network (DNN)-based speech coder. Using a psychoacoustic model (PAM), we design a loss function that maximizes the mask-to-noise ratio (MNR) on multi-resolution Mel-frequency scales. In addition, a perceptual entropy (PE)-based weighting scheme is incorporated into the MNR loss so that the DNN model focuses on perceptually important Mel-frequency bands. The proposed loss function was tested on a CNN-based autoencoder implementing softmax quantization and entropy-based bitrate control. Objective and subjective tests on speech signals showed that the proposed loss function produced higher perceptual quality than previous perceptual loss functions.
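The core idea in the abstract can be illustrated with a minimal sketch of an MNR-style loss. This is a hypothetical NumPy illustration, not the authors' implementation: the masking thresholds, the PE-derived band weights, and the exact form of the loss (here, a negative log of the per-band mask-to-noise ratio) are all assumptions made for clarity.

```python
import numpy as np

def mnr_loss(signal_spec, coded_spec, mask, eps=1e-8):
    """Sketch of a mask-to-noise-ratio (MNR) loss per Mel band.

    signal_spec, coded_spec: per-band magnitude spectra (1-D arrays)
    mask: psychoacoustic masking threshold per band (hypothetical PAM output)
    Minimizing this loss maximizes the MNR, i.e. it pushes the coding
    noise below the masking threshold in each band.
    """
    noise = (signal_spec - coded_spec) ** 2      # coding-noise power per band
    mnr = mask / (noise + eps)                   # mask-to-noise ratio per band
    return -np.mean(np.log10(mnr + eps))         # maximize MNR -> minimize -log MNR

def pe_weighted_mnr_loss(signal_spec, coded_spec, mask, pe_weights, eps=1e-8):
    """Same loss with a PE-based weighting (assumed form): bands with larger
    perceptual-entropy weights contribute more to the loss."""
    noise = (signal_spec - coded_spec) ** 2
    mnr = mask / (noise + eps)
    w = pe_weights / (np.sum(pe_weights) + eps)  # normalize the band weights
    return -np.sum(w * np.log10(mnr + eps))
```

In a real coder these functions would be evaluated on several Mel resolutions and summed; the sketch keeps a single resolution to show only the MNR and weighting mechanics.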
KSP Keywords
Deep neural network(DNN), Mel-frequency, Multi-resolution, Perceptual Quality, Psychoacoustic Model, Speech Signals, Speech coder, Subjective test, Weighting scheme, entropy-based, frequency band