ETRI Knowledge Sharing Platform

Conference paper: Development of a Psychoacoustic Loss Function for the Deep Neural Network (DNN)-Based Speech Coder
Cited 5 times in Scopus; downloaded 3 times
Authors
변준, 신승민, 박영철, 성종모, 백승권
Publication date
2021-09
Source
International Speech Communication Association (INTERSPEECH) 2021, pp.1694-1698
DOI
https://dx.doi.org/10.21437/Interspeech.2021-2151
Funded project
21ZH1200, Research on core technologies for ultra-realistic spatial media and content, 이태진
Abstract
This paper presents a loss function that compensates for the perceptual loss of deep neural network (DNN)-based speech coders. Using a psychoacoustic model (PAM), we design a loss function that maximizes the mask-to-noise ratio (MNR) on multi-resolution Mel-frequency scales. In addition, a perceptual entropy (PE)-based weighting scheme is incorporated into the MNR loss so that the DNN model focuses more on perceptually important Mel-frequency bands. The proposed loss function was tested on a CNN-based autoencoder that implements softmax quantization and entropy-based bitrate control. Objective and subjective tests conducted with speech signals showed that the proposed loss function produced higher perceptual quality than previous perceptual loss functions.
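The idea of a PE-weighted MNR loss can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the per-band signal power, masking threshold, and coding-noise power are already available for a single Mel resolution, and it uses a Johnston-style term log2(1 + sqrt(signal/mask)) as a stand-in for the PE weighting. The function names `mnr_db`, `pe_weights`, and `weighted_mnr_loss` are hypothetical, and the multi-resolution summation and the PAM itself are omitted.

```python
import math

EPS = 1e-12  # guards against division by zero and log(0)


def mnr_db(mask_power, noise_power):
    """Per-band mask-to-noise ratio in dB: 10 * log10(mask / noise)."""
    return [10.0 * math.log10((m + EPS) / (n + EPS))
            for m, n in zip(mask_power, noise_power)]


def pe_weights(signal_power, mask_power):
    """Assumed PE-style weights, normalised to sum to 1.

    Bands whose signal power exceeds the masking threshold by a wide
    margin receive larger weights.
    """
    pe = [math.log2(1.0 + math.sqrt((s + EPS) / (m + EPS)))
          for s, m in zip(signal_power, mask_power)]
    total = sum(pe)
    return [p / total for p in pe]


def weighted_mnr_loss(signal_power, mask_power, noise_power):
    """Negative weighted MNR: minimising this loss raises the MNR,
    most strongly in the heavily weighted bands."""
    weights = pe_weights(signal_power, mask_power)
    ratios = mnr_db(mask_power, noise_power)
    return -sum(w * r for w, r in zip(weights, ratios))


# Two bands: the first carries far more signal power relative to its mask.
signal = [1.0, 0.01]
mask = [0.1, 0.1]
quiet_noise = [0.01, 0.01]  # noise 10 dB below the mask
loud_noise = [0.1, 0.1]     # noise at the mask level

loss_quiet = weighted_mnr_loss(signal, mask, quiet_noise)
loss_loud = weighted_mnr_loss(signal, mask, loud_noise)
```

Under these assumptions, noise that sits further below the masking threshold yields a lower loss. In actual training the same computation would need to be written with differentiable tensor operations so that gradients reach the coder.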
KSP suggested keywords
Deep neural network (DNN), Mel-frequency, Multi-resolution, Perceptual quality, Psychoacoustic model, Speech signals, Speech coder, Subjective test, Entropy-based, Frequency band, Loss function