ETRI-Knowledge Sharing Platform

Detail

Title
Progressive Multi-Stage Neural Audio Codec with Psychoacoustic Loss and Discriminator
Cited 5 times in Scopus
Authors
Byeong Hyeon Kim, Hyungseob Lim, Jihyun Lee, Inseon Jang, Hong-Goo Kang
Issue Date
2023-06
Citation
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023, pp.1-5
Publisher
IEEE
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ICASSP49357.2023.10097001
Abstract
In this paper, we improve the efficiency of the progressive multi-stage neural audio codec (PR-Codec) by utilizing perceptually motivated training criteria. Although our baseline PR-Codec successfully reconstructs full-band signals by progressively decoding the pre-defined subband signals, transparent quality can only be guaranteed at high bit rates. To reduce bit rates while maintaining perceptually transparent quality, we adopt a psychoacoustic model (PAM)-based loss and propose a perceptual weighting discriminator (PWD), which enables us to synthesize and discriminate audio signals in the perceptually motivated domain. We also introduce scalar quantization with an entropy model to further enhance the quantization efficiency. Our experimental results show that, compared to our previous model, the proposed model significantly improves perceptual reconstruction quality at the expense of greater waveform disparity in the time domain.
KSP Keywords
Audio codec, Audio signal, Bit rate, Entropy model, Full-band, Multi-stage, Proposed model, Psychoacoustic Model, Reconstruction quality, Scalar Quantization, perceptual weighting