ETRI Knowledge Sharing Platform : Progressive Multi-Stage Neural Audio Codec with Psychoacoustic Loss and Discriminator

Conference Paper: Progressive Multi-Stage Neural Audio Codec with Psychoacoustic Loss and Discriminator

Cited 5 times in Scopus

Authors: Byeong Hyeon Kim, Hyungseob Lim, Jihyun Lee, Inseon Jang, Hong-Goo Kang

Citation: International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023, pp. 1-5

Abstract: In this paper, we improve the efficiency of the progressive multi-stage neural audio codec (PR-Codec) by utilizing perceptually motivated training criteria. Although our baseline PR-Codec successfully reconstructs full-band signals by progressively decoding the pre-defined subband signals, transparent quality can only be guaranteed at high bit rates. To reduce the bit rate while maintaining perceptually transparent quality, we adopt a psychoacoustic model (PAM)-based loss and propose a perceptual weighting discriminator (PWD), which together enable us to synthesize and discriminate audio signals in a perceptually motivated domain. We also introduce scalar quantization with an entropy model to further enhance quantization efficiency. Our experimental results show that, compared to our previous model, the proposed model significantly improves perceptual reconstruction quality at the expense of greater waveform disparity in the time domain.
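
Illustrative note: the abstract pairs scalar quantization with an entropy model. The sketch below is not the paper's implementation; it only shows the general idea with a uniform quantizer and an empirical (histogram-based) entropy estimate standing in for a learned entropy model. The function names, the step size, and the sample values are all assumptions made for illustration.

```python
import math
from collections import Counter


def scalar_quantize(values, step):
    """Uniform scalar quantization: map each value to an integer index.

    Assumption: a fixed step size; a learned codec would typically
    operate on latent features rather than raw numbers like these.
    """
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Reconstruct approximate values from the integer indices."""
    return [i * step for i in indices]


def empirical_entropy_bits(indices):
    """Shannon entropy (bits per symbol) of the observed index histogram.

    This is a stand-in for an entropy model: it estimates how many bits
    per symbol an ideal entropy coder would need for these indices.
    """
    counts = Counter(indices)
    n = len(indices)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical latent values, chosen only to demonstrate the calls.
latents = [0.1, 0.4, 0.6, -0.9]
indices = scalar_quantize(latents, step=0.5)
reconstructed = dequantize(indices, step=0.5)
bits_per_symbol = empirical_entropy_bits(indices)
```

A coarser step yields fewer distinct indices and therefore a lower entropy estimate, at the cost of a larger reconstruction error; that trade-off is what an entropy model lets a codec measure and optimize during training.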

KSP Keywords: Audio codec, Audio signal, Bit rate, Entropy model, Full-band, Multi-stage, Proposed model, Psychoacoustic Model, Reconstruction quality, Scalar Quantization, perceptual weighting
