ETRI-Knowledge Sharing Platform

Detail

Perceptual Neural Audio Coding With Modified Discrete Cosine Transform
Cited 1 time in Scopus
Authors
Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang
Issue Date
2025-02
Citation
IEEE Journal of Selected Topics in Signal Processing, v.18, no.8, pp.1490-1505
ISSN
1932-4553
Publisher
Institute of Electrical and Electronics Engineers
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1109/JSTSP.2024.3491576
Abstract
Despite efforts to leverage the modeling power of deep neural networks (DNNs) in audio coding, effectively deploying them in real-world applications is still problematic due to their high computational cost and the restricted range of target signals or achievable bit-rates. In this paper, we propose an alternative approach for integrating DNNs into a perceptual audio coder that allows for the optimization of the whole system in a data-driven, end-to-end manner. The key idea of the proposed method is to make DNNs control the quantization noise in the classic transform coding framework, specifically based on the modified discrete cosine transform (MDCT). The proposal includes a new DNN-based mechanism for adaptively adjusting the quantization step sizes of frequency bands targeting an arbitrary bit-rate, eventually acting as a data-driven differentiable psychoacoustic model. The side information regarding the adaptive quantization is also encoded and decoded by DNNs via learned representation. During training, the perceptual distortion is evaluated by a perceptual quality estimation model trained on actual human ratings so that the proposed audio codec can effectively allocate bits considering their effect on the perceptual quality. Through comparisons with legacy audio codecs (MP3 and AAC) and a neural audio codec (EnCodec), we show that our method can achieve further coding gains over the legacy codecs with a substantially lower computational load on the decoder compared to other neural audio codecs.
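The core mechanism named in the abstract, band-wise quantization of MDCT coefficients with adjustable step sizes, can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length, the band edges, and the fixed `step_sizes` values are all hypothetical stand-ins, and in the proposed codec the step sizes would be predicted by a DNN rather than set by hand. The sketch only shows why a per-band step size bounds the quantization noise in that band.

```python
import math

def mdct(frame):
    """Forward MDCT of a 2N-sample frame, returning N coefficients.

    A sine window is applied first; this is a common textbook choice,
    not necessarily the window used in the paper.
    """
    two_n = len(frame)
    n_half = two_n // 2
    windowed = [x * math.sin(math.pi * (n + 0.5) / two_n)
                for n, x in enumerate(frame)]
    return [
        sum(windowed[n]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]

def quantize_bands(coeffs, band_edges, step_sizes):
    """Uniformly quantize each band with its own step size (integer indices)."""
    indices = []
    for lo, hi, step in zip(band_edges[:-1], band_edges[1:], step_sizes):
        indices.extend(round(c / step) for c in coeffs[lo:hi])
    return indices

def dequantize_bands(indices, band_edges, step_sizes):
    """Map integer indices back to coefficient values, band by band."""
    recon = []
    for lo, hi, step in zip(band_edges[:-1], band_edges[1:], step_sizes):
        recon.extend(q * step for q in indices[lo:hi])
    return recon

# Hypothetical toy input: one 32-sample frame containing two sinusoids.
frame = [math.sin(2 * math.pi * 3 * n / 32) + 0.3 * math.sin(2 * math.pi * 9 * n / 32)
         for n in range(32)]

# Hypothetical band layout and step sizes (a DNN would supply the latter).
band_edges = [0, 4, 8, 16]
step_sizes = [0.05, 0.2, 0.8]

coeffs = mdct(frame)
indices = quantize_bands(coeffs, band_edges, step_sizes)
recon = dequantize_bands(indices, band_edges, step_sizes)

# Largest absolute quantization error observed in each band.
band_errors = [
    max(abs(c - r) for c, r in zip(coeffs[lo:hi], recon[lo:hi]))
    for lo, hi in zip(band_edges[:-1], band_edges[1:])
]
```

With round-to-nearest quantization, the error in each band never exceeds half of that band's step size, so choosing the step sizes determines where quantization noise is allowed to accumulate. A larger step in a band generally yields smaller integer indices, which are cheaper to entropy-code, at the cost of more noise in that band.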
KSP Keywords
Audio codec, Audio coding, Bit rate, Coding Gain, Data-Driven, Deep neural network (DNN), End-to-End (E2E), Estimation model, Modified Discrete Cosine Transform (MDCT), Perceptual Quality, Psychoacoustic Model