ETRI Knowledge Sharing Platform

End-to-End Neural Audio Coding in the MDCT Domain
Cited 4 times in Scopus
Authors: Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang
Issue Date: 2023-06
Citation: International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023, pp. 1-5
Publisher: IEEE
Language: English
Type: Conference Paper
DOI: https://dx.doi.org/10.1109/ICASSP49357.2023.10096243
Abstract
Modern deep neural network (DNN)-based audio coding approaches utilize complicated non-linear functions (e.g., convolutional neural networks and non-linear activations), which leads to high complexity and memory usage. However, their decoded audio quality is still not much higher than that of signal processing-based legacy codecs. In this paper, we propose an effective frequency-domain neural audio coding paradigm that adopts the modified discrete cosine transform (MDCT) for analysis and synthesis and DNNs for the quantization of variables. It includes an efficient method to encode MDCT bins as well as a mechanism to adapt the quantization level of each bin. Our neural audio codec is trained in an end-to-end manner with the help of psychoacoustics-based perceptual loss, removing the burden of module-by-module fine-tuning. Experimental results show that our proposed model's performance is comparable to that of the MP3 codec at bit-rates of around 64 and 48 kbps for mono signals.
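The MDCT analysis/synthesis front-end the abstract refers to can be sketched as follows. This is a minimal NumPy illustration of a windowed MDCT/IMDCT pair with 50%-overlapped frames and overlap-add reconstruction, not the paper's implementation; the frame length and the sine window are assumptions made for the sketch:

```python
import numpy as np

def mdct(frame, window):
    """Forward MDCT: a 2N-sample windowed frame -> N coefficients."""
    N = len(frame) // 2
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ (window * frame)

def imdct(coeffs, window):
    """Inverse MDCT with synthesis windowing; the time-domain aliasing
    it introduces cancels when adjacent frames are overlap-added."""
    N = len(coeffs)
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (2.0 / N) * window * (basis @ coeffs)

# The sine window satisfies the Princen-Bradley condition
# w[n]^2 + w[n+N]^2 = 1, which alias cancellation relies on.
N = 64  # half frame length; an arbitrary choice for this sketch
window = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))

rng = np.random.default_rng(0)
x = rng.standard_normal(6 * N)

# Analysis on 50%-overlapped frames, then overlap-add of inverse frames.
# A codec would quantize the MDCT coefficients in between; here the
# round trip is lossless to show perfect reconstruction.
y = np.zeros_like(x)
for start in range(0, len(x) - N, N):
    frame = x[start:start + 2 * N]
    y[start:start + 2 * N] += imdct(mdct(frame, window), window)

# Interior samples (those covered by two frames) are reconstructed exactly.
print(np.allclose(y[N:-N], x[N:-N]))  # True
```

In a neural codec along the lines described, the per-bin coefficients produced by `mdct` would be quantized by the learned model before `imdct` resynthesis, so reconstruction error comes from quantization rather than the transform itself.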
KSP Keywords
Audio codec, Audio coding, Audio quality, Bit rate, Convolution neural network(CNN), Deep neural network(DNN), Effective frequency, End to End(E2E), Fine-tuning, Frequency domain(FD), Modified Discrete Cosine Transform(MDCT)