ETRI Knowledge Sharing Platform : Perceptual Neural Audio Coding With Modified Discrete Cosine Transform

Journal Article Perceptual Neural Audio Coding With Modified Discrete Cosine Transform

Cited 2 times in Scopus

Authors: Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang

Issue Date: 2025-02

Citation: IEEE Journal of Selected Topics in Signal Processing, v.18, no.8, pp.1490-1505

ISSN: 1932-4553

Publisher: Institute of Electrical and Electronics Engineers

Language: English

Type: Journal Article

DOI: https://dx.doi.org/10.1109/JSTSP.2024.3491576

Abstract: Despite efforts to leverage the modeling power of deep neural networks (DNNs) in audio coding, effectively deploying them in real-world applications is still problematic due to their high computational cost and the restricted range of target signals or achievable bit-rates. In this paper, we propose an alternative approach for integrating DNNs into a perceptual audio coder that allows for the optimization of the whole system in a data-driven, end-to-end manner. The key idea of the proposed method is to make DNNs control the quantization noise in the classic transform coding framework, specifically based on the modified discrete cosine transform (MDCT). The proposal includes a new DNN-based mechanism for adaptively adjusting the quantization step sizes of frequency bands targeting an arbitrary bit-rate, eventually acting as a data-driven differentiable psychoacoustic model. The side information regarding the adaptive quantization is also encoded and decoded by DNNs via learned representation. During training, the perceptual distortion is evaluated by a perceptual quality estimation model trained on actual human ratings so that the proposed audio codec can effectively allocate bits considering their effect on the perceptual quality. Through comparisons with legacy audio codecs (MP3 and AAC) and a neural audio codec (EnCodec), we show that our method can achieve further coding gains over the legacy codecs with a substantially lower computational load on the decoder compared to other neural audio codecs.
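The transform-coding idea the abstract refers to (MDCT analysis followed by band-wise quantization with adjustable step sizes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the frame size `N`, the band layout `band_edges`, and the hand-picked `steps` are all assumptions made for illustration, and the fixed step sizes merely stand in for the values that the paper's DNN would predict. The function names (`mdct`, `imdct`, `quantize_bands`) are likewise hypothetical.

```python
import numpy as np

def mdct_basis(N):
    # Cosine basis of shape (N, 2N): N coefficients per 2N-sample frame.
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))

def sine_window(N):
    # Sine window; satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1.
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))

def mdct(x, N):
    # x: 1-D signal whose length is a multiple of N. Returns (frames, N).
    w, C = sine_window(N), mdct_basis(N)
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])
    n_frames = len(padded) // N - 1
    frames = np.stack([padded[t * N : t * N + 2 * N] for t in range(n_frames)])
    return (frames * w) @ C.T

def imdct(X, N):
    # Inverse transform with windowed overlap-add (hop size N).
    w, C = sine_window(N), mdct_basis(N)
    frames = (2.0 / N) * (X @ C) * w
    out = np.zeros((X.shape[0] + 1) * N)
    for t, f in enumerate(frames):
        out[t * N : t * N + 2 * N] += f
    return out[N:-N]  # drop the zero padding added in mdct()

def quantize_bands(X, band_edges, steps):
    # Uniform quantization with one step size per frequency band.
    # band_edges must start at 0 and end at N.
    step_per_bin = np.repeat(steps, np.diff(band_edges))
    q = np.round(X / step_per_bin).astype(int)  # integer symbols to be entropy-coded
    return q, q * step_per_bin                  # symbols and dequantized coefficients

# Toy example (all values below are assumptions, not taken from the paper).
N = 64
rng = np.random.default_rng(0)
x = rng.standard_normal(8 * N)

X = mdct(x, N)
band_edges = np.array([0, 8, 24, 64])  # three bands of 8, 16, and 40 bins
steps = np.array([0.25, 0.5, 1.0])     # coarser steps in the higher bands
q, X_hat = quantize_bands(X, band_edges, steps)
x_hat = imdct(X_hat, N)                # decoded signal with quantization noise
```

Without quantization, `imdct(mdct(x, N), N)` reproduces `x` up to floating-point error, so all distortion in `x_hat` comes from the rounding step. A larger step in a band means fewer bits and more noise in that band, which is the quantity the abstract describes the DNN as controlling.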

KSP Keywords: Audio codec, Audio coding, Bit rate, Coding Gain, Data-Driven, Deep neural network(DNN), End to End(E2E), Estimation model, Modified Discrete Cosine Transform(MDCT), Perceptual Quality, Psychoacoustic Model

ETRI

218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, KOREA, Contact: sh.kim@etri.re.kr
