ETRI-Knowledge Sharing Platform

A Perceptual Neural Audio Coder with a Mean-Scale Hyperprior
Cited 4 times in Scopus
Authors
Joon Byun, Seungmin Shin, Youngcheol Park, Jongmo Sung, Seungkwon Beack
Issue Date
2023-06
Citation
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023, pp. 1-5
Publisher
IEEE
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ICASSP49357.2023.10096009
Abstract
This paper proposes an end-to-end neural audio coder based on a mean-scale hyperprior model together with a perceptual optimization using a psychoacoustic model (PAM)-based loss function. The proposed coder estimates the mean and scale hyperpriors using a sub-network after assuming that the probability distribution of latent samples is Gaussian. The main network is an autoencoder based on Resnet-type gated linear units (ResGLUs), each comprising a generalized divisive normalization (GDN) layer. We train both networks to optimize perceptual attributes estimated using a multi-timescale scheme to obtain high perceptual quality. Experimental results show that the proposed model accurately predicts the mean and scale hyperpriors. Also, it obtains consistently higher audio quality than the commercial MP3 audio coder at all bitrates.
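The Gaussian assumption in the abstract can be illustrated with a minimal sketch of how a mean-scale entropy model typically assigns a bit cost to a quantized latent sample: the probability of an integer value is the Gaussian mass on the unit-width bin around it, and the cost is the negative base-2 logarithm of that mass. This follows the standard mean-scale hyperprior formulation and is not taken from the paper itself. The function names, the unit-width rounding quantizer, and the probability floor are assumptions made for illustration.

```python
import math


def gaussian_cdf(x, mean, scale):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def latent_bits(y_hat, mean, scale, floor=1e-9):
    """Estimated bits for one integer-quantized latent sample.

    The probability mass is taken over the bin [y_hat - 0.5, y_hat + 0.5].
    `floor` is a hypothetical lower bound that avoids log2(0).
    """
    upper = gaussian_cdf(y_hat + 0.5, mean, scale)
    lower = gaussian_cdf(y_hat - 0.5, mean, scale)
    return -math.log2(max(upper - lower, floor))


def total_bits(latents, means, scales):
    """Sum of per-sample bit estimates after rounding each latent.

    In a hyperprior model the means and scales would come from the
    sub-network; here they are simply passed in as lists.
    """
    return sum(
        latent_bits(round(y), m, s)
        for y, m, s in zip(latents, means, scales)
    )
```

Under this sketch, a latent whose predicted mean is close to its actual value and whose predicted scale is small costs few bits, which is why accurate prediction of the mean and scale matters for the bitrate.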
KSP Keywords
Audio quality, End to End(E2E), Main network, Multi time scale, Perceptual Optimization, Perceptual Quality, Probability distribution, Proposed model, Psychoacoustic Model, Sub-networks, divisive normalization