ETRI Knowledge Sharing Platform

End-to-end Stereo Audio Coding Using Deep Neural Networks
Cited 2 times in Scopus
Authors
Wootaek Lim, Inseon Jang, Seungkwon Beack, Jongmo Sung, Taejin Lee
Issue Date
2022-11
Citation
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA-ASC) 2022, pp.861-865
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.23919/APSIPAASC55919.2022.9980064
Abstract
Recently, deep neural-network-based audio data compression has been widely studied. Although most studies in this area have focused on mono-channel speech coding, research on stereo coding is required for complete audio coding. In this paper, we consider the stereo-channel case and present an end-to-end stereo audio coding method that exploits the concept of the mid/side (M/S) stereo coding scheme in the latent space of a neural network. The goal of this study is to create an end-to-end neural network that finds the optimal transformation for stereo coding. The latent space therefore conceptually corresponds to M and S, but a better feature transformation for stereo coding is learned through the network. Through end-to-end learning, the proposed method provides more efficient bit allocation than coding each stereo channel discretely as mono. According to the results of an objective test, the proposed method outperforms both a discrete stereo model and conventional HE-AAC. A subjective evaluation shows that, at a bitrate of 64 kbps, the proposed model provides significantly better sound quality than the discrete stereo coding model, as well as performance comparable to that of conventional HE-AAC.
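The abstract builds on the conventional mid/side (M/S) stereo transform, so a minimal sketch of that transform (in plain NumPy, not taken from the paper) may help clarify why coding M and S, or a learned generalization of them in the latent space, allows more efficient bit allocation than coding the left and right channels separately. The function names and the toy signal below are illustrative assumptions; the paper's actual network architecture is not reproduced here.

```python
# A minimal sketch of the conventional mid/side (M/S) transform referenced in the
# abstract. The paper learns a generalized version of this mapping in the latent
# space of a neural codec; this sketch only illustrates the underlying concept.
import numpy as np

def ms_encode(left: np.ndarray, right: np.ndarray):
    """Convert a left/right stereo pair into mid/side signals."""
    mid = 0.5 * (left + right)   # content shared by both channels
    side = 0.5 * (left - right)  # residual (spatial) difference
    return mid, side

def ms_decode(mid: np.ndarray, side: np.ndarray):
    """Recover the left/right channels from mid/side signals."""
    return mid + side, mid - side

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    l = rng.standard_normal(1024)
    r = 0.8 * l + 0.2 * rng.standard_normal(1024)  # strongly correlated channels
    m, s = ms_encode(l, r)
    l_hat, r_hat = ms_decode(m, s)
    # The transform is lossless; for correlated channels most of the energy ends up
    # in M, which is why M/S coding permits a more efficient bit allocation.
    assert np.allclose(l, l_hat) and np.allclose(r, r_hat)
    print("side/mid energy ratio:", float(np.sum(s**2) / np.sum(m**2)))
```

In the end-to-end setting described in the abstract, this fixed linear transform is replaced by a transformation learned jointly with the codec, so the latent channels only conceptually correspond to M and S.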
KSP Keywords
Audio coding, Audio data, Deep neural network (DNN), End-to-end (E2E), Latent space, Mid/side (M/S), Objective test, Optimal transformation, Proposed model, Speech coding, Stereo audio