ETRI Knowledge Sharing Platform

Perceptual Improvement of Deep Neural Network (DNN) Speech Coder Using Parametric and Non-parametric Density Models
Authors
Joon Byun, Seungmin Shin, Jongmo Sung, Seungkwon Beack, Youngcheol Park
Issue Date
2023-08
Citation
International Speech Communication Association (INTERSPEECH) 2023, pp.859-863
Publisher
ISCA
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.21437/Interspeech.2023-2305
Abstract
This paper proposes a method to improve the perceptual quality of an end-to-end neural speech coder using density models for bottleneck samples. Two approaches, one parametric and one non-parametric, are explored for modeling the bottleneck sample density. The first approach uses a sub-network to generate mean-scale hyperpriors for the bottleneck samples, while the second models the bottleneck samples with a separate sub-network that requires no side information. The whole network, including the sub-network, is trained with perceptual losses based on a psychoacoustic model (PAM) at different timescales so that quantization noise is shaped below the masking threshold. The proposed method yields a frame-dependent entropy model that improves arithmetic coding efficiency while emphasizing perceptually relevant audio cues. Experimental results show that the proposed density model, combined with the PAM-based losses, improves perceptual quality over conventional speech coders in both objective and subjective tests.
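For orientation, the mean-scale hyperprior idea mentioned in the abstract follows the general recipe of learned entropy models for neural compression. The following is a minimal sketch, not the authors' implementation: the layer sizes, module names, and the additive-noise proxy for quantization are assumptions for illustration only. The estimated bit count produced here is the kind of rate term that would be trained jointly with the PAM-based perceptual losses.

# Minimal sketch of a mean-scale hyperprior density model for 1-D bottleneck
# samples (illustrative assumptions throughout; not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MeanScaleHyperprior(nn.Module):
    """Predicts per-sample Gaussian (mean, scale) priors for bottleneck frames."""

    def __init__(self, channels: int = 64, hyper_channels: int = 32):
        super().__init__()
        # Hyper-encoder: summarizes the bottleneck into side information z.
        self.hyper_enc = nn.Sequential(
            nn.Conv1d(channels, hyper_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(hyper_channels, hyper_channels, kernel_size=3, stride=2, padding=1),
        )
        # Hyper-decoder: maps the (quantized) side information to mean and scale.
        self.hyper_dec = nn.Sequential(
            nn.ConvTranspose1d(hyper_channels, hyper_channels, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(hyper_channels, 2 * channels, 4, stride=2, padding=1),
        )

    def forward(self, y: torch.Tensor):
        # y: bottleneck samples, shape (batch, channels, frames)
        z = self.hyper_enc(y)
        # Uniform noise as a differentiable stand-in for rounding during training.
        z_hat = z + torch.empty_like(z).uniform_(-0.5, 0.5)
        mean, log_scale = self.hyper_dec(z_hat).chunk(2, dim=1)
        scale = F.softplus(log_scale) + 1e-6
        y_hat = y + torch.empty_like(y).uniform_(-0.5, 0.5)

        # Rate estimate: probability mass of each quantized sample under N(mean, scale),
        # i.e. the quantity an arithmetic coder would exploit.
        gauss = torch.distributions.Normal(mean, scale)
        pmf = gauss.cdf(y_hat + 0.5) - gauss.cdf(y_hat - 0.5)
        bits = -torch.log2(pmf.clamp_min(1e-9)).sum(dim=(1, 2))
        return y_hat, bits


if __name__ == "__main__":
    model = MeanScaleHyperprior()
    y = torch.randn(2, 64, 40)   # two utterances, 40 bottleneck frames each
    y_hat, bits = model(y)
    print(y_hat.shape, bits)     # estimated rate, to be combined with perceptual losses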
KSP Keywords
Audio cues, Coding efficiency, Deep neural network (DNN), End to End (E2E), Entropy model, Masking threshold, Perceptual quality, Sample density, Speech coder, Sub-networks, Subjective test