ETRI-Knowledge Sharing Platform

Perceptual Improvement of Deep Neural Network (DNN) Speech Coder Using Parametric and Non-parametric Density Models
Authors: Joon Byun, Seungmin Shin, Jongmo Sung, Seungkwon Beack, Youngcheol Park
Issue Date: 2023-08
Citation: International Speech Communication Association (INTERSPEECH) 2023, pp. 859-863
Language: English
Type: Conference Paper
DOI: https://dx.doi.org/10.21437/Interspeech.2023-2305
Abstract
This paper proposes a method to improve the perceptual quality of an end-to-end neural speech coder using density models for bottleneck samples. Two approaches, one parametric and one non-parametric, are explored for modeling the bottleneck sample density. The first approach utilizes a sub-network to generate mean-scale hyperpriors for the bottleneck samples, while the second models the bottleneck samples with a separate sub-network without any side information. The whole network, including the sub-network, is trained using PAM-based perceptual losses at different timescales to shape quantization noise below the masking threshold. The proposed method achieves a frame-dependent entropy model that enhances arithmetic coding efficiency while emphasizing perceptually relevant audio cues. Experimental results show that the proposed density model, combined with PAM-based losses, improves perceptual quality over conventional speech coders in both objective and subjective tests.
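
The abstract's first approach follows the mean-scale hyperprior idea, where a sub-network predicts the mean and scale of each bottleneck sample so that a frame-dependent entropy model can drive arithmetic coding. The sketch below is only an illustration of that general technique, not the authors' implementation: the class and parameter names (MeanScaleHyperprior, bottleneck_ch, hyper_ch), the 1-D convolutional layer sizes, and the uniform-noise quantization proxy are all assumptions, and PyTorch is used purely for convenience. PAM here presumably denotes a psychoacoustic model; the perceptual loss itself is not shown.

```python
# Minimal sketch (assumed architecture, not the paper's code) of a mean-scale
# hyperprior density model for bottleneck samples of a neural speech coder.
import torch
import torch.nn as nn

class MeanScaleHyperprior(nn.Module):
    def __init__(self, bottleneck_ch=64, hyper_ch=32):
        super().__init__()
        # Hyper-encoder: summarizes the bottleneck into side information.
        self.hyper_enc = nn.Sequential(
            nn.Conv1d(bottleneck_ch, hyper_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hyper_ch, hyper_ch, kernel_size=3, stride=2, padding=1),
        )
        # Hyper-decoder: predicts per-sample mean and scale of the bottleneck.
        self.hyper_dec = nn.Sequential(
            nn.ConvTranspose1d(hyper_ch, hyper_ch, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(hyper_ch, 2 * bottleneck_ch, kernel_size=3, padding=1),
        )

    def forward(self, z):
        # z: bottleneck samples, shape (batch, channels, frames).
        h = self.hyper_enc(z)
        h_hat = h + (torch.rand_like(h) - 0.5)   # additive-noise proxy for quantization
        mean, scale = self.hyper_dec(h_hat).chunk(2, dim=1)
        scale = torch.nn.functional.softplus(scale) + 1e-6
        z_hat = z + (torch.rand_like(z) - 0.5)   # quantization proxy during training

        # The Gaussian CDF difference gives the probability mass of each quantized
        # sample; its negative log2 estimates the arithmetic-coding rate.
        gauss = torch.distributions.Normal(mean, scale)
        p = gauss.cdf(z_hat + 0.5) - gauss.cdf(z_hat - 0.5)
        rate_bits = -torch.log2(p.clamp_min(1e-9)).sum() / z.shape[0]
        return z_hat, rate_bits

# Usage: estimate the rate term for a batch of hypothetical bottleneck frames.
model = MeanScaleHyperprior()
z = torch.randn(4, 64, 100)
z_hat, rate_bits = model(z)
print(rate_bits.item())
```

In a full coder of this kind, the rate estimate would be weighted against reconstruction and perceptual loss terms during training, while the second (non-parametric) approach from the abstract would replace the hyperprior with a density model that needs no transmitted side information.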