ETRI Knowledge Sharing Platform : R3VQ: Redundancy-Reduced Residual Vector Quantization for Low-Bitrate Neural Speech Coding


Journal Article: R3VQ: Redundancy-Reduced Residual Vector Quantization for Low-Bitrate Neural Speech Coding

Cited 0 times in Scopus

Abstract: Neural speech and audio codecs have demonstrated decent decoded-audio quality at low bitrates. They consist of three parts: an encoder, a decoder, and a quantizer. Residual vector quantization (RVQ), also called multi-stage vector quantization, in which each stage quantizes the residual signal left by the previous stage, is employed in many neural speech codecs and has exhibited good performance while providing bitrate scalability. In this letter, we propose redundancy-reduced residual vector quantization (R3VQ), which improves RVQ by inserting a neural network called a refiner. The refiner reduces the power of the residual signal to be quantized by enhancing the estimate of the original speech obtained from the quantized signals of the previous stages. We also present a part-wise (PW) training scheme suited to training a neural speech codec with R3VQ. Experimental results showed that the proposed R3VQ trained with the PW training scheme outperformed RVQ both in objective speech-quality measures and in a subjective MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) test.
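
The multi-stage quantization described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the codebooks are random rather than learned, the helper names (make_codebooks, rvq_encode, rvq_decode) are hypothetical, and the optional refiner argument is only a placeholder callable that marks where a refiner network could act on the running estimate. The abstract does not specify the refiner's architecture or its exact placement, so that part is an assumption.

```python
import numpy as np

def make_codebooks(rng, stages, size, dim):
    """Random stand-in codebooks (a real codec learns them); later stages are finer."""
    books = []
    for s in range(stages):
        cb = rng.normal(scale=0.5 ** s, size=(size, dim))
        cb[0] = 0.0  # a zero codeword guarantees a stage never increases the error
        books.append(cb)
    return books

def nearest(codebook, x):
    """Index of the codeword closest to x in Euclidean distance."""
    return int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))

def rvq_encode(x, codebooks, refiner=None):
    """Each stage quantizes what the running estimate still misses.

    refiner is a hypothetical callable (estimate, stage) -> estimate standing in
    for a refiner network; refiner=None gives plain RVQ.
    """
    estimate = np.zeros_like(x)
    indices = []
    for stage, cb in enumerate(codebooks):
        residual = x - estimate           # signal left for this stage
        idx = nearest(cb, residual)
        indices.append(idx)
        estimate = estimate + cb[idx]
        if refiner is not None:
            # Assumed placement: improve the estimate using quantized data only,
            # so the decoder can repeat the same step.
            estimate = refiner(estimate, stage)
    return indices, estimate

def rvq_decode(indices, codebooks, refiner=None):
    """Rebuild the estimate from indices; fewer indices give a coarser result."""
    estimate = np.zeros(codebooks[0].shape[1])
    for stage, (idx, cb) in enumerate(zip(indices, codebooks)):
        estimate = estimate + cb[idx]
        if refiner is not None:
            estimate = refiner(estimate, stage)
    return estimate

rng = np.random.default_rng(0)
codebooks = make_codebooks(rng, stages=4, size=16, dim=8)
x = rng.normal(size=8)
indices, estimate = rvq_encode(x, codebooks)
for k in range(1, len(indices) + 1):
    err = np.linalg.norm(x - rvq_decode(indices[:k], codebooks))
    print(f"stages used: {k}, reconstruction error: {err:.4f}")
```

Decoding with only the first k indices shows the bitrate scalability mentioned in the abstract: dropping later stages lowers the bitrate at the cost of a coarser reconstruction. Because the placeholder refiner sees only the quantized estimate, the decoder can apply the same step without extra bits; whether the paper's refiner works exactly this way is not stated in the abstract.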

Keywords: residual vector quantization, SoundStream, speech coding, vector quantized variational autoencoder

