ETRI-Knowledge Sharing Platform

Detail

R3VQ: Redundancy-Reduced Residual Vector Quantization for Low-Bitrate Neural Speech Coding
Authors
Eunkyun Lee, Jongwook Chae, Sooyoung Park, Jong Won Shin
Issue Date
2026-01
Citation
IEEE Signal Processing Letters, v.33, pp.693-697
ISSN
1070-9908
Publisher
IEEE
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1109/LSP.2026.3655351
Abstract
Neural speech and audio codecs have demonstrated decent quality of the decoded audio at low bitrates. They consist of three parts, an encoder, a decoder, and a quantizer. Residual vector quantization (RVQ) or multi-stage vector quantization in which the residual signal from the previous stage is quantized in the next stage is employed in many neural speech codecs and has exhibited good performance while providing bitrate scalability. In this letter, we propose the redundancy-reduced residual vector quantization (R3VQ) which improves the RVQ by inserting a neural network called a refiner. The role of the refiner is to reduce the power of the residual signal to be quantized by enhancing the estimate of the original speech from the quantized signals in the previous stages. We also present a part-wise (PW) training scheme suitable for the training of the neural speech codec with the R3VQ. Experimental results showed that the proposed R3VQ trained with a PW training scheme outperformed the RVQ in both objective measures for speech quality and subjective MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) test.
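The multi-stage quantization described in the abstract can be illustrated with a minimal sketch of plain RVQ, in which each stage quantizes the residual left by the previous stage. This is not the authors' implementation: the function names, the random codebooks, and the dimensions are assumptions chosen for illustration, and the refiner network proposed for R3VQ is not modeled here.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Encode vector x with a list of codebooks, one per stage (hypothetical helper).

    Each codebook has shape (num_codewords, dim). Returns one index per stage.
    """
    residual = np.asarray(x, dtype=float).copy()
    indices = []
    for cb in codebooks:
        # Pick the codeword nearest to the current residual (squared Euclidean distance).
        dists = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        indices.append(idx)
        # The next stage quantizes what this stage failed to represent.
        residual = residual - cb[idx]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected codewords of the stages that were sent."""
    x_hat = np.zeros(codebooks[0].shape[1])
    for cb, idx in zip(codebooks, indices):
        x_hat = x_hat + cb[idx]
    return x_hat


# Illustrative setup (assumed values): 3 stages, 16 codewords each, 8-dimensional vectors.
rng = np.random.default_rng(0)
dim, num_codewords, num_stages = 8, 16, 3
codebooks = [rng.normal(scale=0.5 ** s, size=(num_codewords, dim))
             for s in range(num_stages)]
x = rng.normal(size=dim)

indices = rvq_encode(x, codebooks)
x_full = rvq_decode(indices, codebooks)          # all stages
x_coarse = rvq_decode(indices[:1], codebooks)    # first stage only (lower bitrate)
```

Decoding from only the first few indices is what gives RVQ its bitrate scalability: dropping later stages lowers the bitrate at the cost of a coarser reconstruction. In a trained codec the codebooks are learned rather than drawn at random.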
Keyword
residual vector quantization, SoundStream, speech coding, vector quantized variational autoencoder
KSP Keywords
Low Bitrates, Multi-stage vector quantization, Objective Measures, Residual signal, Residual vector quantization, Speech coding, Vector Quantization (VQ), neural network (NN), speech codec, speech quality, training scheme