ETRI Knowledge Sharing Platform



Conference Paper: Efficient and Scalable Neural Residual Waveform Coding with Collaborative Quantization
Cited 15 times in Scopus
Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020, pp. 361-365
19HR2500, [Integrated Project] Development of Core Technologies for AV Coding and LF Media for Hyper-Realistic Tera-Media, Jinsoo Choi
Scalability and efficiency are desirable in neural speech codecs, which should support a wide range of bitrates for applications on various devices. We propose a collaborative quantization (CQ) scheme to jointly learn the codebook of LPC coefficients and that of the corresponding residuals. Rather than simply shoehorning LPC into a neural network, CQ bridges the computational capacity of advanced neural network models and traditional, yet efficient and domain-specific, digital signal processing methods in an integrated manner. We demonstrate that CQ achieves much higher quality than its predecessor at 9 kbps with even lower model complexity. We also show that CQ can scale up to 24 kbps, where it outperforms AMR-WB and Opus. As a neural waveform codec, CQ models have fewer than 1 million parameters, significantly fewer than many other generative models.
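The abstract's central object is the LPC/residual split: a linear predictive filter captures the spectral envelope, and the codec spends its bits on the LPC coefficients plus the prediction residual. As a point of reference (this is a hedged sketch of the classical DSP step, not the paper's code; the function names, frame handling, and LPC order are illustrative assumptions), the coefficients and residual that CQ would jointly quantize can be computed per frame like this:

```python
import numpy as np

def autocorr(frame, order):
    """Autocorrelation lags r[0..order] of one analysis frame."""
    return np.array([frame[: len(frame) - k] @ frame[k:]
                     for k in range(order + 1)])

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns coefficients a (a[0] = 1)
    and the final prediction-error energy."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]
        k = -acc / err                  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k              # remaining error energy
    return a, err

def lpc_residual(frame, a):
    """Residual e[n] = x[n] + sum_j a[j]*x[n-j], i.e. analysis filtering
    by A(z); this is the signal a residual codec like CQ encodes."""
    return np.convolve(frame, a)[: len(frame)]
```

In a CQ-style codec both `a` and the residual would pass through trainable quantizers instead of fixed codebooks, but the analysis/synthesis structure is the same: the better the predictor, the lower-energy (and cheaper to code) the residual.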
KSP Suggested Keywords
AMR-WB, Computational capacity, Digital signal processing methods, Domain specific, Scale-up, Wide range, generative models, model complexity, neural network model, speech codec