ETRI Knowledge Sharing Platform


Details

Conference Paper: Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding
Cited 14 times in Scopus, downloaded 1 time.
Authors
Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim
Publication Date
September 2019
Source
International Speech Communication Association (INTERSPEECH) 2019, pp.3396-3400
DOI
https://dx.doi.org/10.21437/Interspeech.2019-1816
Funded Project
19HR2500, [Integrated Project] Development of Core Technologies for AV Coding and LF Media for Hyper-Realistic Tera-Media, Jinsoo Choi
Abstract
Speech codecs learn compact representations of speech signals to facilitate data transmission. Many recent deep neural network (DNN) based end-to-end speech codecs achieve low bitrates and high perceptual quality at the cost of model complexity. We propose a cross-module residual learning (CMRL) pipeline as a module carrier, with each module reconstructing the residual from its preceding modules. CMRL differs from other DNN-based speech codecs in that, rather than modeling the speech compression problem in a single large neural network, it optimizes a series of less-complicated modules in a two-phase training scheme. The proposed method shows better objective performance than AMR-WB and a state-of-the-art DNN-based speech codec with a similar network architecture. As an end-to-end model, it takes raw PCM signals as input, but is also compatible with linear predictive coding (LPC), showing better subjective quality at high bitrates than AMR-WB and Opus. The gain is achieved with only 0.9 million trainable parameters, a significantly less complex architecture than the other DNN-based codecs in the literature.
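The cascade the abstract describes can be illustrated with a toy sketch: each module codes the residual left over by all preceding modules, and the final reconstruction is the sum of the module outputs. This is plain Python with hypothetical quantizer stand-ins (`cmrl_reconstruct`, `coarse`, `fine` are illustrative names, not the paper's DNN modules):

```python
def cmrl_reconstruct(modules, x):
    """Toy cross-module residual cascade.

    Each module receives the residual x - (sum of earlier module outputs)
    and contributes its reconstruction of that residual; the final output
    is the accumulated sum over all modules.
    """
    recon = [0.0] * len(x)
    for code in modules:
        residual = [xi - ri for xi, ri in zip(x, recon)]   # what earlier modules missed
        recon = [ri + ci for ri, ci in zip(recon, code(residual))]
    return recon

# Hypothetical stand-in "codecs": crude quantizers instead of trained DNNs.
coarse = lambda r: [float(round(v)) for v in r]         # integer-step quantizer
fine   = lambda r: [round(v * 10) / 10 for v in r]      # 0.1-step quantizer on the residual

x = [1.3, -2.7, 3.14, 0.5]
y = cmrl_reconstruct([coarse, fine], x)

# The two-stage cascade recovers x up to the finest quantizer's step size.
assert all(abs(a - b) <= 0.05 for a, b in zip(y, x))
```

In the actual paper each stage is a trained autoencoder and the stages are optimized jointly in a second training phase; the sketch only shows the residual-passing structure.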
Keywords
Deep neural network, Entropy coding, Residual learning, Speech coding
KSP Suggested Keywords
AMR-WB, Compact Representation, Complex architecture, Compression problem, Cross-module, Data transmission, Deep neural network(DNN), End to End(E2E), Entropy Coding, Network Architecture, Perceptual Quality