ETRI-Knowledge Sharing Platform

Journal Article Parallel Enhancement and Bandwidth Extension of Coded Speech
Authors
Jongwook Chae, Eunkyun Lee, Sooyoung Park, Jong Won Shin
Issue Date
2026-02
Citation
Applied Sciences (Switzerland), v.16, no.3, pp.1-14
ISSN
2076-3417
Publisher
Multidisciplinary Digital Publishing Institute (MDPI)
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.3390/app16031439
Abstract
An important use case of speech bandwidth extension (BWE) is generating high-frequency components from band-limited speech processed by a speech codec. Recent works on BWE have demonstrated remarkable capabilities in generating high-quality, high-band components using deep learning techniques. Among them, Streaming SEANet (StrmSEANet) has also been shown to be effective for BWE with reduced delay and computational complexity, making it suitable for real-time speech processing. However, the effect of the coding artifact in the lower band of the input signal has not been sufficiently considered in many deep learning-based BWE methods. In this work, we propose Parallel Enhancement and Bandwidth Extension of coded speech (PEBE), where two lightweight networks, referred to as Compact Streaming SEANet (CompSEANet), for coded speech enhancement (CSE) and BWE are configured in parallel. The CSE and BWE models are separately trained with the task-specific training settings, thereby effectively improving the reconstruction quality of the band-limited speech signals degraded by coding artifacts. Experimental results demonstrate that the proposed PEBE significantly outperforms the baseline AP-BWE, StrmSEANet, and standalone CompSEANet in reconstructing wideband (WB) and fullband speech from Opus-coded narrowband and WB signals. The proposed method achieves the highest scores in the subjective MUSHRA test while providing the fastest inference among all compared methods, with real-time factors (RTF) of 33.95× and 18.38× measured on a Samsung SM-F711 mobile device under single-thread execution.
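The abstract reports real-time factors as multiples (33.95× and 18.38×) but does not define the quantity. A minimal sketch follows, assuming the "×" values mean audio duration divided by wall-clock processing time, so that larger is faster. The function names `speedup_from_timing` and `real_time_factor` and the `process` callable are hypothetical illustrations, not part of the paper or of any library.

```python
import time


def speedup_from_timing(audio_seconds, processing_seconds):
    """Audio duration divided by processing time (assumed meaning of 'N x')."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def real_time_factor(process, samples, sample_rate):
    """Time one call of `process` on `samples` and return the speed-up.

    `process` stands in for any model's inference function (hypothetical);
    `samples` is a sequence of audio samples at `sample_rate` Hz.
    """
    audio_seconds = len(samples) / sample_rate
    start = time.perf_counter()
    process(samples)
    elapsed = time.perf_counter() - start
    return speedup_from_timing(audio_seconds, elapsed)
```

Under this assumed definition, a model that needs 0.5 s to process 10 s of audio would be reported as 20×. Note that some publications use the inverse ratio (processing time over audio duration), where values below 1 indicate faster-than-real-time operation.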
Keyword
coded speech enhancement, speech bandwidth extension, speech coding, Streaming SEANet
KSP Keywords
Computational complexity, High frequency(HF), High-quality, Learning-based, Mobile devices, Real-time, Reconstruction quality, Reduced delay, Speech Signals, Speech coding, Speech processing
This work is distributed under the terms of a Creative Commons License (CC BY).