ETRI Knowledge Sharing Platform : Parallel Enhancement and Bandwidth Extension of Coded Speech

Journal Article Parallel Enhancement and Bandwidth Extension of Coded Speech

Cited 0 times in Scopus

Downloaded 558 times

Authors: Jongwook Chae, Eunkyun Lee, Sooyoung Park, Jong Won Shin

Issue Date: 2026-02

Citation: Applied Sciences (Switzerland), v.16, no.3, pp.1-14

ISSN: 2076-3417

Publisher: Multidisciplinary Digital Publishing Institute (MDPI)

Language: English

Type: Journal Article

DOI: https://dx.doi.org/10.3390/app16031439

Abstract: An important use case of speech bandwidth extension (BWE) is generating high-frequency components from band-limited speech processed by a speech codec. Recent works on BWE have demonstrated remarkable capabilities in generating high-quality, high-band components using deep learning techniques. Among them, Streaming SEANet (StrmSEANet) has also been shown to be effective for BWE with reduced delay and computational complexity, making it suitable for real-time speech processing. However, the effect of coding artifacts in the lower band of the input signal has not been sufficiently considered in many deep learning-based BWE methods. In this work, we propose Parallel Enhancement and Bandwidth Extension of coded speech (PEBE), where two lightweight networks, referred to as Compact Streaming SEANet (CompSEANet), for coded speech enhancement (CSE) and BWE are configured in parallel. The CSE and BWE models are trained separately with task-specific training settings, thereby effectively improving the reconstruction quality of band-limited speech signals degraded by coding artifacts. Experimental results demonstrate that the proposed PEBE significantly outperforms the baseline AP-BWE, StrmSEANet, and standalone CompSEANet in reconstructing wideband (WB) and fullband speech from Opus-coded narrowband and WB signals. The proposed method achieves the highest scores in the subjective MUSHRA test while providing the fastest inference among all compared methods, with real-time factors (RTF) of 33.95× and 18.38× measured on a Samsung SM-F711 mobile device under single-thread execution.

Keyword: coded speech enhancement, speech bandwidth extension, speech coding, Streaming SEANet

KSP Keywords: Computational complexity, High frequency (HF), High-quality, Input signal, Learning-based, Mobile devices, Reconstruction quality, Reduced delay, Speech Signals, Speech coding, Speech processing

This work is distributed under the terms of the Creative Commons License (CC BY).
