ETRI Knowledge Sharing Platform

Improved Alias-and-Separate Speech Coding Framework With Minimal Algorithmic Delay
Cited 1 time in Scopus
Authors
Eunkyun Lee, Seungkwon Beack, Jong Won Shin
Issue Date
2024-12
Citation
IEEE Journal of Selected Topics in Signal Processing, v.18, no.8, pp.1414-1426
ISSN
1932-4553
Publisher
IEEE
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1109/JSTSP.2024.3501681
Abstract
The Alias-and-Separate (AaS) speech coding framework has shown that wideband (WB) speech can be encoded with a narrowband (NB) speech codec and reconstructed using speech separation. The WB speech is first decimated, which incurs aliasing, and then coded, transmitted, and decoded with an NB codec. The decoded signal is separated into the lower band and the spectrally flipped higher band by a speech separation module; these are expanded, lowpass/highpass filtered, and added together to reconstruct the WB speech. The original AaS system, however, has algorithmic delay originating from the overlap-add operation applied to consecutive segments. This delay can be reduced by omitting the overlap-add procedure, but the quality of the reconstructed speech then degrades due to artifacts at the segment boundaries. In this work, we propose an improved AaS framework with minimal algorithmic delay. The decoded signal is first expanded by inserting zeros in between samples before being processed by the source separation module. Because the expanded signal can be viewed as a summation of frequency-shifted versions of the original signal, the decoded-and-expanded signal is separated into these frequency-shifted signals, which are then multiplied by complex exponentials and summed to reconstruct the original signal. With a carefully designed transposed convolution operation in the separation module, the proposed system requires minimal algorithmic delay while preventing discontinuities at the segment boundaries. Additionally, we propose employing a generative vocoder to further improve the perceived quality, together with a modified multi-resolution short-time Fourier transform (MR-STFT) loss. Experimental results on WB speech coding with an NB codec demonstrated that the proposed system outperformed the original AaS system and an existing WB speech codec in a subjective listening test. We also show, in an experiment on fullband speech coding with a WB codec, that the proposed method can be applied when the decimation factor is not 2.
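
The following NumPy snippet is an illustrative sketch, not the authors' implementation: it only checks the frequency-shift identity behind the zero-insertion step described in the abstract, for a decimation factor of 2, assuming no codec in the loop and an oracle separation; the signal x, the factor M, and all variable names are placeholders for illustration.

# Sketch of the frequency-shift view of zero-insertion in the improved AaS
# framework (decimation factor M = 2, no codec, oracle separation assumed).
import numpy as np

M = 2                                    # decimation factor (WB -> NB)
rng = np.random.default_rng(0)
x = rng.standard_normal(64)              # stand-in for a wideband speech segment

# Decimate without anti-alias filtering: the low band and the spectrally
# flipped high band fold onto each other (aliasing).
y = x[::M]

# Expand by inserting zeros in between samples (no lowpass interpolation).
z = np.zeros_like(x)
z[::M] = y

# The zero-inserted signal equals an average of frequency-shifted copies of x:
#   z[n] = (1/M) * sum_k x[n] * exp(j*2*pi*k*n/M)
n = np.arange(len(x))
shifted = [(x * np.exp(2j * np.pi * k * n / M)) / M for k in range(M)]
assert np.allclose(z, np.sum(shifted, axis=0).real)

# Given a separation of z into those frequency-shifted components, multiplying
# each by the conjugate complex exponential and summing recovers the original.
x_hat = np.sum(
    [s * np.exp(-2j * np.pi * k * n / M) for k, s in enumerate(shifted)], axis=0
)
assert np.allclose(x, x_hat.real)

In the actual framework the separation into the shifted components is performed by a learned module rather than the oracle used above, and the codec sits between the decimation and the zero-insertion steps.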
KSP Keywords
In-between, Inserting zeros, Multi-resolution, Overlap-Add, Perceived quality, Reconstructed speech, Separation module, Short time Fourier transform, Speech Separation, Speech coding, algorithmic delay