ETRI-Knowledge Sharing Platform
Title
Enhancement of waveform reconstruction for variational autoencoder-based neural audio synthesis with pitch information and automatic music transcription
Authors
Seokjin Lee, Minhan Kim, Seunghyeon Shin, Daeho Lee, Inseon Jang, Wootaek Lim
Issue Date
2022-10
Citation
International Congress on Acoustics (ICA) 2022, pp.1-4
Language
English
Type
Conference Paper
Abstract
In recent audio signal processing, analysis and synthesis models based on deep generative models have been applied for various purposes, such as audio signal compression. In particular, recently developed structures such as vector-quantized variational autoencoders can compress speech signals. However, extending these techniques to compress general audio and music signals is challenging. Recently, the realtime audio variational autoencoder (RAVE) method was developed for high-quality audio waveform synthesis. RAVE synthesizes audio waveforms better than conventional methods; however, it still exhibits certain problems, such as missing low-pitched notes or generating irrelevant pitches. Therefore, before it can be applied to audio reconstruction problems such as audio signal compression, its reconstruction performance must be improved. Thus, we propose an enhanced RAVE structure based on a conditional variational autoencoder (CVAE) and an automatic music transcription model to improve the reconstruction performance of music signal waveforms.
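The abstract describes conditioning a RAVE-like autoencoder on pitch information obtained from an automatic music transcription (AMT) model, in the manner of a CVAE. Below is a minimal sketch, in PyTorch, of that general idea: a decoder that receives a latent code concatenated with an embedded pitch vector. It is not the paper's implementation; the class name `PitchConditionedDecoder`, all dimensions, and the layer choices are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): conditioning a VAE-style
# decoder on per-frame pitch activations from an AMT model, as in a CVAE.
import torch
import torch.nn as nn


class PitchConditionedDecoder(nn.Module):
    """Decode latent frames into waveform frames, conditioned on pitch (illustrative)."""

    def __init__(self, latent_dim: int = 128, pitch_classes: int = 88,
                 hidden_dim: int = 512, frame_size: int = 2048):
        super().__init__()
        # Embed piano-roll-style note activations into a compact conditioning vector.
        self.pitch_embed = nn.Linear(pitch_classes, 64)
        # The decoder sees the latent code concatenated with the pitch embedding,
        # which is the basic mechanism of a conditional VAE.
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 64, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, frame_size),
            nn.Tanh(),  # waveform samples constrained to [-1, 1]
        )

    def forward(self, z: torch.Tensor, pitch: torch.Tensor) -> torch.Tensor:
        # z:     (batch, latent_dim)     latent code from the encoder
        # pitch: (batch, pitch_classes)  per-frame note activations from an AMT model
        cond = torch.relu(self.pitch_embed(pitch))
        return self.net(torch.cat([z, cond], dim=-1))


if __name__ == "__main__":
    decoder = PitchConditionedDecoder()
    z = torch.randn(4, 128)      # dummy latent codes
    pitch = torch.rand(4, 88)    # dummy AMT note activations
    frames = decoder(z, pitch)
    print(frames.shape)          # torch.Size([4, 2048])
```

In this sketch the pitch conditioning simply augments the decoder input; the paper's actual architecture, training objective, and how the AMT output is injected into RAVE are not reproduced here.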
KSP Keywords
Audio signal processing, Audio synthesis, Automatic music transcription (AMT), Conventional methods, Enhanced structure, High-quality, Reconstruction performance, Speech signals, Waveform reconstruction, Analysis and synthesis, Audio signal compression