ETRI-Knowledge Sharing Platform

Towards a B-format Ambisonic Room Impulse Response Generator Using Conditional Generative Adversarial Network
Cited 0 times in Scopus
Authors
Hualin Ren, Christian Ritz, Jiahong Zhao, Xiguang Zheng, Daeyoung Jang
Issue Date
2024-12
Citation
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA-ASC) 2024, pp.1-6
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/APSIPAASC63619.2025.10848654
Abstract
This paper proposes a B-format Ambisonic room impulse response (RIR) generator based on a conditional generative adversarial network (CGAN). The B-format RIR is a first-order Ambisonics (FOA) representation of the RIR, used to create spatial audio for virtual reality (VR) applications. Given the receiver position, the source position, and the room dimensions, the network produces FOA RIRs for specific VR rooms. The CGAN is trained on real-world FOA RIR recordings using a generative adversarial loss together with additional loss functions. These include the frequency-domain multi-resolution short-time Fourier transform (MRSTFT) loss and losses based on various acoustic parameters, such as reverberation time (RT), early decay time (EDT), and direct-to-reverberant ratio (DRR). Adaptive weightings are applied to balance the importance of the different loss functions. Objective evaluation metrics include mean squared error (MSE), MRSTFT, RT, EDT, DRR, DROQM for listening quality, and direction-of-arrival (DoA) accuracy. The results show that using acoustic parameters as loss functions effectively controls these metrics and yields high-quality FOA RIRs.
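A B-format (FOA) RIR has four channels: W (omnidirectional) and X, Y, Z (figure-of-eight patterns along the three Cartesian axes). The sketch below assumes the traditional FuMa convention (channel order W, X, Y, Z, with W scaled by 1/√2), which may differ from the convention of the authors' recordings. It encodes a single plane-wave arrival into these channels and recovers its direction with the time-averaged pseudo-intensity vector, one common way to estimate DoA from FOA signals. The function names are illustrative and do not come from the paper.

```python
import numpy as np

def encode_foa(signal, azimuth, elevation):
    """Encode a mono signal as a plane wave from (azimuth, elevation), in radians,
    into traditional B-format channels W, X, Y, Z (FuMa-style, W scaled by 1/sqrt(2))."""
    s = np.asarray(signal, dtype=float)
    w = s / np.sqrt(2.0)
    x = s * np.cos(azimuth) * np.cos(elevation)
    y = s * np.sin(azimuth) * np.cos(elevation)
    z = s * np.sin(elevation)
    return np.stack([w, x, y, z])  # shape (4, n_samples)

def estimate_doa(foa):
    """Estimate (azimuth, elevation) in radians from the time-averaged
    pseudo-intensity vector, i.e. the sums of W*X, W*Y and W*Z."""
    w, x, y, z = foa
    ix, iy, iz = np.sum(w * x), np.sum(w * y), np.sum(w * z)
    return np.arctan2(iy, ix), np.arctan2(iz, np.hypot(ix, iy))

def angular_error_deg(az1, el1, az2, el2):
    """Great-circle angle, in degrees, between two directions given in radians."""
    u = np.array([np.cos(az1) * np.cos(el1), np.sin(az1) * np.cos(el1), np.sin(el1)])
    v = np.array([np.cos(az2) * np.cos(el2), np.sin(az2) * np.cos(el2), np.sin(el2)])
    return np.degrees(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))
```

For a full RIR, restricting `estimate_doa` to a few milliseconds around the direct-path peak keeps later reflections from biasing the estimate. The angular error between the estimated and true source directions is one way to express DoA accuracy, although the abstract does not specify the exact metric the paper uses.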
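The MRSTFT loss compares magnitude spectrograms at several STFT resolutions, so that both fine temporal detail and fine frequency detail are penalized. Below is a minimal NumPy sketch. It assumes the common formulation popularized by Parallel WaveGAN (spectral convergence plus a log-magnitude L1 distance, averaged over resolutions) and borrows that work's default (FFT size, hop, window length) triples. The abstract does not state the authors' resolutions or implementation.

```python
import numpy as np

def stft_magnitude(x, fft_size, hop, win_length):
    """Magnitude STFT using a Hann window of win_length, centered and zero-padded to fft_size."""
    x = np.asarray(x, dtype=float)
    if len(x) < fft_size:
        x = np.pad(x, (0, fft_size - len(x)))
    window = np.hanning(win_length)
    left = (fft_size - win_length) // 2
    window = np.pad(window, (left, fft_size - win_length - left))
    n_frames = 1 + (len(x) - fft_size) // hop
    frames = np.stack([x[i * hop:i * hop + fft_size] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames * window, axis=-1))  # (n_frames, fft_size // 2 + 1)

def mrstft_loss(pred, target,
                resolutions=((1024, 120, 600), (2048, 240, 1200), (512, 50, 240)),
                eps=1e-7):
    """Average of spectral-convergence and log-magnitude L1 losses over STFT resolutions."""
    total = 0.0
    for fft_size, hop, win_length in resolutions:
        p = stft_magnitude(pred, fft_size, hop, win_length)
        t = stft_magnitude(target, fft_size, hop, win_length)
        # Frobenius norm of the magnitude difference, relative to the target magnitude
        spectral_convergence = np.linalg.norm(t - p) / (np.linalg.norm(t) + eps)
        # Mean absolute difference of log magnitudes
        log_magnitude = np.mean(np.abs(np.log(t + eps) - np.log(p + eps)))
        total += spectral_convergence + log_magnitude
    return total / len(resolutions)
```

For an FOA RIR, the loss can be computed per channel and averaged over W, X, Y and Z. A training implementation would use the deep learning framework's differentiable STFT rather than NumPy.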
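RT, EDT and DRR can each be computed from one RIR channel. The sketch below assumes ISO 3382-1-style definitions. It uses Schroeder backward integration for the energy decay curve (EDC). RT is taken as T30, a linear fit between -5 and -35 dB extrapolated to a 60 dB decay, and EDT uses the 0 to -10 dB range. For DRR, the direct sound is assumed to be a ±2.5 ms window around the strongest peak. These choices and the function names are assumptions: the paper may use other ranges, and using these quantities as training losses would require differentiable versions.

```python
import numpy as np

def energy_decay_curve_db(rir):
    """Schroeder backward integration of the squared RIR, normalized to 0 dB at t = 0."""
    energy = np.asarray(rir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]
    return 10.0 * np.log10(np.maximum(edc / edc[0], np.finfo(float).tiny))

def decay_time(edc_db, fs, start_db, end_db):
    """Fit a line to the EDC between start_db and end_db and extrapolate to a 60 dB decay."""
    idx = np.where((edc_db <= start_db) & (edc_db >= end_db))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # dB per second (negative)
    return -60.0 / slope

def reverberation_time_t30(rir, fs):
    return decay_time(energy_decay_curve_db(rir), fs, -5.0, -35.0)

def early_decay_time(rir, fs):
    return decay_time(energy_decay_curve_db(rir), fs, 0.0, -10.0)

def direct_to_reverberant_ratio_db(rir, fs, direct_ms=2.5):
    """Energy within +/- direct_ms of the strongest peak versus all remaining energy."""
    rir = np.asarray(rir, dtype=float)
    peak = int(np.argmax(np.abs(rir)))
    half = int(round(direct_ms * 1e-3 * fs))
    lo, hi = max(0, peak - half), peak + half + 1
    direct = np.sum(rir[lo:hi] ** 2)
    reverberant = np.sum(rir[:lo] ** 2) + np.sum(rir[hi:] ** 2)
    return 10.0 * np.log10(direct / reverberant)
```

On a purely exponential decay, T30 and EDT coincide. On measured or generated RIRs they usually differ, which is one reason to constrain both. If the EDC never falls to -35 dB (for example, with a short or noisy RIR), T20 (-5 to -25 dB) is a common fallback.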
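The abstract does not say how the adaptive weightings are computed. One generic option, shown here only as a hypothetical illustration and not as the authors' scheme, is to scale each loss term by the inverse of its exponential moving average. With this scaling, terms with very different magnitudes (for example an MRSTFT value near 1 and an RT error in seconds) contribute on a similar scale. The class name `RunningMeanWeighter` and its `momentum` parameter are invented for this sketch.

```python
class RunningMeanWeighter:
    """Hypothetical adaptive weighting: each loss term is divided by its exponential
    moving average, so every term contributes roughly 1 to the total on average."""

    def __init__(self, names, momentum=0.9, eps=1e-8):
        self.momentum = momentum
        self.eps = eps
        self.avg = {name: None for name in names}

    def combine(self, losses):
        """losses: dict of name -> float. Returns (weighted total, weights used)."""
        total = 0.0
        weights = {}
        for name, value in losses.items():
            prev = self.avg[name]
            # Initialize the average with the first observed value, then smooth it
            self.avg[name] = value if prev is None else (
                self.momentum * prev + (1.0 - self.momentum) * value)
            weights[name] = 1.0 / (self.avg[name] + self.eps)
            total += weights[name] * value
        return total, weights
```

In an actual training loop, the weights would be computed from detached scalar loss values so that no gradient flows through them. A term whose magnitude shrinks over training then receives a growing weight, which keeps it from being ignored.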
KSP Keywords
As loss, Decay time, Direction of arrival (DoA), Early Decay, First-order, Frequency domain (FD), High-quality, Multi-resolution, Real-world, Reverberation time, Short time Fourier transform