ETRI-Knowledge Sharing Platform





Journal Article: Sound Source Localization Based on GCC-PHAT With Diffuseness Mask in Noisy and Reverberant Environments
Cited 7 times in Scopus; downloaded 4 times
Ran Lee (이란), Min-Seok Kang (강민석), Bo-Hyun Kim (김보현), Kang-Ho Park (박강호), Sung Q Lee (이성규), Hyung-Min Park (박형민)
IEEE Access, v.8, pp.7373-7382
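20HS1300, Development of Core Technologies for Human Hearing and Muscle Strength Augmentation to Overcome Impaired or Declining Physical Function, Hyung-Cheol Shin (신형철)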
Although sound source localization is a desirable technique in many communication systems and intelligence applications, the distortion caused by diffuse noise or reverberation makes the time delay estimation (TDE) between signals acquired by a pair of microphones a complicated and challenging problem. In this paper, we describe a method that can efficiently achieve sound source localization in noisy and reverberant environments. This method is based on the generalized cross-correlation (GCC) function with phase transform (PHAT) weights (GCC-PHAT) to achieve robustness against reverberation. In addition, to make the time delay estimate robust to diffuse components and to further improve the robustness of the GCC-PHAT against reverberation, time-frequency (t-f) components of observations directly emitted by a point source are chosen by 'inversed' diffuseness. The diffuseness, which can be estimated from the coherent-to-diffuse power ratio (CDR) based on spatial coherence between two microphones, represents the contribution of diffuse components on a scale of zero to one, with direct sounds from a source modeled as fully coherent. In particular, the 'inversed' diffuseness is binarized with a very strict threshold to select highly reliable components for accurate TDE even in noisy and reverberant environments. Experimental results for both simulated and real-recorded data consistently demonstrated the robustness of the presented method against diffuse noise and reverberation.
Diffuseness mask, GCC-PHAT, reverberation, sound source localization
KSP Suggested Keywords
Communication system, GCC-PHAT, Generalized Cross-correlation, Phase transform, Spatial coherence, diffuse noise, highly reliable, point source, power ratio, reverberant environments, sound source localization
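
The abstract above describes selecting reliable time-frequency components with a binarized mask before GCC-PHAT time delay estimation. The following is a minimal NumPy sketch of that idea, not the authors' implementation. Several things here are assumptions that the abstract does not specify: the function names `binarize_inverse_diffuseness` and `masked_gcc_phat`, the frame length, hop size and Hann window, the threshold value, and the reading of 'inversed' diffuseness as 1 - D. The CDR-based diffuseness estimate itself is not shown; the sketch expects it to be supplied by the caller.

```python
import numpy as np


def binarize_inverse_diffuseness(diffuseness, threshold=0.9):
    """Return a 0/1 mask that keeps t-f bins with low diffuseness.

    Assumption: 'inversed' diffuseness is taken to be 1 - D, and the
    threshold value is purely illustrative.
    """
    return ((1.0 - np.asarray(diffuseness)) >= threshold).astype(float)


def masked_gcc_phat(x1, x2, fs, frame_len=1024, hop=512, mask=None):
    """Estimate the delay of x2 relative to x1, in seconds.

    mask, if given, has shape (n_frames, frame_len // 2 + 1) and weights
    each t-f bin of the PHAT-normalized cross-spectrum.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (min(len(x1), len(x2)) - frame_len) // hop
    acc = np.zeros(frame_len // 2 + 1, dtype=complex)

    for m in range(n_frames):
        seg = slice(m * hop, m * hop + frame_len)
        X1 = np.fft.rfft(window * x1[seg])
        X2 = np.fft.rfft(window * x2[seg])
        cross = X2 * np.conj(X1)
        # PHAT weighting: keep only the phase of the cross-spectrum.
        phat = cross / (np.abs(cross) + 1e-12)
        if mask is not None:
            phat = phat * mask[m]
        acc += phat

    # Back to the lag domain, then reorder so lags run from -L/2 to L/2 - 1.
    cc = np.fft.irfft(acc, n=frame_len)
    max_shift = frame_len // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

A caller would pass two equal-rate microphone signals plus, optionally, a mask built from a per-bin diffuseness estimate, for example `masked_gcc_phat(x1, x2, fs, mask=binarize_inverse_diffuseness(D))`, where `D` is a hypothetical array of diffuseness values with one row per frame. The delay resolution of this sketch is limited to one sample, and delays must be small relative to the frame length.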