ETRI-Knowledge Sharing Plaform



논문 검색
구분 SCI
연도 ~ 키워드


학술지 Sound Source Localization Based on GCC-PHAT With Diffuseness Mask in Noisy and Reverberant Environments
Cited 33 time in scopus Download 12 time Share share facebook twitter linkedin kakaostory
이란, 강민석, 김보현, 박강호, 이성규, 박형민
IEEE Access, v.8, pp.7373-7382
20HS1300, 신체기능의 이상이나 저하를 극복하기 위한 휴먼 청각 및 근력 증강 원천 기술 개발, 신형철
Although sound source localization is a desirable technique in many communication systems and intelligence applications, the distortion caused by diffuse noise or reverberation makes the time delay estimation (TDE) between signals acquired by a pair of microphones a complicated and challenging problem. In this paper, we describe a method that can efficiently achieve sound source localization in noisy and reverberant environments. This method is based on the generalized cross-correlation (GCC) function with phase transform (PHAT) weights (GCC-PHAT) to achieve robustness against reverberation. In addition, to estimate the time delay robust to diffuse components and to further improve the robustness of the GCC-PHAT against reverberation, time-frequency(t-f) components of observations directly emitted by a point source are chosen by 'inversed' diffuseness. The diffuseness that can be estimated from the coherent-to-diffuse power ratio (CDR) based on spatial coherence between two microphones represents the contribution of diffuse components on a scale of zero to one with direct sounds from a source modeled to be fully coherent. In particular, the 'inversed' diffuseness is binarized with a very rigorous threshold to select highly reliable components for accurate TDE even in noisy and reverberant environments. Experimental results for both simulated and real-recorded data consistently demonstrated the robustness of the presented method against diffuse noise and reverberation.
KSP 제안 키워드
Communication system, GCC-PHAT, Generalized Cross-correlation, Phase transform, Spatial coherence, diffuse noise, highly reliable, point source, power ratio, reverberant environments, sound source localization