ETRI Knowledge Sharing Platform


Audio-Visual Overlapped Speech Detection for Spontaneous Distant Speech
Cited 2 times in Scopus
Authors
Minyoung Kyoung, Hyungbae Jeon, Kiyoung Park
Issue Date
2023-03
Citation
IEEE Access, v.11, pp.27426-27432
ISSN
2169-3536
Publisher
IEEE
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1109/ACCESS.2023.3254529
Abstract
Although advances in deep learning have brought remarkable improvements to Overlapped Speech Detection (OSD), performance in far-field environments is still limited owing to the lack of real-world overlapped speech and a low signal-to-noise ratio. In this paper, we present an end-to-end audio-visual OSD system based on decision fusion between the audio and video modalities. Firstly, we propose a simple yet powerful audio data augmentation method for spontaneous distant speech data. Secondly, to maximize the effectiveness of the video modality, we design a video OSD system based on a cross-speaker attention module that explores the visual correlation between multiple speakers. Lastly, we present a cross-modality attention module to make the final decision more accurate. Our experimental results demonstrate that our approach outperforms current state-of-the-art methods on a real-world distant speech dataset. Moreover, our approach detects overlapped speech more robustly than its counterpart that uses the audio modality alone.
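To give a concrete picture of attention-based fusion between the two modality streams described above, the sketch below is a minimal, hypothetical PyTorch example. The module name, feature dimensions, and binary frame-level output are illustrative assumptions only and do not reproduce the paper's actual architecture or the proposed cross-speaker attention design.

```python
# Hypothetical sketch: attention-based fusion of audio and video embeddings
# for frame-level overlapped speech detection (OSD). All names, dimensions,
# and the binary output head are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class CrossModalityFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Each modality attends to the other before the final decision.
        self.audio_to_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.video_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Frame-level classifier: 0 = non-overlapped, 1 = overlapped speech.
        self.classifier = nn.Linear(2 * dim, 2)

    def forward(self, audio_emb: torch.Tensor, video_emb: torch.Tensor) -> torch.Tensor:
        # audio_emb, video_emb: (batch, frames, dim), assumed time-aligned.
        a_att, _ = self.audio_to_video(audio_emb, video_emb, video_emb)
        v_att, _ = self.video_to_audio(video_emb, audio_emb, audio_emb)
        fused = torch.cat([a_att, v_att], dim=-1)   # (batch, frames, 2*dim)
        return self.classifier(fused)               # per-frame OSD logits

# Usage with random features standing in for encoder outputs.
audio = torch.randn(2, 100, 256)   # e.g. acoustic encoder output
video = torch.randn(2, 100, 256)   # e.g. per-frame visual encoder output
logits = CrossModalityFusion()(audio, video)
print(logits.shape)                # torch.Size([2, 100, 2])
```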
KSP Keywords
Audio and video, Audio data, Audio-visual, Augmentation method, Current state, Data Augmentation, Decision Fusion, End to End (E2E), Far-field, Field Environment, Real-world
This work is distributed under the terms of the Creative Commons License (CC BY).