ETRI Knowledge Sharing Platform : How to Aggregate Acoustic Delta Features for Deep Speaker Embeddings

Conference Paper: How to Aggregate Acoustic Delta Features for Deep Speaker Embeddings

Cited 1 time in Scopus

Citation: International Conference on Information and Communication Technology Convergence (ICTC) 2020, pp.1225-1229

Abstract: Speaker verification systems based on deep speaker embedding (DSE) networks have outperformed traditional i-vector systems. To improve performance further, various approaches have since been studied, and data augmentation is one of them. In this paper, we focus on augmentation with acoustic delta features and on methods for aggregating them in two DSE networks, X-vectors and MobileVoxNet. For the CNN-based MobileVoxNet, we redesign the architecture to aggregate delta features in a deeper layer using a squeeze-and-excitation (SE) module. Experimental results on the VoxCeleb1 test dataset show that the proposed methods improve performance compared with not using delta features. We also compare the number of computations and parameters of the models to analyze the efficiency of the proposed methods.
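
Illustrative sketch (not from the paper): the abstract does not say how the delta features are computed or aggregated, so the code below only shows one common convention. It assumes the standard regression-based delta formula with a window of N = 2 frames and edge padding, and it contrasts two generic ways of combining static, delta, and delta-delta features: concatenation along the feature axis (a layout often fed to TDNN-style networks such as X-vectors) and stacking as input channels (a layout often fed to CNNs). The function names `delta`, `concat_along_features`, and `stack_as_channels` are hypothetical, and the deeper-layer SE-based aggregation described in the abstract is not reproduced here.

```python
import numpy as np


def delta(feats, N=2):
    """Regression-based delta of a (T, F) feature matrix (assumed convention).

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Frames beyond the edges are filled by repeating the first/last frame.
    """
    feats = np.asarray(feats, dtype=np.float64)
    T = feats.shape[0]
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feats)
    for n in range(1, N + 1):
        # Frame t sits at index t + N in the padded array.
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom


def concat_along_features(feats):
    """Static, delta, and delta-delta side by side: (T, F) -> (T, 3F)."""
    d1 = delta(feats)
    d2 = delta(d1)
    return np.concatenate([feats, d1, d2], axis=1)


def stack_as_channels(feats):
    """Static, delta, and delta-delta as channels: (T, F) -> (3, T, F)."""
    d1 = delta(feats)
    d2 = delta(d1)
    return np.stack([np.asarray(feats, dtype=np.float64), d1, d2], axis=0)


if __name__ == "__main__":
    # Stand-in for a 100-frame, 40-bin filterbank matrix (random data).
    rng = np.random.default_rng(0)
    fbank = rng.standard_normal((100, 40))
    print(concat_along_features(fbank).shape)
    print(stack_as_channels(fbank).shape)
```

The two layouts carry the same information; they differ only in which axis the network sees the delta streams on, which is the kind of design choice the paper's aggregation comparison concerns.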
