ETRI Knowledge Sharing Platform : A Preliminary Study on Topical Model for Multi-domain Speech Recognition via Word Embedding Vector


Conference Paper: A Preliminary Study on Topical Model for Multi-domain Speech Recognition via Word Embedding Vector

Cited 3 times in Scopus

Citation: International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC) 2019, pp.1-4

Abstract: In this paper, we propose a basic topical model (TM) framework that adapts a speech recognition system to multiple domains and prevents topical errors. The cosine similarities between the target word and the context words in a spoken utterance serve as the topical model parameters. The TM is applied to frames of the lattice network that have a large number of candidate words, and it adjusts the ranking of those candidates by being added to the total cost estimated from the acoustic model (AM) and the language model (LM). To cover multiple domains, the word embedding was trained on a 5.5 billion text corpus drawn from multiple domains. A DNN-HMM was selected as the acoustic model and an N-gram as the language model. The evaluation data set consisted of 501 sentences (10,054 words) covering 35 topics. Our approach gave the best performance, improving WERR by up to about 4% compared with the N-gram based model. The WERR rose above 10% when the word errors were correctly detected. The results suggest that this approach could adapt a model to multiple domains without sub-topic models.
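
The rescoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the similarities are combined or weighted, so the averaging over context words, the `weight` parameter, the `min_candidates` threshold, the sign convention (a lower cost is better, so the TM score is subtracted) and the toy three-dimensional embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topical_score(target, context, emb):
    """Mean cosine similarity between the target and its context words.

    Averaging is an assumption; the abstract only says that cosine
    similarities are used as the topical model parameters.
    """
    sims = [cosine(emb[target], emb[c]) for c in context
            if c in emb and c != target]
    return sum(sims) / len(sims) if sims else 0.0


def rescore(candidates, context, emb, weight=1.0, min_candidates=2):
    """Rank candidate words by AM+LM cost adjusted with the TM score.

    candidates: dict mapping word -> combined AM+LM cost (lower is better).
    weight and min_candidates are hypothetical tuning parameters.
    """
    # Apply the TM only where there are enough competing candidates.
    if len(candidates) < min_candidates:
        return sorted(candidates, key=candidates.get)
    total = {}
    for word, cost in candidates.items():
        tm = topical_score(word, context, emb) if word in emb else 0.0
        # A topically coherent word gets a lower total cost.
        total[word] = cost - weight * tm
    return sorted(total, key=total.get)


# Toy embeddings, invented for this example only.
EMB = {
    "bank": [0.9, 0.1, 0.0],
    "loan": [0.8, 0.2, 0.1],
    "interest": [0.7, 0.3, 0.0],
    "tank": [0.0, 0.1, 0.9],
}
CANDIDATES = {"tank": 10.0, "bank": 10.2}  # AM+LM alone prefers "tank"
CONTEXT = ["loan", "interest"]

RANKED = rescore(CANDIDATES, CONTEXT, EMB, weight=1.0)
print(RANKED)
```

In this toy case the AM and LM costs alone rank "tank" first, while adding the topical term moves "bank" ahead because it is much closer to the context words "loan" and "interest" in the embedding space.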

KSP Keywords: DNN-HMM, Data sets, Language Model, Model parameter, Multi-Domain, Preliminary study, Speech recognition system, Word Embedding, acoustic model, lattice network, n-Gram

218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, KOREA, Contact: sh.kim@etri.re.kr
