ETRI-Knowledge Sharing Platform





Conference Paper: A Preliminary Study on Topical Model for Multi-domain Speech Recognition via Word Embedding Vector
Cited 2 times in Scopus. Downloaded 3 times.
문지혜, 윤승, 이담허, 김상훈
International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC) 2019, pp.1-4
19ZS1100, Research on Core Technologies for Autonomously Growing AI, 송화전
In this paper, we propose a basic topical model (TM) framework that adapts a speech recognition system to multiple domains and prevents topical errors. The TM parameters are the cosine similarities between the target word and the context words in a spoken utterance. The TM is applied to frames that have a large number of candidate words in the lattice network, and it adjusts the ranking of the candidate words by adding its score to the total cost estimated from the acoustic model (AM) and the language model (LM). To cover multiple domains, the word embedding was trained on a 5.5 billion text corpus drawn from multiple domains. A DNN-HMM was selected as the acoustic model and an N-gram as the language model. The evaluation data set consisted of 501 sentences (10,054 words) covering 35 topics. Our approach obtained the best performance: the WERR increased by up to about 4% compared with the N-gram based model, and it exceeded 10% when the word errors were correctly detected. The results suggest that this approach can adapt a model to multiple domains without sub-topic models.
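The rescoring step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `tm_weight` and `min_candidates` parameters, the toy embedding, and the choice of `1 - mean cosine similarity` as the topical cost are all assumptions made for the example. The abstract only states that cosine similarities between target and context words are added to the AM and LM cost for frames with many candidates.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def topical_cost(candidate, context, emb):
    """Hypothetical TM cost: 1 - mean cosine similarity to the context words.

    Returns 0.0 (no adjustment) when the candidate or all context words
    are missing from the embedding table.
    """
    if candidate not in emb:
        return 0.0
    sims = [cosine(emb[candidate], emb[w])
            for w in context if w in emb and w != candidate]
    if not sims:
        return 0.0
    return 1.0 - sum(sims) / len(sims)


def rerank(candidates, context, emb, tm_weight=1.0, min_candidates=2):
    """Re-rank (word, am_lm_cost) pairs for one frame; lower cost ranks first.

    The TM term is applied only when the frame has at least
    `min_candidates` candidates (an assumed threshold).
    """
    if len(candidates) < min_candidates:
        return sorted(candidates, key=lambda pair: pair[1])
    scored = [(word, cost + tm_weight * topical_cost(word, context, emb))
              for word, cost in candidates]
    return sorted(scored, key=lambda pair: pair[1])


# Toy 2-dimensional embedding, invented for illustration only.
toy_emb = {
    "money": [1.0, 0.0],
    "loan": [0.9, 0.1],
    "bank": [0.95, 0.05],
    "tank": [0.0, 1.0],
}
frame = [("tank", 1.0), ("bank", 1.1)]  # AM+LM slightly prefers "tank"
best_word = rerank(frame, ["money", "loan"], toy_emb)[0][0]
print(best_word)
```

In this toy frame the AM and LM cost alone ranks "tank" first, while adding the topical term promotes "bank" because it lies closer to the context words in the embedding space.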
KSP Suggested Keywords
DNN-HMM, Data sets, Language model, Model parameter, Multi-Domain, Preliminary study, Speech recognition system, Text Corpus, Word Embedding, acoustic model, lattice network