ETRI-Knowledge Sharing Platform

Details

Journal article: Online Speech Recognition Using Multichannel Parallel Acoustic Score Computation and Deep Neural Network (DNN)-Based Voice-Activity Detector
Cited 5 times in Scopus. Downloaded 83 times.
Authors
오유리, 박기영, 박전규
Publication date
2020-06
Source
Applied Sciences, v.10 no.12, pp.1-21
ISSN
2076-3417
Publisher
MDPI
DOI
https://dx.doi.org/10.3390/APP10124091
Funded project
20HS5200, Development of Multi-Speaker Conversational Speech Recognition Technology, 박전규
Abstract
This paper aims to design an online, low-latency, and high-performance speech recognition system using a bidirectional long short-term memory (BLSTM) acoustic model. To achieve this, we adopt a server-client model and a context-sensitive-chunk-based approach. The speech recognition server manages a main thread and a decoder thread for each client, as well as a single worker thread. The main thread communicates with the connected client, extracts speech features, and buffers the features. The decoder thread performs speech recognition, including the proposed multichannel parallel acoustic score computation of a BLSTM acoustic model, the proposed deep neural network-based voice activity detector, and Viterbi decoding. The proposed acoustic score computation method uses the worker thread to estimate the acoustic scores of a context-sensitive-chunk BLSTM acoustic model for the batched speech features from concurrent clients. The proposed deep neural network-based voice activity detector detects short pauses within an utterance to reduce response latency when the user utters long sentences. In Korean speech recognition experiments, the proposed acoustic score computation increases the number of concurrent clients from 22 to 44. When combined with the frame skipping method, the number is further increased to 59 clients with a small accuracy degradation. Moreover, the proposed deep neural network-based voice activity detector reduces the average user-perceived latency from 11.71 s to 3.09-5.41 s.
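The context-sensitive-chunk batching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the chunk size, and the context lengths are hypothetical values chosen for illustration, and the acoustic model itself is left out. The sketch only shows how each client's buffered frames could be cut into chunks with left and right context, and how chunks from several concurrent clients could be gathered into one batch for a single worker.

```python
from typing import Dict, List, Tuple

# (context_start, chunk_start, chunk_end, context_end), end indices exclusive
Span = Tuple[int, int, int, int]


def split_into_cs_chunks(num_frames: int, chunk_size: int,
                         left_ctx: int, right_ctx: int) -> List[Span]:
    """Cut a sequence of num_frames frames into context-sensitive chunks.

    Each chunk covers [chunk_start, chunk_end) and is padded with up to
    left_ctx past frames and right_ctx future frames, clipped to the
    sequence boundaries. All sizes here are illustrative assumptions.
    """
    spans: List[Span] = []
    for start in range(0, num_frames, chunk_size):
        end = min(start + chunk_size, num_frames)
        ctx_start = max(0, start - left_ctx)
        ctx_end = min(num_frames, end + right_ctx)
        spans.append((ctx_start, start, end, ctx_end))
    return spans


def build_batch(buffered_frames: Dict[str, int], chunk_size: int,
                left_ctx: int, right_ctx: int) -> List[Tuple[str, Span]]:
    """Collect the chunks of all concurrent clients into one batch.

    buffered_frames maps a (hypothetical) client id to the number of
    feature frames currently buffered for that client. A real worker
    would pass the batched features through the acoustic model once
    and hand the scores back to each client's decoder.
    """
    batch: List[Tuple[str, Span]] = []
    for client_id, num_frames in buffered_frames.items():
        for span in split_into_cs_chunks(num_frames, chunk_size,
                                         left_ctx, right_ctx):
            batch.append((client_id, span))
    return batch


if __name__ == "__main__":
    # Two hypothetical clients with 10 and 3 buffered frames.
    for client_id, span in build_batch({"client_a": 10, "client_b": 3},
                                       chunk_size=4, left_ctx=2, right_ctx=2):
        print(client_id, span)
```

Batching across clients in this way lets one model evaluation serve many decoders, which is the general idea behind the reported increase in concurrent clients; the actual scheduling policy and chunk configuration are described in the paper itself.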
KSP suggested keywords
Based Approach, Bidirectional Long Short-Term Memory, Computation method, Context-sensitive, Deep neural network (DNN), Frame skipping, High performance, Korean speech, Long short-term memory (LSTM), Low latency, Response latency
This work may be used under the terms of the Creative Commons Attribution (CC BY) license.