Registered
Neural machine translation system and method based on sub-word token units with added explicit word-alignment information
- Inventors
-
신종훈, 김영길
- Application number
-
15944939 (2018.04.04)
- Publication number
-
20190129947 (2019.05.02)
- Registration number
- 10635753 (2020.04.28)
- Country of application
- United States
- Contracted project
-
17HS1700, Development of Core Technology for Knowledge-Augmented Real-Time Simultaneous Interpretation,
김영길
- Abstract
- The present invention provides a method of generating training data to which explicit word-alignment information is added without impairing sub-word tokens, and a neural machine translation method and apparatus that include this method. The method of generating training data includes the steps of: (1) separating basic word boundaries through morphological analysis or named entity recognition of a sentence of a bilingual corpus used for learning; (2) extracting explicit word-alignment information from the sentence of the bilingual corpus used for learning; (3) further dividing the word boundaries separated in step (1) into sub-word tokens; (4) generating new source language training data by using the outputs of steps (1) and (3); and (5) generating new target language training data by using the explicit word-alignment information generated in step (2) and the target language outputs of steps (1) and (3).
- Patent family
-
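
The five steps listed in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the patented implementation: whitespace splitting stands in for the morphological analysis or named entity recognition of step (1), the word alignment of step (2) is passed in as ready-made `(source_index, target_index)` pairs rather than extracted by an aligner, fixed-size chunking stands in for a learned sub-word model in step (3), and the `piece|index` annotation format and all function names are hypothetical.

```python
from typing import List, Tuple


def split_words(sentence: str) -> List[str]:
    """Step (1) stand-in: whitespace split instead of morphological analysis / NER."""
    return sentence.split()


def split_subwords(word: str, size: int = 3) -> List[str]:
    """Step (3) stand-in: fixed-size chunks instead of a learned sub-word model."""
    return [word[i:i + size] for i in range(0, len(word), size)]


def make_source(words: List[str]) -> List[str]:
    """Step (4): tag each source sub-word token with the index of its original word."""
    return [
        f"{piece}|{i}"
        for i, word in enumerate(words)
        for piece in split_subwords(word)
    ]


def make_target(words: List[str], alignment: List[Tuple[int, int]]) -> List[str]:
    """Step (5): tag each target sub-word token with the aligned source word index.

    Unaligned target words get the hypothetical tag 'N'.
    """
    tgt_to_src = {t: s for s, t in alignment}
    out: List[str] = []
    for j, word in enumerate(words):
        tag = tgt_to_src.get(j, "N")
        out.extend(f"{piece}|{tag}" for piece in split_subwords(word))
    return out


def build_pair(
    src_sentence: str,
    tgt_sentence: str,
    alignment: List[Tuple[int, int]],  # step (2) output, assumed to be given
) -> Tuple[List[str], List[str]]:
    """Run steps (1), (3), (4), (5) on one sentence pair of a bilingual corpus."""
    src_words = split_words(src_sentence)
    tgt_words = split_words(tgt_sentence)
    return make_source(src_words), make_target(tgt_words, alignment)


if __name__ == "__main__":
    src, tgt = build_pair("hello world", "bonjour monde", [(0, 0), (1, 1)])
    print(src)
    print(tgt)
```

Because the word index is attached to every sub-word piece, the word-level alignment survives the sub-word split, which is the property the abstract describes as adding alignment information "without impairing sub-word tokens".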