ETRI-Knowledge Sharing Plaform

KOREAN
논문 검색
Type SCI
Year ~ Keyword

Detail

Journal Article Sentence-Chain Based Seq2seq Model for Corpus Expansion
Cited 14 time in scopus Download 314 time Share share facebook twitter linkedin kakaostory
Authors
Euisok Chung, Jeon Gue Park
Issue Date
2017-08
Citation
ETRI Journal, v.39, no.4, pp.455-466
ISSN
1225-6463
Publisher
한국전자통신연구원 (ETRI)
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.4218/etrij.17.0116.0074
Abstract
This study focuses on a method for sequential data augmentation in order to alleviate data sparseness problems. Specifically, we present corpus expansion techniques for enhancing the coverage of a language model. Recent recurrent neural network studies show that a seq2seq model can be applied for addressing language generation issues; it has the ability to generate new sentences from given input sentences. We present a method of corpus expansion using a sentence-chain based seq2seq model. For training the seq2seq model, sentence chains are used as triples. The first two sentences in a triple are used for the encoder of the seq2seq model, while the last sentence becomes a target sequence for the decoder. Using only internal resources, evaluation results show an improvement of approximately 7.6% relative perplexity over a baseline language model of Korean text. Additionally, from a comparison with a previous study, the sentence chain approach reduces the size of the training data by 38.4% while generating 1.4-times the number of ngrams with superior performance for English text.
This work is distributed under the term of Korea Open Government License (KOGL)
(Type 4: : Type 1 + Commercial Use Prohibition+Change Prohibition)
Type 4: