ETRI-Knowledge Sharing Platform

A Deep-Learning based Native-Language Classification by using a Latent Semantic Analysis for the NLI Shared Task 2017
Authors
Yoo Rhee Oh, Hyung-Bae Jeon, Hwa Jeon Song, Yun-Kyung Lee, Jeon-Gue Park, Yun-Keun Lee
Issue Date
2017-09
Citation
Workshop on Innovative Use of NLP for Building Educational Applications, pp.413-422
Language
English
Type
Conference Paper
Abstract
This paper proposes a deep-learning based native-language identification (NLI) system using latent semantic analysis (LSA), submitted as the ETRI-SLP entry to the NLI Shared Task 2017 (Malmasi et al., 2017), which aims to detect the native language of an essay or speech response from a standardized assessment of English proficiency for academic purposes. To this end, we use six unit forms of text data: character 4/5/6-grams and word 1/2/3-grams. For each unit form, we convert the text into a count-based vector, extract a 2000-rank LSA feature, and apply linear discriminant analysis (LDA) based dimension reduction. From the count-based vector or the LSA-LDA feature, we also obtain the output prediction values of a support vector machine (SVM) based classifier, the output prediction values of a deep neural network (DNN) based classifier, and the bottleneck values of a DNN based classifier. To incorporate the various text-based features and a speech-based i-vector feature, we design two DNN-based ensemble classifiers for late fusion and early fusion, respectively. In the NLI experiments, the macro F1 scores are 0.8601, 0.8664, and 0.9220 for the essay, speech, and fusion tracks, respectively. The proposed method is comparable to the top-ranked teams on the speech and fusion tracks, while performing slightly lower on the essay track.
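The per-unit-form feature pipeline described in the abstract (count-based vector, 2000-rank LSA, LDA dimension reduction, SVM classifier) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation; the character-level analyzer choice, the 11-class setup, and the essays/labels variables are assumptions made for illustration only.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def build_unit_form_classifier(ngram=4, lsa_rank=2000, n_classes=11):
    # One per-unit-form branch (e.g. character 4-grams):
    # count vector -> LSA -> LDA -> SVM, as sketched from the abstract.
    return make_pipeline(
        CountVectorizer(analyzer="char", ngram_range=(ngram, ngram)),  # count-based vector
        TruncatedSVD(n_components=lsa_rank),                           # 2000-rank LSA feature
        LinearDiscriminantAnalysis(n_components=n_classes - 1),        # LDA dimension reduction
        LinearSVC(),                                                   # SVM-based classifier
    )

# Hypothetical usage (essays: list of response texts, labels: native-language labels):
# clf = build_unit_form_classifier(ngram=4)
# clf.fit(essays, labels)
# scores = clf.decision_function(test_essays)  # per-class prediction values

The abstract's later steps, such as the DNN classifiers, DNN bottleneck features, i-vectors, and the late/early-fusion ensemble DNNs that combine the per-unit-form prediction values, are not shown in this sketch.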
KSP Keywords
Deep neural network (DNN), Dimension reduction, Early fusion, English proficiency, i-vector, Latent semantic analysis, Native language, Shared task, Support Vector Machine (SVM), count-based, deep learning (DL)