ETRI Knowledge Sharing Platform : Uniformly Interpolated Balancing for Robust Prediction in Translation Quality Estimation

Journal Article Uniformly Interpolated Balancing for Robust Prediction in Translation Quality Estimation

Cited 1 time in Scopus

Authors: Hyun Kim, Seung-Hoon Na

Issue Date: 2020-01

Citation: ACM Transactions on Asian and Low-Resource Language Information Processing, v.19, no.3, pp.1-27

ISSN: 2375-4699

Publisher: ACM

Language: English

Type: Journal Article

DOI: https://dx.doi.org/10.1145/3365916

Abstract: There has been growing interest among researchers in quality estimation (QE), which attempts to automatically predict the quality of machine translation (MT) outputs. Most existing works on QE are based on supervised approaches using quality-annotated training data. However, QE training data quality scores readily become imbalanced or skewed: QE data are mostly composed of high translation quality sentence pairs but the data lack low translation quality sentence pairs. The use of imbalanced data with an induced quality estimator tends to produce biased translation quality scores with "high" translation quality scores assigned even to poorly translated sentences. To address the data imbalance, this article proposes a simple, efficient procedure called uniformly interpolated balancing to construct more balanced QE training data by inserting greater uniformness to training data. The proposed uniformly interpolated balancing procedure is based on the preparation of two different types of manually annotated QE data: (1) default skewed data and (2) near-uniform data. First, we obtain default skewed data in a naive manner without considering the imbalance by manually annotating qualities on MT outputs. Second, we obtain near-uniform data in a selective manner by manually annotating a subset only, which is selected from the automatically quality-estimated sentence pairs. Finally, we create uniformly interpolated balanced data by combining these two types of data, where one half originates from the default skewed data and the other half originates from the near-uniform data. We expect that uniformly interpolated balancing reflects the intrinsic skewness of the true quality distribution and manages the imbalance problem.
Experimental results on an English-Korean quality estimation task show that the proposed uniformly interpolated balancing leads to robustness on both skewed and uniformly distributed quality test sets when compared to the test sets of other non-balanced datasets.
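
The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names (`est_score`, `source`), the equal-width binning, the assumption that estimated scores lie in [0, 1], and both function names are hypothetical choices made only for this example. The first function picks candidates for manual annotation so that the estimated scores are spread roughly evenly; the second mixes the two annotated sets half and half.

```python
import random


def select_near_uniform(pool, n_bins, per_bin, rng):
    """Pick up to `per_bin` candidates from each equal-width score bin.

    `pool` holds automatically quality-estimated sentence pairs; each item
    is a dict with a hypothetical `est_score` field assumed to lie in [0, 1].
    The selected candidates would then be sent for manual annotation.
    """
    bins = [[] for _ in range(n_bins)]
    for item in pool:
        # A score of exactly 1.0 is clamped into the last bin.
        idx = min(int(item["est_score"] * n_bins), n_bins - 1)
        bins[idx].append(item)
    selected = []
    for bucket in bins:
        rng.shuffle(bucket)
        selected.extend(bucket[:per_bin])
    return selected


def uniformly_interpolated_balance(skewed, near_uniform, total, rng):
    """Build a training set with one half from each annotated source.

    Raises ValueError if either source has fewer items than its share.
    """
    half = total // 2
    mixed = rng.sample(skewed, half) + rng.sample(near_uniform, total - half)
    rng.shuffle(mixed)
    return mixed
```

For example, with a pool of 1,000 estimated pairs, `select_near_uniform(pool, 5, 10, random.Random(0))` returns 50 candidates, ten per bin, and `uniformly_interpolated_balance(skewed, near_uniform, 40, rng)` then draws 20 items from each annotated set. Keeping half of the data from the default skewed set is what lets the mixture still reflect the naturally skewed score distribution.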

KSP Keywords: Data Quality, Data imbalance, Imbalance Problem, Imbalanced Data, Machine Translation (MT), Quality Scores, Quality estimation, Skewed data, Translation quality, Uniformly distributed, Training data
