ETRI-Knowledge Sharing Platform

Uniformly Interpolated Balancing for Robust Prediction in Translation Quality Estimation
Cited 1 time in Scopus
Authors
Hyun Kim, Seung-Hoon Na
Issue Date
2020-01
Citation
ACM Transactions on Asian and Low-Resource Language Information Processing, v.19, no.3, pp.1-27
ISSN
2375-4699
Publisher
ACM
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1145/3365916
Abstract
There has been growing interest among researchers in quality estimation (QE), which attempts to automatically predict the quality of machine translation (MT) outputs. Most existing work on QE relies on supervised approaches that use quality-annotated training data. However, the quality scores in QE training data readily become imbalanced or skewed: QE data consist mostly of sentence pairs with high translation quality and lack pairs with low translation quality. A quality estimator induced from such imbalanced data tends to produce biased scores, assigning "high" translation quality even to poorly translated sentences. To address this data imbalance, this article proposes a simple, efficient procedure called uniformly interpolated balancing, which constructs more balanced QE training data by introducing greater uniformity into the training data. The procedure relies on two different types of manually annotated QE data: (1) default skewed data and (2) near-uniform data. First, we obtain default skewed data naively, by manually annotating the quality of MT outputs without considering the imbalance. Second, we obtain near-uniform data selectively, by manually annotating only a subset chosen from automatically quality-estimated sentence pairs. Finally, we create uniformly interpolated balanced data by combining the two types of data, so that one half originates from the default skewed data and the other half from the near-uniform data. We expect uniformly interpolated balancing to reflect the intrinsic skewness of the true quality distribution while managing the imbalance problem.
Experimental results on an English-Korean quality estimation task show that the proposed uniformly interpolated balancing yields robust predictions on both skewed and uniformly distributed quality test sets, compared with training on other, non-balanced datasets.
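The abstract describes the balancing procedure only at a high level. The Python sketch below illustrates one possible reading of it, under stated assumptions: quality scores lie in [0, 1], a near-uniform subset is selected by bucketing pairs into equal-width bins of automatically estimated quality and drawing the same number from each bin, and the final training set takes half its items from each annotated source. The function names `select_near_uniform` and `interpolate_balanced`, the binning scheme, and the random stand-ins for the QE model and the human annotators are hypothetical and are not taken from the article.

```python
import random
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]          # (source sentence, MT output)
Scored = Tuple[Pair, float]     # pair with a quality label, assumed in [0, 1]


def select_near_uniform(
    pool: Sequence[Pair],
    estimate: Callable[[Pair], float],
    num_bins: int,
    per_bin: int,
    rng: random.Random,
) -> List[Pair]:
    """Hypothetical selection step: draw up to `per_bin` pairs from each
    equal-width bin of automatically estimated quality. The result is the
    subset that would then be sent for manual annotation."""
    bins: List[List[Pair]] = [[] for _ in range(num_bins)]
    for pair in pool:
        score = min(max(estimate(pair), 0.0), 1.0)    # clamp to [0, 1]
        idx = min(int(score * num_bins), num_bins - 1)  # score 1.0 goes to the top bin
        bins[idx].append(pair)
    selected: List[Pair] = []
    for bucket in bins:
        # Scarce (e.g. low-quality) bins may hold fewer than per_bin pairs.
        selected.extend(rng.sample(bucket, min(per_bin, len(bucket))))
    return selected


def interpolate_balanced(
    skewed: Sequence[Scored],
    near_uniform: Sequence[Scored],
    size: int,
    rng: random.Random,
) -> List[Scored]:
    """Combine annotated data so that half comes from the near-uniform set
    and the other half from the default skewed set."""
    half = size // 2
    if len(skewed) < size - half or len(near_uniform) < half:
        raise ValueError("not enough annotated data for the requested size")
    combined = rng.sample(list(skewed), size - half) + rng.sample(list(near_uniform), half)
    rng.shuffle(combined)
    return combined


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [(f"src {i}", f"mt {i}") for i in range(1000)]
    # Hypothetical stand-in for an automatic QE model: scores skewed toward high quality.
    auto_qe = {pair: rng.betavariate(5.0, 1.0) for pair in pool}
    picked = select_near_uniform(pool, auto_qe.__getitem__, num_bins=5, per_bin=20, rng=rng)
    # Hypothetical stand-in for manual annotation: reuse the automatic score as the label.
    near_uniform = [(pair, auto_qe[pair]) for pair in picked]
    skewed = [(pair, auto_qe[pair]) for pair in rng.sample(pool, 200)]
    balanced = interpolate_balanced(skewed, near_uniform, size=2 * len(near_uniform), rng=rng)
    print(len(near_uniform), len(balanced))
```

Drawing equal counts per bin is what pushes the selected subset toward uniformity. With a pool skewed toward high scores, as in the demo, the lowest bins may still contain fewer than `per_bin` pairs, which mirrors the scarcity of poorly translated pairs described in the abstract.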
KSP Keywords
Data Quality, Data imbalance, Imbalance Problem, Imbalanced Data, Machine Translation (MT), Quality Scores, Quality estimation, Skewed data, Translation quality, Uniformly distributed, training data