ETRI-Knowledge Sharing Platform

Details

Journal article: Uniformly Interpolated Balancing for Robust Prediction in Translation Quality Estimation: A Case Study of English-Korean Translation
Cited 1 time in Scopus; downloaded 15 times
Authors
김현, 나승훈
Publication date
2020-01
Source
ACM Transactions on Asian and Low-Resource Language Information Processing, v.19 no.3, pp.1-27
ISSN
2375-4699
Publisher
ACM
DOI
https://dx.doi.org/10.1145/3365916
Funded project
19HS3200, (Exobrain, Subproject 1) Development of Knowledge Evolutionary WiseQA Platform Technology for Human Knowledge Augmented Services, 김현기
Abstract
There has been growing interest among researchers in quality estimation (QE), which attempts to automatically predict the quality of machine translation (MT) outputs. Most existing works on QE are based on supervised approaches using quality-annotated training data. However, QE training data quality scores readily become imbalanced or skewed: QE data are mostly composed of high translation quality sentence pairs but the data lack low translation quality sentence pairs. The use of imbalanced data with an induced quality estimator tends to produce biased translation quality scores with "high" translation quality scores assigned even to poorly translated sentences. To address the data imbalance, this article proposes a simple, efficient procedure called uniformly interpolated balancing to construct more balanced QE training data by inserting greater uniformness to training data. The proposed uniformly interpolated balancing procedure is based on the preparation of two different types of manually annotated QE data: (1) default skewed data and (2) near-uniform data. First, we obtain default skewed data in a naive manner without considering the imbalance by manually annotating qualities on MT outputs. Second, we obtain near-uniform data in a selective manner by manually annotating a subset only, which is selected from the automatically quality-estimated sentence pairs. Finally, we create uniformly interpolated balanced data by combining these two types of data, where one half originates from the default skewed data and the other half originates from the near-uniform data. We expect that uniformly interpolated balancing reflects the intrinsic skewness of the true quality distribution and manages the imbalance problem.
Experimental results on an English-Korean quality estimation task show that the proposed uniformly interpolated balancing leads to robustness on both skewed and uniformly distributed quality test sets when compared to the test sets of other non-balanced datasets.
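The data construction described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the near-uniform subset is chosen from the automatically quality-estimated pairs, so the equal-count binning below is an assumption, as are the function names `select_near_uniform` and `interpolate_balanced`, the bin count, and the score range of [0, 1]. In the described procedure the selected subset is then manually annotated, which happens outside this code.

```python
import random
from collections import defaultdict


def select_near_uniform(candidates, n_bins, per_bin, seed=0):
    """Pick up to `per_bin` items from each predicted-score bin.

    candidates: list of (item, predicted_score), scores assumed in [0, 1].
    The binning rule is an illustrative assumption, not the paper's method.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for item, score in candidates:
        # Clamp so that a score of exactly 1.0 falls in the last bin.
        idx = min(int(score * n_bins), n_bins - 1)
        bins[idx].append(item)
    selected = []
    for idx in range(n_bins):
        pool = bins[idx]
        rng.shuffle(pool)
        selected.extend(pool[:per_bin])  # sparse bins contribute fewer items
    return selected


def interpolate_balanced(skewed, near_uniform, total, seed=0):
    """Combine the two annotated sets, one half from each source.

    Raises ValueError (from random.sample) if either source is too small.
    """
    rng = random.Random(seed)
    half = total // 2
    mixed = rng.sample(skewed, half) + rng.sample(near_uniform, total - half)
    rng.shuffle(mixed)
    return mixed
```

Because bins with few candidates contribute fewer items, the selected subset is only near-uniform, which matches the abstract's wording. The half-and-half mix keeps part of the naturally skewed distribution in the training data while adding more low-quality examples.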
KSP suggested keywords
Case studies, Data Quality, Data imbalance, Imbalance problem, Machine Translation(MT), Quality Scores, Quality estimation, Skewed data, Translation quality, Uniformly distributed, imbalanced data