ETRI Knowledge Sharing Platform: A Study on a Job Runtime Prediction Model Based on Clustering of Parallel Program Logs

Journal Article: A Study on a Job Runtime Prediction Model Based on Clustering of Parallel Program Logs

Authors: 김은혜, 박주원

Issue Date: 2015-09

Citation: Journal of Society of Korea Industrial and Systems Engineering (산업경영시스템학회지), v.38, no.3, pp.56-63

ISSN: 2005-0461

Publisher: Society of Korea Industrial and Systems Engineering (한국산업경영시스템학회)

Language: Korean

Type: Journal Article

DOI: https://dx.doi.org/10.11627/jkise.2015.38.3.56

Abstract: Several fields of science demand large-scale workflow support, which requires thousands of CPU cores or more. To support such large-scale scientific workflows, high-capacity parallel systems such as supercomputers are widely used. To increase the utilization of these systems, most schedulers use a backfilling policy: small jobs are moved ahead to fill holes in the schedule as long as they do not delay large jobs. Because backfilling requires an estimate of each job's runtime, most parallel systems rely on the runtime estimated by the user. However, these estimates have been found to be extremely inaccurate because users overestimate the runtime of their jobs. Therefore, in this paper, we propose a novel runtime prediction system based on workload-aware clustering, with the goal of improving prediction performance. The proposed method for predicting the runtime of parallel applications consists of three main phases. First, feature selection based on factor analysis is performed to identify important input features. Then, the history data are clustered using a self-organizing map, followed by hierarchical clustering to find the cluster boundaries from the weight vectors. Finally, prediction models are constructed using support vector regression on the clustered workload data. Building a separate prediction model for each clustered data pattern can reduce the error rate compared with a single model for the whole data. In the experiments, we use workload logs from parallel systems (i.e., iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Compared with other techniques, the experimental results show that the proposed method improves accuracy by up to 69.08%.

KSP Keywords: Clustered data, Clustering Analysis, High capacity, History data, Input features, Parallel applications, Runtime Prediction, Scientific workflows, Self-organizing Map, Workflow support, data patterns
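
Illustrative note: the abstract's point that backfilling depends on runtime estimates can be made concrete with a small sketch. The Python below is not taken from the paper. It assumes a deliberately simplified rule (a waiting job may start only if it fits in the idle nodes and, by its estimate, finishes before the reserved start time of the job at the head of the queue), and the names `Job`, `can_backfill`, and `pick_backfill_jobs` are hypothetical. Production schedulers apply more permissive rules.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int          # number of nodes the job requests
    est_runtime: float  # runtime estimate seen by the scheduler, in seconds


def can_backfill(job, free_nodes, now, reserved_start):
    """Return True if `job` may start now without delaying the reserved job.

    Simplified rule (an assumption of this sketch): the job must fit in the
    idle nodes and its estimated end time must not pass `reserved_start`.
    """
    fits = job.nodes <= free_nodes
    finishes_in_time = now + job.est_runtime <= reserved_start
    return fits and finishes_in_time


def pick_backfill_jobs(queue, free_nodes, now, reserved_start):
    """Greedily pick waiting jobs, in queue order, that fit into the hole."""
    chosen = []
    for job in queue:
        if can_backfill(job, free_nodes, now, reserved_start):
            chosen.append(job.name)
            free_nodes -= job.nodes  # nodes are now occupied by this job
    return chosen


# A hole of 4 idle nodes that lasts 3600 s (hypothetical numbers).
# Job "a" really needs about 1800 s, but its user estimated 7200 s.
overestimated = [Job("a", 2, 7200.0), Job("b", 2, 600.0), Job("c", 4, 300.0)]
accurate = [Job("a", 2, 1800.0), Job("b", 2, 600.0), Job("c", 4, 300.0)]

with_overestimate = pick_backfill_jobs(overestimated, 4, 0.0, 3600.0)
with_accurate = pick_backfill_jobs(accurate, 4, 0.0, 3600.0)
```

With the overestimated runtime, job "a" is rejected and two nodes in the hole stay idle. With the accurate estimate, both "a" and "b" are backfilled. This is the kind of utilization loss that a better runtime predictor is meant to reduce.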
