ETRI-Knowledge Sharing Platform

Multihybrid Job Scheduling for Fault-Tolerant Distributed Computing in Policy-Constrained Resource Networks
Cited 17 times in Scopus. Downloaded 10 times.
Authors
Yong-Hyuk Moon, Chan-Hyun Youn
Issue Date
2015-05
Citation
Computer Networks : The International Journal of Telecommunications Networking, v.82, pp.81-95
ISSN
1389-1286
Publisher
Elsevier
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1016/j.comnet.2015.02.030
Project Code
14MS2200, Development of the security technology for MTM-based mobile devices and next generation wireless LAN, Cho Hyun Sook
Abstract
Unpredictable fluctuations in resource availability often lead to rescheduling decisions that sacrifice the success rate of job completion in batch job scheduling. To overcome this limitation, we consider the problem of assigning a set of sequential batch jobs with demands to a set of resources with constraints such as heterogeneous rescheduling policies and capabilities. The ultimate goal is to find an optimal allocation such that performance benefits in terms of makespan and utilization are maximized according to the principle of Pareto optimality, while keeping the job failure rate close to an acceptably low bound. To this end, we formulate a multihybrid policy decision problem (MPDP) on the primary-backup fault tolerance model and theoretically show its NP-completeness. The main contribution is to prove that our multihybrid job scheduling (MJS) scheme guarantees fault-tolerant performance by adaptively combining jobs and resources with different rescheduling policies in MPDP. Furthermore, through extensive simulations under various scheduling conditions, we demonstrate that the proposed MJS scheme outperforms five rescheduling heuristics in solution quality, searching adaptability, and time efficiency.
KSP Keywords
Batch job, Decision problem, Failure Rate, Fault tolerance, Fault-tolerant, Job failure, NP-completeness, Optimal Allocation, Pareto Optimality, Resource availability, Solution quality