ETRI-Knowledge Sharing Platform

Distributed Data Analysis Workflow System across Multiple Data Hubs
Cited 0 times in Scopus
Authors
Siwoon Son, Seong-Hwan Kim, Heesun Won
Issue Date
2024-12
Citation
International Conference on Big Data (Big Data) 2024, pp.368-373
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/BigData62323.2024.10825680
Abstract
In multisite data hub environments, data transfer costs and resource constraints make it difficult to execute data analysis workflows efficiently. Traditional methods for analyzing data spread across multiple hubs often degrade performance and increase costs. To address these challenges, we introduce CROSS (Collaborative Resource-Oriented Scheduling System), which optimizes distributed workflows by using the data and resources available across multiple hubs. CROSS enables collaboration between hubs, minimizes data transfer, and maximizes resource utilization through an efficient scheduling algorithm that considers data locality, available resources, and workflow structure. Experiments with four scientific workflows show that CROSS reduces makespan by up to 33.5% and improves CPU and memory efficiency by 1.58× and 1.59×, respectively, making it effective for multisite workflows.
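The abstract does not spell out the CROSS scheduling algorithm. The following minimal Python sketch is offered only to make the trade-off it describes concrete: running a task where its data already lives versus paying a transfer cost to use idle resources on another hub. It is a generic greedy earliest-finish-time list scheduler with a data-transfer penalty, not the CROSS algorithm. The `Task` fields, the `schedule` function, the slot counts, the single bandwidth figure, and the example workflow are all hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Optional


@dataclass
class Task:
    # Hypothetical task model; units are arbitrary (seconds, MB).
    name: str
    runtime: float
    output_mb: float = 0.0                    # size of the data this task produces
    deps: list = field(default_factory=list)  # names of parent tasks
    input_hub: Optional[str] = None           # hub that already stores the raw input
    input_mb: float = 0.0                     # size of that raw input


def schedule(tasks, hub_slots, bandwidth_mb_s):
    """Greedy earliest-finish-time placement with a data-transfer penalty.

    Not the CROSS algorithm: a generic list scheduler for illustration only.
    """
    by_name = {t.name: t for t in tasks}
    order = TopologicalSorter({t.name: t.deps for t in tasks}).static_order()
    # Time at which each execution slot on each hub becomes free.
    slot_free = {hub: [0.0] * n for hub, n in hub_slots.items()}
    placed = {}  # task name -> (hub, finish time)

    for name in order:
        task = by_name[name]
        best = None
        for hub, slots in slot_free.items():
            # Inputs are ready once every parent's output has reached this hub;
            # outputs produced on the same hub cost nothing to move.
            ready = 0.0
            for dep in task.deps:
                dep_hub, dep_finish = placed[dep]
                move = 0.0 if dep_hub == hub else by_name[dep].output_mb / bandwidth_mb_s
                ready = max(ready, dep_finish + move)
            # Assumption: raw input not stored on this hub is copied starting at t=0.
            if task.input_hub is not None and task.input_hub != hub:
                ready = max(ready, task.input_mb / bandwidth_mb_s)
            slot = min(range(len(slots)), key=slots.__getitem__)  # earliest-free slot
            finish = max(ready, slots[slot]) + task.runtime
            if best is None or finish < best[0]:
                best = (finish, hub, slot)
        finish, hub, slot = best
        slot_free[hub][slot] = finish
        placed[name] = (hub, finish)

    makespan = max(f for _, f in placed.values())
    return placed, makespan


# Hypothetical example: two hubs with one execution slot each, 50 MB/s between them.
workflow = [
    Task("extract", runtime=5, output_mb=100, input_hub="hubA", input_mb=500),
    Task("clean", runtime=4, output_mb=10, deps=["extract"]),
    Task("stats", runtime=4, output_mb=10, deps=["extract"]),
    Task("report", runtime=2, deps=["clean", "stats"]),
]
placed, makespan = schedule(workflow, {"hubA": 1, "hubB": 1}, bandwidth_mb_s=50)
for name, (hub, finish) in placed.items():
    print(f"{name:8s} -> {hub} (finishes at t={finish:g})")
print("makespan:", makespan)
```

In this toy run, `extract` stays on hubA, where its 500 MB input already lives. One of the two parallel tasks moves to hubB because a 2-unit transfer is cheaper than waiting for hubA's only slot. The resulting makespan is 13 time units, compared with 15 if every task ran on hubA. A real multi-hub scheduler would also need to model memory limits, per-link bandwidth, and transfer contention, which this sketch ignores.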
KSP Keywords
Analysis method, Analysis workflow, Data locality, Data transfer, Distributed data analysis, Distributed workflows, Memory Efficiency, Multiple data, Resource use, Resource-oriented, Scheduling algorithm