ETRI-Knowledge Sharing Platform

MEMORYBRIDGE: Leveraging Cloud Resource Characteristics for Cost-Efficient Disk-based GNN Training via Two-Level Architecture
Cited 0 times in Scopus
Authors
Yoochan Kim, Weikuan Yu, Hong-Yeon Kim, Youngjae Kim
Issue Date
2025-02
Citation
International Symposium on Cluster, Cloud and Internet Computing (CCGrid) 2025, pp.1-10
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/CCGRID64434.2025.00063
Abstract
Graph Neural Networks (GNNs) are machine learning models that process graph-structured data by learning relationships between vertices and edges, as well as graph-level characteristics. Recently, with the emergence of TB-scale graph datasets, dataset sizes have exceeded the memory capacity of single machines. As a result, traditional methods that load all graph data into memory have become unusable, leading to the emergence of disk-based GNN training that uses storage as a memory extension. Recent research has focused on reducing disk I/O bottlenecks in disk-based GNNs. However, disk-based GNNs face new challenges in cloud environments due to two main characteristics. First, compared to node-local environments, the significantly slower cloud storage I/O speed becomes the main bottleneck of the entire training process. Second, pre-defined virtual machines prevent users from freely choosing the desired memory size, bandwidth, and latest GPU technologies. These limitations have made existing disk-based GNN research unusable in cloud environments. To overcome this, we propose MEMORYBRIDGE, a system that cost-effectively accelerates GNN training in cloud environments through a novel two-level architecture that uses affordable GPU resources as training nodes and remote memory resources without GPUs as memory nodes, instead of a single expensive GPU resource. This architecture consists of two key components: (i) a mathematical solver that recommends the most cost-effective resource combination, and (ii) a cloud-specialized GNN framework that implements graph-aware fixed caching and batch pipelining optimization. The experimental results show that MEMORYBRIDGE achieved a speed improvement of up to 32.7x compared to existing GNN training frameworks and a cost efficiency of 9.9x compared to alternative resource configuration strategies, effectively handling the unique problems that arise from the combination of cloud environments and GNN training.
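To make the idea of recommending a cost-effective resource combination concrete, the following is a minimal illustrative sketch and not the solver described in the paper. It assumes a deliberately simplified model: one GPU training node plus some number of identical GPU-less memory nodes, a single constraint (aggregate memory must hold the graph), and hourly price as the only objective. The `Instance` type, the function name `cheapest_combination`, and all parameters are hypothetical; the actual solver presumably also accounts for factors such as bandwidth and training time, which this sketch ignores.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Instance:
    """A hypothetical cloud instance type (name, memory, and price are illustrative)."""
    name: str
    memory_gb: int
    hourly_usd: float


def cheapest_combination(train_nodes, memory_nodes, graph_gb, max_memory_nodes=8):
    """Brute-force search over (training node, memory node type, memory node count).

    Returns (hourly_cost, train_node, memory_node, count) for the cheapest plan
    whose aggregate memory can hold a graph of graph_gb, or None if no plan fits.
    Assumption: cost is the sum of hourly prices; training speed is not modeled.
    """
    best = None
    for train, mem in product(train_nodes, memory_nodes):
        for count in range(max_memory_nodes + 1):
            total_gb = train.memory_gb + count * mem.memory_gb
            if total_gb < graph_gb:
                continue  # not enough aggregate memory yet, try one more node
            cost = train.hourly_usd + count * mem.hourly_usd
            if best is None or cost < best[0]:
                best = (cost, train, mem, count)
            break  # adding further nodes of this type only raises the price
    return best
```

Under these assumptions, a small GPU node paired with a few memory-only nodes can come out cheaper per hour than a single large GPU instance with enough memory on its own, which is the intuition behind the two-level architecture in the abstract.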
KSP Keywords
Alternative resource, Challenges in Cloud, Cloud resource, Cost Efficiency, Cost-efficient, Disk I/O, Graph data, Graph-structured data, I/O bottleneck, Key Components, Large graph