ETRI Knowledge Sharing Platform : MEMORYBRIDGE: Leveraging Cloud Resource Characteristics for Cost-Efficient Disk-based GNN Training via Two-Level Architecture

Title: MEMORYBRIDGE: Leveraging Cloud Resource Characteristics for Cost-Efficient Disk-based GNN Training via Two-Level Architecture

Authors: Yoochan Kim, Weikuan Yu, Hong-Yeon Kim, Youngjae Kim

Issue Date: 2025-02

Citation: International Symposium on Cluster, Cloud and Internet Computing (CCGrid) 2025, pp.1-10

Language: English

Type: Conference Paper

DOI: https://dx.doi.org/10.1109/CCGRID64434.2025.00063

Abstract: Graph Neural Networks (GNNs) are machine learning models that process graph-structured data by learning relationships between vertices and edges, as well as graph-level characteristics. With the recent emergence of TB-scale graph datasets, dataset sizes have exceeded the memory capacity of single machines. As a result, traditional methods that load all graph data into memory have become unusable, leading to the emergence of disk-based GNN training, which uses storage as a memory extension. Recent research has focused on reducing disk I/O bottlenecks in disk-based GNNs. However, disk-based GNNs face new challenges in cloud environments due to two main characteristics. First, compared to node-local environments, the significantly slower cloud storage I/O speed becomes the main bottleneck of the entire training process. Second, pre-defined virtual machines prevent users from freely choosing the desired memory size, bandwidth, and latest GPU technologies. These limitations have made existing disk-based GNN research unusable in cloud environments. To overcome this, we propose MEMORYBRIDGE, a system that cost-effectively accelerates GNN training in cloud environments through a novel two-level architecture that uses affordable GPU resources as training nodes and remote memory resources without GPUs as memory nodes, instead of a single expensive GPU resource. This architecture consists of two key components: (i) a mathematical solver that recommends the most cost-effective resource combination, and (ii) a cloud-specialized GNN framework that implements graph-aware fixed caching and batch pipelining optimization. The experimental results show that MEMORYBRIDGE achieved a speed improvement of up to 32.7x compared to existing GNN training frameworks and a cost efficiency of 9.9x compared to alternative resource configuration strategies, effectively handling the unique problems that arise from the combination of cloud environments and GNN training.
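
The idea of recommending a cost-effective combination of a GPU training node and GPU-less memory nodes can be illustrated with a minimal sketch. This is not the solver from the paper, whose formulation is not given in the abstract. It is a brute-force search under simplifying assumptions: cost is the hourly price only, and the sole constraint is that the pooled memory holds the dataset. The instance names, memory sizes, prices, and the `cheapest_combination` function are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from math import ceil


@dataclass(frozen=True)
class Instance:
    name: str          # hypothetical instance label, not a real cloud SKU
    memory_gb: int     # host memory available on the instance
    hourly_usd: float  # illustrative price, not a real quote


def cheapest_combination(train_nodes, memory_nodes, dataset_gb, max_memory_nodes=8):
    """Return (hourly cost, training node, memory node, memory node count)
    for the cheapest pairing whose pooled memory can hold the dataset,
    or None if no pairing fits within max_memory_nodes."""
    best = None
    for train, mem in product(train_nodes, memory_nodes):
        # Memory the training node cannot provide must come from memory nodes.
        shortfall = max(0, dataset_gb - train.memory_gb)
        count = ceil(shortfall / mem.memory_gb)
        if count > max_memory_nodes:
            continue
        cost = train.hourly_usd + count * mem.hourly_usd
        if best is None or cost < best[0]:
            best = (cost, train, mem, count)
    return best


# Assumed catalogue: one affordable and one large GPU instance, two memory-only instances.
train_nodes = [Instance("gpu-small", 64, 1.0), Instance("gpu-large", 1024, 12.0)]
memory_nodes = [Instance("mem-a", 256, 0.8), Instance("mem-b", 512, 1.5)]

result = cheapest_combination(train_nodes, memory_nodes, dataset_gb=1000)
if result is not None:
    cost, train, mem, count = result
    print(f"{train.name} + {count} x {mem.name}: ${cost:.2f}/hour")
```

With these assumed prices, the small GPU instance backed by remote memory nodes is cheaper per hour than the single large GPU instance. A realistic solver would also need to model training time, network bandwidth, and storage I/O, which this sketch ignores.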

KSP Keywords: Alternative resource, Challenges in Cloud, Cloud resource, Cost Efficiency, Cost-efficient, Disk I/O, Graph data, Graph-structured data, I/O bottleneck, Key Components, Large graph
