ETRI-Knowledge Sharing Platform

Detail

MPI Allgather Utilizing CXL Shared Memory Pool in Multi-Node Computing Systems
Authors
Hooyoung Ahn, Seonyoung Kim, Yoomi Park, Woojong Han, Shinyoung Ahn, Tu Tran, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda
Issue Date
2024-12
Citation
International Conference on Big Data (Big Data) 2024, pp. 1-6
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/BigData62323.2024.10825804
Abstract
In artificial intelligence (AI) and high-performance computing (HPC), growing data and model sizes exceed what a single node can handle and require distributed processing across multiple nodes, which increases inter-node communication. To address this challenge, we propose a novel MPI allgather method leveraging CXL technology, which supports composable architectures and dynamic resource allocation in data centers and HPC systems. Notably, CXL 3.1 facilitates cache coherence among nodes. The proposed allgather method uses the CXL shared memory pool as a communication buffer and outperforms existing algorithms for two reasons: first, CXL provides lower latency than Ethernet and InfiniBand (IB); second, using the CXL shared memory pool as a communication buffer shared across multiple nodes significantly reduces the number of communications. To the best of our knowledge, this work is the first to explore combining MPI collective communication with CXL technology to optimize MPI allgather. Compared to traditional allgather methods, the proposed method reduces communication latency by a factor of 2.91x to 42.14x, as measured using the OSU Micro-Benchmark (OMB), a standard MPI benchmarking suite.
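
The idea of using a shared pool as the allgather buffer can be illustrated with a small emulation. This is only a sketch under stated assumptions, not the authors' implementation: threads stand in for MPI ranks, a `bytearray` stands in for the CXL shared memory pool, and the write-own-slot, synchronize, read-all sequence is an assumed protocol that the abstract does not spell out. The names `shared_pool_allgather` and `ring_allgather_message_count` are hypothetical. The second function gives the message count of a conventional ring allgather for comparison.

```python
import threading


def shared_pool_allgather(contributions):
    """Emulate an allgather through one shared buffer.

    Assumption: each rank writes its block into its own slot of the pool,
    all ranks synchronize, then each rank reads the full buffer.
    Threads stand in for ranks; a bytearray stands in for the shared pool.
    """
    nranks = len(contributions)
    block = len(contributions[0])
    if any(len(c) != block for c in contributions):
        raise ValueError("all ranks must contribute blocks of equal size")

    pool = bytearray(nranks * block)      # stand-in for the shared memory pool
    barrier = threading.Barrier(nranks)   # stand-in for inter-node synchronization
    results = [None] * nranks

    def rank_main(rank):
        # Step 1: one write per rank, into a slot no other rank touches.
        pool[rank * block:(rank + 1) * block] = contributions[rank]
        # Step 2: wait until every rank has finished writing.
        barrier.wait()
        # Step 3: read the complete gathered buffer.
        results[rank] = bytes(pool)

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(nranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def ring_allgather_message_count(nranks):
    """Point-to-point messages in a ring allgather.

    The ring algorithm runs nranks - 1 steps, and every rank sends one
    block per step.
    """
    return nranks * (nranks - 1)


if __name__ == "__main__":
    data = [bytes([r]) * 4 for r in range(4)]
    gathered = shared_pool_allgather(data)
    print("every rank holds the same buffer:", len(set(gathered)) == 1)
    print("pool writes:", len(data))
    print("ring messages:", ring_allgather_message_count(len(data)))
```

In this emulation each rank performs a single write into the pool, whereas the ring algorithm's message count grows quadratically with the number of ranks, which is one way to read the abstract's claim that a shared buffer reduces the number of communications.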
KSP Keywords
Cache coherence, Communication latency, Data center, Dynamic resource allocation (DRA), HPC system, High-performance computing (HPC), MPI collective communication, Multi-node, Shared memory, Artificial intelligence, Computing systems