ETRI Knowledge Sharing Platform


Detailed Information

Journal Article
SoftMemoryBox II: A Scalable, Shared Memory Buffer Framework for Accelerating Distributed Training of Large-Scale Deep Neural Networks
Cited 1 time in Scopus; downloaded 25 times
Authors
안신영, 임은지
Publication Date
2020-11
Source
IEEE Access, v.8, pp.207097-207111
ISSN
2169-3536
Publisher
IEEE
DOI
https://dx.doi.org/10.1109/ACCESS.2020.3038112
Research Project
19HS5700, Development of an HPC System for High-Speed Processing of Large-Scale Deep Learning, 최완
Abstract
Distributed processing using high-performance computing resources is essential for developers to train large-scale deep neural networks (DNNs). The major impediment to distributed DNN training is the communication bottleneck during the parameter exchange among the distributed training workers. This bottleneck increases training time and decreases the utilization of the computational resources. Our previous study, SoftMemoryBox (SMB1), demonstrated considerably better performance than the message passing interface (MPI) for parameter communication in distributed DNN training. However, SMB1 had several disadvantages: limited scalability of distributed DNN training due to the restricted communication bandwidth of a single memory server, the lack of a synchronization function for the shared memory buffer, and low portability and usability as a consequence of its kernel-level implementation. This paper proposes a scalable, shared memory buffer framework, called SoftMemoryBox II (SMB2), which overcomes the shortcomings of SMB1. With SMB2, distributed training processes can easily share a virtually unified shared memory buffer composed of memory segments provided by remote memory servers and can exchange DNN parameters at high speed through this buffer. The scalable communication bandwidth of the SMB2 framework reduces distributed DNN training times compared to SMB1. According to intensive evaluation results, the communication bandwidth of the proposed SMB2 is 6.3 times greater than that of SMB1 when the framework is scaled out to eight memory servers. Moreover, SMB2-based asynchronous distributed training of five DNN models is up to 2.4 times faster than SMB1-based training.
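
The abstract describes exchanging DNN parameters through a shared memory buffer rather than through MPI message passing. The sketch below is a minimal single-node analogue of that idea using Python's multiprocessing.shared_memory module; it is not the SMB2 API, and the names PARAM_BUF, create_buffer, publish_params, and fetch_params are hypothetical illustrations only. The real framework stripes the buffer across memory segments served by multiple remote memory servers, which is what gives SMB2 its scalable communication bandwidth; that distributed aspect is not modeled here.

```python
# Conceptual sketch only: a single-node stand-in for parameter exchange
# through a shared memory buffer. All names below are hypothetical and do
# not correspond to the SMB2 interface.
import numpy as np
from multiprocessing import shared_memory

PARAM_COUNT = 1024               # number of model parameters (illustrative)
PARAM_BUF = "param_buffer_demo"  # hypothetical name of the shared buffer


def create_buffer() -> shared_memory.SharedMemory:
    """Allocate and zero-initialize the shared parameter buffer."""
    shm = shared_memory.SharedMemory(
        create=True, name=PARAM_BUF, size=PARAM_COUNT * 4)
    params = np.ndarray((PARAM_COUNT,), dtype=np.float32, buffer=shm.buf)
    params[:] = 0.0              # initial model parameters
    del params                   # drop the view before returning the segment
    return shm


def publish_params(new_params: np.ndarray) -> None:
    """Worker role: write updated parameters into the shared buffer."""
    shm = shared_memory.SharedMemory(name=PARAM_BUF)
    buf = np.ndarray((PARAM_COUNT,), dtype=np.float32, buffer=shm.buf)
    buf[:] = new_params          # in-place copy; no message passing involved
    del buf                      # release the view so the segment can close
    shm.close()


def fetch_params() -> np.ndarray:
    """Worker role: read the current parameters from the shared buffer."""
    shm = shared_memory.SharedMemory(name=PARAM_BUF)
    buf = np.ndarray((PARAM_COUNT,), dtype=np.float32, buffer=shm.buf)
    snapshot = buf.copy()        # local snapshot for the next training step
    del buf
    shm.close()
    return snapshot


if __name__ == "__main__":
    server = create_buffer()
    publish_params(np.random.rand(PARAM_COUNT).astype(np.float32))
    print(fetch_params()[:5])    # another worker sees the published values
    server.close()
    server.unlink()              # destroy the shared segment when done
```

The point of the pattern is that workers read and write a shared buffer directly instead of sending parameter messages to each other, which is the communication model the paper contrasts with MPI.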
KSP Suggested Keywords
Computing resources, Deep neural network(DNN), Distributed training, High Performance Computing, High Speed, Remote Memory, Restricted communication, Shared Memory, Synchronization function, Training time, computational resources