ETRI Knowledge Sharing Platform : Deep Learning Framework using Scalable Shared Memory Buffer Framework

Conference Paper: Deep Learning Framework using Scalable Shared Memory Buffer Framework

Cited 0 times in Scopus

Citation: International Conference on Electronics, Information and Communication (ICEIC) 2021, pp.542-544

Abstract: Communication overhead among distributed training workers can be a performance bottleneck in large-scale deep neural network (DNN) training. This overhead hinders the rapid development of high-performance DNNs, so distributed deep learning frameworks should provide efficient parameter-sharing techniques. In previous work, we proposed TFSM, a distributed deep learning framework based on the remote shared memory framework (SMB). In this paper, we propose an upgraded TFSM based on SMB2. SMB2 is a scalable shared memory buffer framework that provides memory server scalability, a lock function, and a user-level implementation. SMB2-based TFSM can extend the parameter I/O bandwidth and shared memory capacity. It also uses a modified asynchronous parameter update method that relies on the lock function of SMB2. We verified that SMB2-based TFSM outperforms the previous TFSM and TensorFlow by measuring the training throughput of large-scale DNNs during distributed training.
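
The lock-protected asynchronous parameter update mentioned in the abstract can be illustrated with a minimal sketch. Nothing here is the actual SMB2 or TFSM API, which this record does not describe: `SharedParamBuffer`, `worker`, and `train` are hypothetical names, Python threads stand in for distributed training workers, an in-process list stands in for the remote shared memory buffer, and `threading.Lock` stands in for the SMB2 lock function. The gradient is a toy one that pulls every parameter toward 1.0.

```python
import threading


class SharedParamBuffer:
    """Hypothetical in-process stand-in for a remote shared memory buffer."""

    def __init__(self, size):
        self.params = [0.0] * size
        self.version = 0  # number of updates applied so far
        self.lock = threading.Lock()  # assumption: plays the role of the SMB2 lock

    def read(self):
        # Take a consistent snapshot of the shared parameters.
        with self.lock:
            return list(self.params)

    def apply_update(self, delta):
        # Accumulate a worker's delta atomically so concurrent updates are not lost.
        with self.lock:
            for i, d in enumerate(delta):
                self.params[i] += d
            self.version += 1


def worker(buf, steps, lr):
    for _ in range(steps):
        local = buf.read()  # possibly stale by the time the update lands
        grad = [p - 1.0 for p in local]  # toy gradient of 0.5 * (p - 1)^2
        buf.apply_update([-lr * g for g in grad])


def train(num_workers=4, steps=50, lr=0.1, size=3):
    buf = SharedParamBuffer(size)
    threads = [
        threading.Thread(target=worker, args=(buf, steps, lr))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buf


if __name__ == "__main__":
    result = train()
    print(result.version, result.read())
```

Workers never wait for one another between steps, which is what makes the update asynchronous; the lock only serializes the short read and accumulate operations on the shared buffer.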

KSP Keywords: Communication overhead, Deep learning framework, Deep neural network(DNN), Distributed training, High performance, I/O bandwidth, Rapid development, Shared Memory, deep learning(DL), large-scale, memory capacity
