ETRI Knowledge Sharing Platform: Distributed Deep Learning Framework based on Shared Memory for Fast Deep Neural Network Training

Conference Paper: Distributed Deep Learning Framework based on Shared Memory for Fast Deep Neural Network Training

Cited 12 times in Scopus

Citation: International Conference on Information and Communication Technology Convergence (ICTC) 2018, pp.1239-1242

Abstract: In distributed deep neural network (DNN) training, the communication overhead caused by parameter sharing across multiple deep learning workers can become a performance bottleneck, so efficient parameter sharing is a crucial challenge for distributed deep learning frameworks. In this paper, we propose TFSM, a distributed deep learning framework that uses remote shared memory for efficient parameter sharing to accelerate distributed DNN training. TFSM is built on a remote shared memory framework that provides shared memory which multiple machines can access at high speed. On top of this remote shared memory, TFSM provides a new asynchronous parameter update method. By training well-known deep learning models with 8 GPU workers, we confirmed that TFSM reduces DNN training time compared to TensorFlow.
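The abstract does not describe TFSM's update algorithm, so the sketch below only illustrates the general idea of asynchronous parameter sharing through one shared parameter buffer, under stated assumptions: Python threads stand in for GPU workers, a single in-process `array.array` stands in for the remote shared memory segment, and every name here (`SharedParams`, `shard_gradient`, `train`, the toy data) is hypothetical. Each worker pulls the current parameters, computes a gradient on its own data shard, and writes the update directly into the shared buffer without waiting for the other workers.

```python
"""Toy asynchronous parameter sharing through one shared parameter buffer.

Assumption: threads in one process stand in for GPU workers, and a single
array.array stands in for TFSM's remote shared memory segment. This is not
TFSM's actual update method, which the abstract does not detail.
"""
import array
import threading

TRUE_W = (2.0, -1.0)  # hypothetical ground-truth weights for the toy data


class SharedParams:
    """Hypothetical parameter store that every worker reads and writes in place."""

    def __init__(self, size: int) -> None:
        self.buf = array.array("d", [0.0] * size)  # stand-in "shared memory" region
        self._write_lock = threading.Lock()        # serializes writes only

    def read(self) -> list:
        # Lock-free read: a worker may see slightly stale values (asynchrony).
        return list(self.buf)

    def apply_update(self, grad: list, lr: float) -> None:
        # Push the gradient straight into the shared buffer; no barrier waits
        # for the other workers, unlike synchronous training.
        with self._write_lock:
            for i, g in enumerate(grad):
                self.buf[i] -= lr * g


def make_shard(s: float) -> list:
    """Noise-free linear-regression samples (x, y) with y = TRUE_W . x."""
    xs = [(1.0, 0.0), (0.0, 1.0), (1.0, s)]
    return [(x, TRUE_W[0] * x[0] + TRUE_W[1] * x[1]) for x in xs]


def shard_gradient(w: list, shard: list) -> list:
    """Gradient of the mean of 0.5 * (w . x - y)^2 over one shard."""
    grad = [0.0, 0.0]
    for x, y in shard:
        err = w[0] * x[0] + w[1] * x[1] - y
        grad[0] += err * x[0]
        grad[1] += err * x[1]
    return [g / len(shard) for g in grad]


def worker(params: SharedParams, shard: list, steps: int, lr: float) -> None:
    for _ in range(steps):
        w = params.read()                                   # pull latest parameters
        params.apply_update(shard_gradient(w, shard), lr)   # push asynchronously


def train(num_workers: int = 4, steps: int = 200, lr: float = 0.1) -> list:
    params = SharedParams(size=2)
    slopes = [1.0, -1.0, 0.5, -0.5]  # gives each worker a different data shard
    threads = [
        threading.Thread(
            target=worker,
            args=(params, make_shard(slopes[k % len(slopes)]), steps, lr),
        )
        for k in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return params.read()


if __name__ == "__main__":
    print(train())  # values close to [2.0, -1.0]
```

Calling `train(num_workers=8)` matches the abstract's 8-worker setup in worker count only; nothing here models GPUs, networking, or TFSM's performance. The toy problem is noise-free so that every shard shares the same optimum, which lets stale, unsynchronized reads still converge.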

KSP Keywords: Communication overhead, Deep learning framework, Deep neural network (DNN), High Speed, Neural network training, Shared Memory, Training time, deep learning (DL), deep learning models, parameter sharing, performance bottleneck

218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, KOREA, Contact: sh.kim@etri.re.kr
