ETRI Knowledge Sharing Platform

Deep Learning Framework using Scalable Shared Memory Buffer Framework
Authors
Eun-Ji Lim, Shin-Young Ahn
Issue Date
2021-02
Citation
International Conference on Electronics, Information and Communication (ICEIC) 2021, pp.542-544
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ICEIC51217.2021.9369801
Abstract
Communication overhead among distributed training workers can be a performance bottleneck in large-scale deep neural network (DNN) training. This overhead hinders the rapid development of high-performance DNNs, so distributed deep learning frameworks should provide efficient parameter sharing techniques. In previous work, we proposed TFSM, a distributed deep learning framework based on the remote shared memory framework (SMB). In this paper, we propose an upgraded TFSM based on SMB2. SMB2 is a scalable shared memory buffer framework that provides memory server scalability, a lock function, and a user-level implementation. SMB2-based TFSM can extend the parameter I/O bandwidth and shared memory capacity. It also uses a modified asynchronous parameter update method that exploits the lock function of SMB2. We verified that SMB2-based TFSM outperforms the previous TFSM and TensorFlow by measuring the training throughput of large-scale DNNs during distributed training.
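
As a rough illustration of the lock-guarded asynchronous parameter update idea described in the abstract, the sketch below has several worker processes apply updates to a single shared parameter buffer, each holding a lock while it writes so that concurrent updates do not partially interleave. This is a minimal sketch using Python's standard multiprocessing shared memory, not the actual SMB2/TFSM API; all names (worker, PARAM_COUNT, the stand-in gradient) are hypothetical.

    # Hypothetical sketch: asynchronous parameter updates over a shared memory
    # buffer, guarded by a lock. Not the SMB2/TFSM implementation.
    import numpy as np
    from multiprocessing import Process, Lock, shared_memory

    PARAM_COUNT = 1024      # size of the shared parameter vector (illustrative)
    LEARNING_RATE = 0.01

    def worker(shm_name, lock, steps, seed):
        """Each worker reads the shared parameters, computes a stand-in
        gradient, and writes the update back while holding the lock."""
        rng = np.random.default_rng(seed)
        shm = shared_memory.SharedMemory(name=shm_name)
        params = np.ndarray((PARAM_COUNT,), dtype=np.float32, buffer=shm.buf)
        for _ in range(steps):
            grad = rng.standard_normal(PARAM_COUNT).astype(np.float32)  # fake gradient
            with lock:  # the lock plays the role of SMB2's lock function here
                params -= LEARNING_RATE * grad
        shm.close()

    if __name__ == "__main__":
        # Create the shared parameter buffer (stands in for the remote shared memory).
        shm = shared_memory.SharedMemory(create=True, size=PARAM_COUNT * 4)
        params = np.ndarray((PARAM_COUNT,), dtype=np.float32, buffer=shm.buf)
        params[:] = 0.0
        lock = Lock()
        workers = [Process(target=worker, args=(shm.name, lock, 100, i)) for i in range(4)]
        for p in workers:
            p.start()
        for p in workers:
            p.join()
        print("parameter norm after training:", float(np.linalg.norm(params)))
        shm.close()
        shm.unlink()

In the paper's setting the buffer would live on one or more remote memory servers (SMB2's memory server scalability extends the aggregate I/O bandwidth and capacity), whereas this sketch keeps everything on a single host for simplicity.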
KSP Keywords
Communication overhead, Deep learning framework, Deep neural network(DNN), Distributed training, High performance, I/O bandwidth, Rapid development, Shared Memory, deep learning(DL), large-scale, memory capacity