ETRI-Knowledge Sharing Plaform

KOREAN
논문 검색
Type SCI
Year ~ Keyword

Detail

Journal Article OpenCL-Darknet: Implementation and Optimization of OpenCL-based Deep Learning Object Detection Framework
Cited 13 time in scopus Share share facebook twitter linkedin kakaostory
Authors
Yongbon Koo, Sunghoon Kim, Young-guk Ha
Issue Date
2021-07
Citation
World Wide Web : Internet and Web Information Systems, v.24, no.4, pp.1299-1319
ISSN
1386-145X
Publisher
Springer
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1007/s11280-020-00778-y
Abstract
Object detection is a technology that deals with recognizing classes of objects and their location. It is used in many different areas, such as in face-detecting systems [16, 34, 37], surveillance tools [9], human-machine interfaces [17], and self-driving cars [18, 23, 25, 26, 30]. These days, deep learning object detection approaches have achieved significantly better performance than the classical feature-based algorithms. Darknet [31] is a deep learning object detection framework, which is well known for its fast speed and simple structure. Unfortunately, Darknet can only work with Nvidia CUDA [6] for accelerating its deep learning calculations. For this reason, users have only limited options of selecting appropriate graphic cards. Open computing language (OpenCL) [35], an open standard for cross-platform, parallel programming of heterogeneous systems, is available for the general hardware accelerators. However, many deep learning frameworks including Darknet have no support for OpenCL. In our previous paper, we presented OpenCL-Darknet [19], which transformed the CUDA-based Darknet into an open standard OpenCL backend. The original OpenCL-Darknet successfully showed its ability for the general graphics processing unit (GPU) hardware. However, it could not achieve competitive performance compared with the CUDA version, and it only supported a limited platform. In this study, we improved the performance of OpenCL-Darknet with several optimization techniques and added support for various architectures. We also evaluated OpenCL-Darknet not only in AMD R7 accelerated processing unit (APU) with OpenCL 2.0, but also in Nvidia GPU and ARM Mali embedded GPU with OpenCL 1.2 Profile. The evaluation using the standard object detection datasets showed that our advanced OpenCL-Darknet reduced the processing time by at most 50% on average for various deep learning object detection networks compared with our original implementation. We also showed that our OpenCL deep learning framework has competitiveness compared with the CUDA-based one.
KSP Keywords
Accelerated Processing Unit, Competitive performance, Cross-Platform, Deep learning framework, Detection Approaches, Detection Framework, Embedded GPU, Fast speed, Feature-based, Hardware accelerator, Heterogeneous System