ETRI Knowledge Sharing Platform : OpenCL-Darknet: Implementation and Optimization of OpenCL-based Deep Learning Object Detection Framework

Title: OpenCL-Darknet: Implementation and Optimization of OpenCL-based Deep Learning Object Detection Framework

Cited 13 times in Scopus

Authors: Yongbon Koo, Sunghoon Kim, Young-guk Ha

Issue Date: 2021-07

Citation: World Wide Web : Internet and Web Information Systems, v.24, no.4, pp.1299-1319

ISSN: 1386-145X

Publisher: Springer

Language: English

Type: Journal Article

DOI: https://dx.doi.org/10.1007/s11280-020-00778-y

Abstract: Object detection is a technology that recognizes the classes of objects and their locations. It is used in many different areas, such as face-detection systems [16, 34, 37], surveillance tools [9], human-machine interfaces [17], and self-driving cars [18, 23, 25, 26, 30]. Deep learning object detection approaches now achieve significantly better performance than classical feature-based algorithms. Darknet [31] is a deep learning object detection framework that is well known for its speed and simple structure. Unfortunately, Darknet can accelerate its deep learning calculations only with Nvidia CUDA [6], which limits users' choice of graphics cards. Open Computing Language (OpenCL) [35], an open standard for cross-platform parallel programming of heterogeneous systems, is available for general hardware accelerators. However, many deep learning frameworks, including Darknet, do not support OpenCL. In our previous paper, we presented OpenCL-Darknet [19], which ported the CUDA-based Darknet to an open-standard OpenCL backend. The original OpenCL-Darknet successfully demonstrated that it could run on general graphics processing unit (GPU) hardware. However, it could not achieve performance competitive with the CUDA version, and its platform support was limited. In this study, we improved the performance of OpenCL-Darknet with several optimization techniques and added support for various architectures. We evaluated OpenCL-Darknet not only on an AMD R7 accelerated processing unit (APU) with OpenCL 2.0, but also on an Nvidia GPU and an ARM Mali embedded GPU with the OpenCL 1.2 profile. The evaluation using standard object detection datasets showed that our advanced OpenCL-Darknet reduced the processing time by up to 50% on average for various deep learning object detection networks compared with our original implementation. We also showed that our OpenCL deep learning framework is competitive with the CUDA-based one.

KSP Keywords: Accelerated Processing Unit, Competitive performance, Cross-Platform, Deep learning framework, Detection Approaches, Detection Framework, Embedded GPU, Fast speed, Feature-based, Hardware accelerator, Heterogeneous System
