ETRI-Knowledge Sharing Platform



Type: SCI


Journal article: OpenCL-Darknet: Implementation and Optimization of OpenCL-based Deep Learning Object Detection Framework
Cited 10 times in Scopus. Downloaded 22 times.
구용본, 김성훈, 하영국
World Wide Web : Internet and Web Information Systems, v.24 no.4, pp.1299-1319
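19PS1100, Development of artificial intelligence computing platform technology for service robots, capable of high-capacity, high-performance sensor fusion processing and real-time deep learning processing, 김성훈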
Object detection is a technology for recognizing the classes of objects and their locations. It is used in many different areas, such as face-detection systems [16, 34, 37], surveillance tools [9], human-machine interfaces [17], and self-driving cars [18, 23, 25, 26, 30]. In recent years, deep learning object detection approaches have achieved significantly better performance than classical feature-based algorithms. Darknet [31] is a deep learning object detection framework that is well known for its speed and simple structure. Unfortunately, Darknet supports only Nvidia CUDA [6] for accelerating its deep learning calculations, so users have limited options when selecting graphics cards. Open Computing Language (OpenCL) [35], an open standard for cross-platform, parallel programming of heterogeneous systems, is available for general hardware accelerators. However, many deep learning frameworks, including Darknet, have no support for OpenCL. In our previous paper, we presented OpenCL-Darknet [19], which transformed the CUDA-based Darknet into an open-standard OpenCL backend. The original OpenCL-Darknet demonstrated that it could run on general graphics processing unit (GPU) hardware. However, it could not achieve performance competitive with the CUDA version, and it supported only a limited platform. In this study, we improved the performance of OpenCL-Darknet with several optimization techniques and added support for various architectures. We evaluated OpenCL-Darknet not only on an AMD R7 accelerated processing unit (APU) with OpenCL 2.0, but also on an Nvidia GPU and an ARM Mali embedded GPU with the OpenCL 1.2 Profile. The evaluation using standard object detection datasets showed that our advanced OpenCL-Darknet reduced the processing time by up to 50% on average for various deep learning object detection networks compared with our original implementation. We also showed that our OpenCL deep learning framework is competitive with the CUDA-based one.
KSP Suggested Keywords
Accelerated Processing Unit, Competitive performance, Cross-Platform, Deep learning framework, Detection Approaches, Detection Framework, Embedded GPU, Fast speed, Feature-based, Graphic Processing Unit (GPU), Hardware accelerator