ETRI-Knowledge Sharing Platform

Details

Journal article: OpenCL-Darknet: Implementation and Optimization of OpenCL-based Deep Learning Object Detection Framework
Cited 5 times in Scopus. Downloaded 15 times.
Authors
구용본, 김성훈, 하영국
Publication date
2021-07
Source
World Wide Web : Internet and Web Information Systems, v.24 no.4, pp.1299-1319
ISSN
1386-145X
Publisher
Springer
DOI
https://dx.doi.org/10.1007/s11280-020-00778-y
Funded project
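19PS1100, Development of artificial intelligence computing platform technology for service robots capable of large-capacity, high-performance sensor fusion processing and real-time deep learning processing, 김성훈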
Abstract
Object detection is a technology that recognizes the classes of objects and their locations. It is used in many different areas, such as face-detection systems [16, 34, 37], surveillance tools [9], human-machine interfaces [17], and self-driving cars [18, 23, 25, 26, 30]. Deep learning object detection approaches now achieve significantly better performance than classical feature-based algorithms. Darknet [31] is a deep learning object detection framework that is well known for its speed and simple structure. Unfortunately, Darknet can only use Nvidia CUDA [6] to accelerate its deep learning calculations, so users have limited options when selecting graphics cards. Open Computing Language (OpenCL) [35], an open standard for cross-platform, parallel programming of heterogeneous systems, is available for general hardware accelerators. However, many deep learning frameworks, including Darknet, have no support for OpenCL. In our previous paper, we presented OpenCL-Darknet [19], which transformed the CUDA-based Darknet into an open-standard OpenCL backend. The original OpenCL-Darknet successfully demonstrated that it could run on general graphics processing unit (GPU) hardware. However, it could not achieve performance competitive with the CUDA version, and it supported only a limited set of platforms. In this study, we improved the performance of OpenCL-Darknet with several optimization techniques and added support for various architectures. We evaluated OpenCL-Darknet not only on an AMD R7 accelerated processing unit (APU) with OpenCL 2.0, but also on an Nvidia GPU and an ARM Mali embedded GPU with OpenCL 1.2 Profile. The evaluation using standard object detection datasets showed that our advanced OpenCL-Darknet reduced the processing time by up to 50% on average for various deep learning object detection networks compared with our original implementation. We also showed that our OpenCL deep learning framework is competitive with the CUDA-based one.
Keywords
deep learning, image processing, object detection, OpenCL, parallel programming
KSP suggested keywords
Accelerated Processing Unit, Competitive performance, Cross-Platform, Deep learning framework, Detection Approaches, Detection Framework, Embedded GPU, Fast speed, Feature-based, Graphics Processing Unit (GPU), Hardware accelerator