ETRI-Knowledge Sharing Platform

Journal Article
Optimizing Real-Time Object Detection in a Multi-Neural Processing Unit System
Authors
Sehyeon Oh, Yongin Kwon, Jemin Lee
Issue Date
2025-03
Citation
Sensors, v.25, no.5, pp.1-12
ISSN
1424-8220
Publisher
Multidisciplinary Digital Publishing Institute (MDPI)
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.3390/s25051376
Abstract
Real-time object detection demands high throughput and low latency, which makes hardware accelerators necessary. A neural processing unit (NPU) is specialized hardware designed to accelerate deep learning computation, and it offers better energy efficiency and parallel processing performance than conventional CPUs or GPUs. It is particularly important for reducing latency and improving processing speed in applications that require real-time processing. In this paper, we construct a real-time object detection system based on YOLOv3 using Neubla’s Antara NPU and propose two approaches for performance optimization. First, we keep NPU inference running continuously by letting the CPU process data in advance through double buffering. Second, in a multi-NPU environment, we distribute tasks among NPUs through queue-based processing and analyze the performance limits using Amdahl’s law. Experimental results show that, compared to a CPU-only environment, the NPU improved throughput by 2.13 times with single buffering, by 3.35 times with double buffering, and by 4.81 times in a multi-NPU environment. Latency decreased by 1.6 times with single and double buffering, and by 1.18 times in the multi-NPU environment. Accuracy remained consistent, with 31.4 mAP on the CPU and 31.8 mAP on the NPU.
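The two optimizations named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use any Antara NPU API: the names `run_pipeline`, `preprocess`, `infer`, `n_workers`, and `depth` are hypothetical, worker threads merely stand in for NPUs, and a bounded queue of depth 2 stands in for the two buffers. `amdahl_speedup` is the standard form of Amdahl's law, which bounds the speedup when only a fraction of the work scales with the number of units.

```python
import queue
import threading


def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    """Amdahl's law: upper bound on speedup with n_units parallel units."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_units)


def run_pipeline(frames, preprocess, infer, n_workers=1, depth=2):
    """Overlap CPU preprocessing with inference via a bounded queue.

    Assumption: each worker thread stands in for one NPU; depth=2 mimics
    double buffering (one buffer being filled while another is consumed).
    """
    pending = queue.Queue(maxsize=depth)
    results = {}
    lock = threading.Lock()

    def producer():
        for idx, frame in enumerate(frames):
            # Blocks while both buffers are full, so the CPU stays at most
            # `depth` frames ahead of the accelerators.
            pending.put((idx, preprocess(frame)))
        for _ in range(n_workers):
            pending.put(None)  # one stop marker per worker

    def worker():
        while True:
            item = pending.get()
            if item is None:
                break
            idx, tensor = item
            out = infer(tensor)
            with lock:
                results[idx] = out  # keyed by index to restore frame order

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results[i] for i in range(len(results))]
```

For example, `run_pipeline(list(range(8)), lambda x: x + 1, lambda x: x * 2, n_workers=3)` returns the results in frame order regardless of which worker handled each frame. The shared queue is what distributes work: whichever worker is idle takes the next preprocessed frame, so no static assignment of frames to units is needed.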
KSP Keywords
Detection Systems(IDS), Double buffering, Energy efficiency, Hardware accelerator, High throughput, Low latency, Neural processing, Parallel Processing, Performance Optimization, Performance limits, Process data
This work is distributed under the terms of the Creative Commons License (CC BY).