ETRI-Knowledge Sharing Platform


An Efficient Architecture of In-Loop Filters for Multicore Scalable HEVC Hardware Decoders
Cited 8 times in Scopus, downloaded 11 times
Authors
HyunMi Kim, JeongGil Ko, Seongmo Park
Issue Date
2018-04
Citation
IEEE Transactions on Multimedia, v.20, no.4, pp.810-824
ISSN
1520-9210
Publisher
IEEE
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1109/TMM.2017.2759506
Project Code
17HB2500, Intelligence Many-Core Processor and SW based on Low-Power Hypervisor, Kwon Young-Su
Abstract
This paper proposes an efficient architecture for HEVC in-loop filters (ILFs) aimed at effective multicore utilization in ultra-high-definition video applications. While HEVC allows a high level of parallelization, data dependencies at the ILF stage lead to inefficient parallel processing. Novel memory organization and management techniques address the data-dependence issues between multiple processing units and enable filtering of flexible areas on a multicore decoder. In addition, we introduce the adaptive deblocking filtering order (ADFO) to minimize the impact of bus congestion when multiple cores interoperate to process very large data. Furthermore, we design the deblocking filter (DF) with skip-mode pipelining to achieve high performance while minimizing the added cost and power consumption. For sample adaptive offset (SAO), we apply a window-based parallel SAO filtering scheme. Resource sharing is considered throughout the entire architecture. Based on both experimental and analytical results, the proposed design achieves a throughput of more than 1.31 Gpixels/s and less than 2.6 Gpixels/s at a maximum frequency of 660 MHz on a single core. It consumes 56.2 Kgates, including 10.6 Kgates for the memory management architecture that supports the multicore decoder, and about 20.8 mW of power on average when synthesized with a 28 nm CMOS library. Moreover, the skip modes of the DF improve both performance and power dissipation. The ADFO improves performance by ~9.17% when decoding an 8K sequence on an octacore at 400 MHz. The throughput per gate (TpG) is the highest among the related works.
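To put the single-core figures in the abstract into perspective, the short sketch below converts the quoted throughput bounds into pixels per clock cycle and into an equivalent 8K frame rate. It is only an illustrative calculation, not part of the paper: the 7680 x 4320 frame size is an assumption (the abstract says only "8K"), and the function names are hypothetical.

```python
# Figures quoted in the abstract (single core, 28 nm synthesis).
MAX_FREQ_HZ = 660e6          # maximum clock frequency
THROUGHPUT_MIN_PPS = 1.31e9  # lower throughput bound, pixels per second
THROUGHPUT_MAX_PPS = 2.6e9   # upper throughput bound, pixels per second

# Assumption: an "8K" frame is 7680 x 4320 samples (not stated in the abstract).
FRAME_8K_PIXELS = 7680 * 4320


def pixels_per_cycle(throughput_pps: float, freq_hz: float) -> float:
    """Average number of pixels filtered per clock cycle."""
    return throughput_pps / freq_hz


def frames_per_second(throughput_pps: float, frame_pixels: int) -> float:
    """Frame rate the filter stage alone could sustain at this throughput."""
    return throughput_pps / frame_pixels


if __name__ == "__main__":
    for label, tput in (("min", THROUGHPUT_MIN_PPS), ("max", THROUGHPUT_MAX_PPS)):
        ppc = pixels_per_cycle(tput, MAX_FREQ_HZ)
        fps = frames_per_second(tput, FRAME_8K_PIXELS)
        print(f"{label}: {ppc:.2f} pixels/cycle, {fps:.1f} 8K frames/s")
```

These numbers describe the in-loop filter stage of one core only. They say nothing about the rest of the decoder or about how throughput scales across cores.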
KSP Keywords
28 nm CMOS, Data Dependencies, Data dependence, High performance, In-Loop, Management techniques, Maximum Frequency, Memory management, Multicore Utilization, Organization and Management, Parallel Processing