ETRI-Knowledge Sharing Platform

Details

Journal article: An Efficient Architecture of In-Loop Filters for Multicore Scalable HEVC Hardware Decoders
Cited 8 times in Scopus; downloaded 11 times
Authors
김현미, 고정길, 박성모
Publication date
2018-04
Source
IEEE Transactions on Multimedia, v.20 no.4, pp.810-824
ISSN
1520-9210
Publisher
IEEE
DOI
https://dx.doi.org/10.1109/TMM.2017.2759506
Funded project
17HB2500, Development of Ultra-Low-Power Hypervisor-Based Intelligent Information Manycore Processor and SW Technology, 권영수
Abstract
This paper proposes an efficient architecture for HEVC in-loop filters (ILFs), aimed at effective multicore utilization in ultra-high-definition video applications. While HEVC allows a high level of parallelization, data dependencies at the ILF stage lead to inefficient parallel processing. Novel memory organization and management techniques address the data-dependence issues between multiple processing units and enable filtering of flexible areas on a multicore decoder. In addition, we introduce the adaptive deblocking filtering order (ADFO) to minimize the impact of bus congestion when multiple cores interoperate to process very large data. Furthermore, we design the deblocking filter (DF) with skip-mode pipelining to achieve high performance while minimizing the added cost and power consumption. For sample adaptive offset (SAO), we apply a window-based parallel SAO filtering scheme. Resource sharing is considered throughout the entire architecture. Based on both experimental and analytical results, the proposed design achieves between 1.31 and 2.6 Gpixels/s at a maximum frequency of 660 MHz on a single core. It occupies 56.2 Kgates, including 10.6 Kgates for the memory management architecture that supports the multicore decoder, and consumes about 20.8 mW on average when synthesized with a 28 nm CMOS library. Moreover, the DF skip modes improve both performance and power dissipation. ADFO improves performance by ~9.17% when decoding an 8K sequence on an octa-core configuration at 400 MHz. The throughput per gate (TpG) is the highest among the related works.
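As a rough reading aid, the single-core figures quoted in the abstract can be turned into pixels per clock cycle and a throughput-per-gate ratio. The sketch below is only back-of-the-envelope arithmetic on the numbers stated above; the function names are hypothetical, and the ratio computed here (pixels per second divided by gate count) is an assumed reading of TpG that may not match the exact definition used in the paper.

```python
# Back-of-the-envelope arithmetic on figures quoted in the abstract.
# Function names and the TpG formula are illustrative assumptions,
# not definitions taken from the paper.

CLOCK_HZ = 660e6          # maximum frequency, single core
MIN_THROUGHPUT = 1.31e9   # lower bound, pixels per second
MAX_THROUGHPUT = 2.6e9    # upper bound, pixels per second
GATE_COUNT = 56.2e3       # total gates, including memory management


def pixels_per_cycle(throughput_pixels_per_s: float, clock_hz: float) -> float:
    """Average number of pixels filtered per clock cycle."""
    return throughput_pixels_per_s / clock_hz


def throughput_per_gate(throughput_pixels_per_s: float, gate_count: float) -> float:
    """Assumed TpG reading: pixels per second delivered per logic gate."""
    return throughput_pixels_per_s / gate_count


low = pixels_per_cycle(MIN_THROUGHPUT, CLOCK_HZ)
high = pixels_per_cycle(MAX_THROUGHPUT, CLOCK_HZ)
tpg_low = throughput_per_gate(MIN_THROUGHPUT, GATE_COUNT)

print(f"pixels per cycle: {low:.2f} to {high:.2f}")
print(f"throughput per gate (lower bound): {tpg_low:.0f} pixels/s per gate")
```

Under these assumptions the design filters roughly two to four pixels per cycle on one core, which gives a sense of how much of the quoted throughput range depends on filtering conditions rather than on clock speed.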
KSP suggested keywords
28 nm CMOS, Data Dependencies, Data dependence, High performance, In-Loop, Management techniques, Maximum Frequency, Memory management, Multicore Utilization, Organization and Management, Parallel Processing