ETRI-Knowledge Sharing Platform

Details

Journal article: Out-of-core GPU 2D-shift-FFT Algorithm for Ultra-high-resolution Hologram Generation
Cited 3 times in Scopus. Downloaded 34 times.
Authors
이재홍, 강호민, 염한주, 전상훈, 박중기, 김덕수
Publication date
2021-06
Source
Optics Express, v.29 no.12, pp.19094-19112
ISSN
1094-4087
Publisher
Optical Society of America (OSA)
DOI
https://dx.doi.org/10.1364/OE.422266
Funded project
21HH5700, [Specialized Laboratory] Development of Holo-TV Core Technologies for Hologram Video Services, 박중기
Abstract
We propose a novel out-of-core GPU algorithm for 2D-Shift-FFT (i.e., 2D-FFT with FFT-shift) to generate ultra-high-resolution holograms. Generating an ultra-high-resolution hologram requires a large complex matrix (e.g., 100K²) whose size typically exceeds GPU memory. To handle such a large-scale hologram plane with limited GPU memory, we employ a 1D-FFT-based 2D-FFT computation method. We transpose the column data into a contiguous memory layout to improve the performance of the column-wise 1D-FFT stage in both data communication and GPU computation. We also combine the FFT-shift and transposition steps to reduce and hide the workload. To maximize GPU utilization, we exploit the concurrent execution ability of recent heterogeneous computing systems. We further optimize our method's performance with our cache-friendly chunk generation algorithm and a pinned-memory buffer approach. We tested our method on three computing systems with different GPUs and on complex matrices of various sizes. Compared to the conventional implementation based on the state-of-the-art GPU FFT library (i.e., cuFFT), our method achieved up to 3.24 and 3.06 times higher performance for a large-scale complex matrix in the single- and double-precision cases, respectively. To assess the benefits of the proposed approach in an actual application, we applied our method to the layer-based CGH process. As a result, it reduced the time required to generate an ultra-high-resolution hologram (e.g., 100K²) by up to 28% compared to the conventional algorithm. These results demonstrate the efficiency and usefulness of our method.
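The row-column decomposition and the fused shift-plus-transpose step described in the abstract can be illustrated with a small CPU-only sketch. This is not the authors' implementation: it is a minimal pure-Python illustration in which a naive O(n²) DFT stands in for a batched GPU 1D-FFT call, list slicing stands in for host-to-device chunk transfers, and all function names (`dft_1d`, `transpose_with_shift`, `rowwise_dft_chunked`, `shift_fft2`, `direct_shifted_dft2`) and the `chunk_rows` parameter are hypothetical. Concurrent execution, pinned-memory buffers, and cache-friendly chunk generation are not modeled.

```python
import cmath


def dft_1d(row):
    """Naive O(n^2) 1D DFT; a stand-in for one batched GPU 1D-FFT."""
    n = len(row)
    return [
        sum(row[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def transpose_with_shift(mat):
    """Transpose mat and, in the same pass, circularly shift the axis that
    was just transformed by half its length (the FFT-shift for that axis)."""
    rows, cols = len(mat), len(mat[0])
    out = [[0j] * rows for _ in range(cols)]
    for r in range(rows):
        for c in range(cols):
            out[(c + cols // 2) % cols][r] = mat[r][c]
    return out


def rowwise_dft_chunked(mat, chunk_rows):
    """Transform rows a chunk at a time, as if only chunk_rows rows fit
    in device memory at once."""
    out = []
    for start in range(0, len(mat), chunk_rows):
        chunk = mat[start:start + chunk_rows]      # "upload" one chunk
        out.extend(dft_1d(row) for row in chunk)   # 1D transforms on it
    return out


def shift_fft2(mat, chunk_rows=2):
    """2D DFT with FFT-shift built only from row-wise 1D transforms."""
    stage1 = rowwise_dft_chunked(mat, chunk_rows)     # transform along rows
    stage2 = transpose_with_shift(stage1)             # columns become rows
    stage3 = rowwise_dft_chunked(stage2, chunk_rows)  # transform old columns
    return transpose_with_shift(stage3)               # restore orientation


def direct_shifted_dft2(mat):
    """Reference: 2D DFT from its definition, then a half-length shift."""
    h, w = len(mat), len(mat[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for r in range(h):
                for c in range(w):
                    phase = -2j * cmath.pi * (u * r / h + v * c / w)
                    acc += mat[r][c] * cmath.exp(phase)
            out[(u + h // 2) % h][(v + w // 2) % w] = acc
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Small check: the chunked row-column path matches the direct definition.
field = [[complex(r + 1, c - r) for c in range(6)] for r in range(4)]
assert max_abs_diff(shift_fft2(field), direct_shifted_dft2(field)) < 1e-9
```

Because every 1D transform in this sketch runs over a contiguous row, each chunk can be moved and processed as one block, which is the layout property the abstract attributes to transposing the column data. For scale, if 100K is read as 100,000 (an assumption), a 100,000 × 100,000 single-precision complex matrix at 8 bytes per element occupies about 80 GB, and the double-precision version about 160 GB.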
KSP suggested keywords
2D-FFT, Complex matrices, Computation method, Concurrent Execution, Double-precision, FFT algorithm, GPU algorithm, GPU computation, GPU utilization, Generation algorithm, Heterogeneous computing systems