ETRI-Knowledge Sharing Platform

Detail

A Bounded-Search of Size-Local Blind Spots in an Official cuTile GEMM Sample on Blackwell
Authors
Jiwon Lee, JooHyoung Cha, Yongjoo Kim, Yongin Kwon
Issue Date
2026-05
Citation
Annual Symposium of KIPS (ASK) 2026, pp.96-98
Publisher
Korea Information Processing Society (한국정보처리학회)
Language
English
Type
Conference Paper
Abstract
We present a measurement study of FP16 square GEMM on an NVIDIA RTX PRO 6000 Blackwell GPU. Rather than identifying the universally fastest framework, we ask whether an official cuTile sample configuration can misrepresent framework capability at irregular sizes when stronger configurations exist in the same environment. We evaluate a bounded candidate pool of ten cuTile-Python configurations across five irregular sizes (1664, 1792, 1920, 2432, 2560). Using the official 128x256x64 regular configuration as the reference, measured regret ranges from 74.73% to 84.65% (mean 79.33%). At sizes 1664 and 2560, this gap is sufficient to reverse the relative cuTile-Triton ranking depending solely on which cuTile path is reported.
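
The abstract does not spell out how regret is computed, so the following is only a minimal sketch under one assumed definition: the relative throughput shortfall of the reference configuration against the best candidate in the pool at the same size. The function names, the configuration labels, and every number below are hypothetical placeholders chosen for easy arithmetic; none of them are measurements from the paper.

```python
from statistics import mean


def regret_percent(reference, best):
    """Assumed definition: percent throughput lost by reporting the
    reference configuration instead of the best candidate."""
    if best <= 0:
        raise ValueError("best throughput must be positive")
    return 100.0 * (best - reference) / best


def per_size_regret(measurements, reference_name):
    """Map each problem size to the regret of the reference configuration.

    measurements: {size: {config_name: throughput}}, higher is better.
    """
    regrets = {}
    for size, by_config in measurements.items():
        best = max(by_config.values())
        regrets[size] = regret_percent(by_config[reference_name], best)
    return regrets


def ranking_flips(reference, best, other):
    """True when the other framework beats the reference configuration
    but not the best candidate, so the ranking depends on which is reported."""
    return reference < other <= best


# Placeholder throughputs (arbitrary units), NOT data from the paper.
measurements = {
    1000: {"ref": 20.0, "alt_a": 80.0, "alt_b": 50.0},
    2000: {"ref": 30.0, "alt_a": 60.0, "alt_b": 100.0},
}
other_framework = {1000: 40.0, 2000: 120.0}

regrets = per_size_regret(measurements, "ref")
mean_regret = mean(regrets.values())
flips = {
    size: ranking_flips(cfgs["ref"], max(cfgs.values()), other_framework[size])
    for size, cfgs in measurements.items()
}
```

With these placeholder values the ranking flips only at the first size, where the other framework falls between the reference and the best candidate. A latency-based definition of regret would yield different percentages, so the definition in the full paper should take precedence over this sketch.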