ETRI Knowledge Sharing Platform


PF-GEMV: Utilization maximizing architecture in fast matrix–vector multiplication for GPT-2 inference
Cited 1 time in Scopus · Downloaded 93 times
Authors
Hyeji Kim, Yeongmin Lee, Chun-Gi Lyuh
Issue Date
2024-10
Citation
ETRI Journal, v.46, no.5, pp.817-828
ISSN
1225-6463
Publisher
Electronics and Telecommunications Research Institute (ETRI)
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.4218/etrij.2024-0111
Abstract
Owing to the widespread advancement of transformer-based artificial neural networks, artificial intelligence (AI) processors are now required to perform matrix–vector multiplication in addition to the conventional matrix–matrix multiplication. However, current AI processor architectures are optimized for general matrix–matrix multiplications (GEMMs), which causes significant throughput degradation when processing general matrix–vector multiplications (GEMVs). In this study, we propose a port-folding GEMV (PF-GEMV) scheme employing multiformat and low-precision techniques while reusing an outer-product-based processor optimized for conventional GEMM operations. This approach achieves 93.7% utilization in GEMV operations with an 8-bit format on an 8 × 8 processor, resulting in a 7.5× increase in throughput compared with that of the original scheme. Furthermore, when applied to the matrix operations of the GPT-2 large model, a 7× speedup is achieved in single-batch inference.
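For readers unfamiliar with the utilization gap the abstract describes, the NumPy sketch below models an outer-product processing-element (PE) array and counts active PEs per cycle. The 8 × 8 array size and the 93.7% / 7.5× figures come from the abstract; the functional model and the `utilization` helper are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

N = 8  # array size taken from the abstract (8 x 8 processor)

# Functional model of an outer-product GEMM pass: each "cycle" the
# array performs one rank-1 update, C += outer(A[:, k], B[k, :]).
A = np.random.randn(N, N)
B = np.random.randn(N, N)
C = np.zeros((N, N))
for k in range(N):                       # one K-slice per cycle
    C += np.outer(A[:, k], B[k, :])      # all N*N PEs busy for GEMM
assert np.allclose(C, A @ B)

def utilization(m_rows, n_cols):
    """Fraction of PEs doing useful work per cycle for one output tile
    (hypothetical model: an m_rows x n_cols tile mapped onto N x N PEs)."""
    return (min(m_rows, N) * min(n_cols, N)) / (N * N)

print(utilization(N, N))   # GEMM tile: 1.0   (64/64 PEs active)
print(utilization(N, 1))   # GEMV:      0.125 ( 8/64 PEs active)

# Port folding, conceptually: the input ports that would carry the
# missing matrix columns are re-purposed to stream extra slices of the
# single vector each cycle, re-engaging (nearly) all PEs on GEMV.
# Combined with 8-bit operands, the paper reports 93.7% utilization
# and a ~7.5x GEMV throughput gain on the 8 x 8 array.
```

The toy model makes the problem concrete: a GEMV presents only a single output column to a square PE array, so without some form of folding, only one column of PEs does useful work per cycle.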
KSP Keywords
Artificial Neural Network, Low-precision, Matrix Operation, Outer product, Processor architecture, artificial intelligence, matrix multiplication, neural network (NN), transformer-based
This work is distributed under the terms of the Korea Open Government License (KOGL) Type 4: Type 1 + Commercial Use Prohibition + Change Prohibition.