ETRI-Knowledge Sharing Platform

An Operand Divergence-Aware All-Bank PIM Architecture for Sparse Attention
Authors
Soojin Hwang, Sanghyeon Lee, Juhyun Lee, Jaehyuk Huh
Issue Date
2026-04
Citation
IEEE Computer Architecture Letters, v.25, pp.158-161
ISSN
1556-6056
Publisher
IEEE
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1109/LCA.2026.3684216
Abstract
Processing-in-memory (PIM) architectures leverage all-bank execution to provide high internal bandwidth for memory-bound operations. However, synchronized control across banks fundamentally conflicts with the irregular access patterns of sparse workloads, where operand divergence requires banks to access different row or column addresses. We propose a novel divergence-aware all-bank PIM architecture that resolves operand-level data divergence. Our design enables each bank to dynamically access distinct rows or columns under a shared command, allowing sparse execution without sacrificing bank-level parallelism. We apply this architecture to accelerate large language models that use dynamic KV-cache filtering with unstructured token-level sparsity. Simulation results show that the proposed design achieves an average 6.6× speedup over dense all-bank PIM.
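The conflict between lockstep bank control and per-bank operand divergence can be illustrated with a toy software model. The sketch below is not the paper's design: the bank layout, the function names (`lockstep_scores`, `divergent_scores`), and the "one shared command per step" cost model are all assumptions made for illustration. It only counts how many shared commands are needed to score the selected keys of a sparse attention step when every bank must read the same row, versus when each bank may read its own row under one command.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Bank = List[Vector]  # bank[row] holds one key vector (hypothetical layout)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def lockstep_scores(banks: List[Bank], query: Vector,
                    selected: List[List[int]]) -> Tuple[Dict[Tuple[int, int], float], int]:
    """Assumed lockstep model: each shared command names ONE row for all banks.

    To cover every bank's selected rows, a command is issued for each row in
    the union of all selections; banks that did not select it waste the access.
    """
    union_rows = sorted({row for rows in selected for row in rows})
    scores: Dict[Tuple[int, int], float] = {}
    for row in union_rows:
        for b, bank in enumerate(banks):
            if row in selected[b]:  # result is only useful for these banks
                scores[(b, row)] = dot(query, bank[row])
    return scores, len(union_rows)


def divergent_scores(banks: List[Bank], query: Vector,
                     selected: List[List[int]]) -> Tuple[Dict[Tuple[int, int], float], int]:
    """Assumed divergence-aware model: one shared command per step, but bank b
    reads its own row selected[b][step]; banks with no work left stay idle."""
    steps = max((len(rows) for rows in selected), default=0)
    scores: Dict[Tuple[int, int], float] = {}
    for step in range(steps):
        for b, bank in enumerate(banks):
            if step < len(selected[b]):
                row = selected[b][step]
                scores[(b, row)] = dot(query, bank[row])
    return scores, steps


# Toy data: 4 banks x 4 rows, key vector [b + 1, r] in bank b, row r.
banks = [[[float(b + 1), float(r)] for r in range(4)] for b in range(4)]
query = [1.0, 2.0]
selected = [[0], [3], [1, 2], [2]]  # rows kept per bank after token filtering

lock_out, lock_cmds = lockstep_scores(banks, query, selected)
div_out, div_cmds = divergent_scores(banks, query, selected)
print("lockstep commands:", lock_cmds)
print("divergent commands:", div_cmds)
print("same scores:", lock_out == div_out)
```

In this toy case both schedules produce identical scores, but the lockstep schedule needs one command per row in the union of all banks' selections, while the divergent schedule needs only as many commands as the busiest bank has selected rows. The command counts here are a property of the toy model only and say nothing about the 6.6× figure reported from the authors' simulation.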
Keywords
Processing-in-memory (PIM), data divergence