ETRI Knowledge Sharing Platform: An Operand Divergence-Aware All-Bank PIM Architecture for Sparse Attention

Journal Article: An Operand Divergence-Aware All-Bank PIM Architecture for Sparse Attention

Cited 0 times in Scopus

Abstract: Processing-in-memory (PIM) architectures leverage all-bank execution to provide high internal bandwidth for memory-bound operations. However, synchronized control across banks fundamentally conflicts with the irregular access patterns of sparse workloads, where operand divergence requires banks to access different row or column addresses. We propose a novel divergence-aware all-bank PIM architecture that resolves operand-level data divergence. Our design enables each bank to dynamically access distinct rows or columns under a shared command, allowing sparse execution without sacrificing bank-level parallelism. We apply this architecture to accelerate large language models that use dynamic KV-cache filtering with unstructured token-level sparsity. Simulation results show that the proposed design achieves an average 6.6× speedup over dense all-bank PIM.
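
The conflict between a shared command and divergent operands can be illustrated with a toy step-count model. This is only a sketch under stated assumptions, not the paper's simulator or its hardware mechanism: the function names, the "one row per bank per command" cost model, and the example row selections are all hypothetical, and the numbers it produces are unrelated to the reported 6.6× result.

```python
from typing import List


def dense_steps(rows_per_bank: int) -> int:
    """Dense all-bank execution: one shared row address per command,
    so every bank sweeps every row whether or not it is needed."""
    return rows_per_bank


def lockstep_sparse_steps(selected: List[List[int]]) -> int:
    """Lockstep execution that skips unused rows: a shared address must
    still be issued once for each distinct row that ANY bank needs."""
    return len(set().union(*selected))


def divergence_aware_steps(selected: List[List[int]]) -> int:
    """Assumed divergence-aware execution: under each shared command every
    bank consumes its own next selected row, so the bank with the most
    selected rows determines the command count."""
    return max((len(rows) for rows in selected), default=0)


# Hypothetical example: 4 banks, 16 rows each, with an unstructured
# per-bank selection of rows (e.g. tokens kept by a KV-cache filter).
selected_rows = [[0, 5], [3, 9, 12], [1], [7, 14]]

print(dense_steps(16))                        # 16
print(lockstep_sparse_steps(selected_rows))   # 8
print(divergence_aware_steps(selected_rows))  # 3
```

In this model, lockstep skipping helps little because the banks' selections rarely coincide, while per-bank addressing reduces the command count to that of the most heavily loaded bank.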

218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, KOREA, Contact: sh.kim@etri.re.kr
