ETRI-Knowledge Sharing Platform

VEX: Scaling HNSW-Based Vector Search with DPU Memory and Parallelism
Authors
Kihwan Kim, Hyungsun Yoo, Woojung Kim, Donghyun Min, Myungcheol Lee, Jihoon Yang, Weikuan Yu, Youngjae Kim
Issue Date
2026-05
Citation
International Symposium on Cluster, Cloud and Internet Computing (CCGrid) 2026, pp.243-253
Publisher
IEEE
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/CCGrid68966.2026.00033
Abstract
Vector similarity search is a core component of modern AI services, and HNSW is widely adopted due to its high recall and low latency. However, its memory-intensive design makes billion-scale deployment difficult, and performance collapses when relying on swapping or remote memory. This paper targets recent DPUs (SmartNICs) with substantially improved compute capability and onboard DRAM, and proposes VEX, a host–DPU integrated vector search system that uses the DPU as both an extended memory tier and a parallel search engine for HNSW. VEX (i) partitions and places independent HNSW indices on the host and DPU while preserving semantic structure, (ii) minimizes host–DPU overhead via a dual-path DMA-based communication design, and (iii) overlaps search, communication, and aggregation with heterogeneity-aware pipelining. Experiments show that under memory pressure requiring disk access, VEX delivers 5–10× higher throughput than DiskANN at stable Recall@100. Even in ideal settings where the index fully resides in memory, VEX outperforms in-memory HNSW by up to 1.9× in query throughput.
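The partition-and-aggregate idea in point (i) can be illustrated with a minimal sketch. This is not the VEX implementation: exact brute-force search stands in for each per-partition HNSW index, two threads stand in for the host and the DPU, and the names `search_partition` and `search_all` are hypothetical. It only shows how independent partitions can each return a local top-k that is then merged into a global top-k.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_partition(partition, query, k):
    """Return the k nearest (distance, id) pairs in one partition, sorted.

    Assumption: brute force replaces the per-partition HNSW index.
    """
    scored = ((squared_l2(vec, query), vid) for vid, vec in partition.items())
    return heapq.nsmallest(k, scored)


def search_all(partitions, query, k):
    """Search every partition in parallel and merge the local top-k lists."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: search_partition(p, query, k),
                                 partitions))
    # Each partial list is already sorted, so a k-way merge yields the
    # global order; only the first k entries are needed.
    merged = heapq.merge(*partials)
    return [vid for _, vid in islice(merged, k)]


# Hypothetical toy data: one partition "on the host", one "on the DPU".
host_part = {0: (0.0, 0.0), 1: (1.0, 1.0), 2: (5.0, 5.0)}
dpu_part = {3: (0.1, 0.0), 4: (4.0, 4.0), 5: (0.9, 1.0)}

print(search_all([host_part, dpu_part], (0.0, 0.0), 3))  # prints [0, 3, 5]
```

Because each partition returns its own k best candidates, the merged result matches a search over the union of the partitions when the per-partition search is exact; with approximate indices such as HNSW, recall depends on each partition's search quality.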
Keyword
Vector Search, ANN, HNSW, DPU, SmartNIC, DMA, Heterogeneous architecture, Memory disaggregation, Hardware-Software Co-Design
KSP Keywords
Communication Design, Extended memory, Hardware/software codesign, High recall, Low latency, Parallel search, Remote Memory, Search Engine, Similarity search, heterogeneous architectures, in-memory