ETRI-Knowledge Sharing Platform


Vectorized Implementation of Kyber and Dilithium on 32-bit Cortex-A Series
Authors
Youngbeom Kim, Seungyong Yoon, Seog Chung Seo
Issue Date
2024-08
Citation
IEEE Access, v.12, pp.104414-104428
ISSN
2169-3536
Publisher
Institute of Electrical and Electronics Engineers Inc.
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1109/ACCESS.2024.3435451
Abstract
Post-Quantum Cryptography (PQC) typically demands more memory and delivers lower performance than Elliptic-Curve Cryptography (ECC), and recent studies have focused on NEON-based parallel implementations for the 64-bit ARMv8-based Cortex-A series. However, research into implementing PQC on the widely adopted 32-bit ARMv7-based Cortex-A series remains insufficient. In this paper, we present the first optimized implementation of CRYSTALS-Kyber and CRYSTALS-Dilithium, a Key Encapsulation Mechanism (KEM) and a Digital Signature Algorithm (DSA) selected by the National Institute of Standards and Technology (NIST) for standardization, on a 32-bit ARMv7-based Cortex-A device. For computational efficiency, we finely tune the widely used signed Montgomery multiplication and Barrett multiplication methods to take full advantage of the computational capabilities of the NEON engine, a Single-Instruction-Multiple-Data (SIMD) extension available on the target device. In particular, we propose improvements to the internal parameters and operational techniques of Montgomery and Barrett arithmetic to preserve the parallel processing logic. Moreover, we present an optimized merging technique tailored for the NEON engine of ARMv7, aimed at accelerating Number Theoretic Transform (NTT)-based polynomial multiplication. Compared to the state-of-the-art code of PQM4, our approach achieves significant performance improvements in Kyber (Dilithium): 62% (54%) for NTT, 50% (62%) for point multiplication, and 56% (55%) for inverse NTT (NTT^-1). For the complete schemes, our implementations outperform the vectorized reference implementations, with improvements of 50% (14%) in Key Generation, 43% (41%) in Encapsulation (Signing), and 52% (21%) in Decapsulation (Verifying) for Kyber768 (Dilithium3), respectively.
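The signed Montgomery multiplication named in the abstract can be illustrated with a minimal scalar sketch. It assumes the Kyber modulus q = 3329 and a Montgomery radix of 2^16, as in the public Kyber reference code. The helper names (`to_int16`, `montgomery_reduce`, `fqmul`, `to_montgomery`) are illustrative choices. This is not the authors' NEON-tuned variant, which processes several coefficients per instruction and adjusts the internal parameters.

```python
# Scalar sketch of signed Montgomery reduction for Kyber (assumed q = 3329, R = 2^16).
# Requires Python 3.8+ for pow(x, -1, m).
Q = 3329
R_BITS = 16
QINV = pow(Q, -1, 1 << R_BITS)  # q^-1 mod 2^16


def to_int16(x):
    """Interpret the low 16 bits of x as a signed 16-bit integer."""
    x &= 0xFFFF
    return x - 0x10000 if x >= 0x8000 else x


def montgomery_reduce(a):
    """Return t with t = a * 2^-16 (mod Q) and -Q < t < Q, for |a| < Q * 2^15."""
    t = to_int16(a * QINV)   # t = a * q^-1 mod 2^16, taken as a signed value
    return (a - t * Q) >> 16  # a - t*Q is divisible by 2^16, so the shift is exact


def fqmul(a, b):
    """Multiply two coefficients, then Montgomery-reduce the product."""
    return montgomery_reduce(a * b)


def to_montgomery(x):
    """Map x to its Montgomery form x * 2^16 mod Q (hypothetical helper)."""
    return (x << R_BITS) % Q


if __name__ == "__main__":
    a, b = 1234, 567
    # One operand in Montgomery form cancels the 2^-16 factor from the reduction.
    product = fqmul(a, to_montgomery(b)) % Q
    print(product == (a * b) % Q)
```

Keeping one operand (for example, precomputed NTT twiddle factors) in Montgomery form means the extra factor of 2^-16 introduced by the reduction cancels, so no separate conversion step is needed per multiplication.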
KSP Keywords
Computational Efficiency, Elliptic Curve Cryptography, Key Encapsulation Mechanism, Key Generation, Montgomery Multiplication, National Institute of Standards and Technology (NIST), Number Theoretic Transform, Parallel Processing, Parallel Implementation, Polynomial Multiplication, Post-Quantum Cryptography
This work is distributed under the terms of a Creative Commons License (CC BY-NC-ND).