ETRI Knowledge Sharing Platform

Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems
Cited 2 times in Scopus
Authors
Jemin Lee, Yongin Kwon, Sihyeong Park, Misun Yu, Jeman Park, Hwanjun Song
Issue Date
2024-11
Citation
IEEE Internet of Things Journal, v.11, no.22, pp.36384-36396
ISSN
2327-4662
Publisher
Institute of Electrical and Electronics Engineers Inc.
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1109/JIOT.2024.3403844
Abstract
Recently, vision transformers (ViTs) have superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation. However, the high computational requirements of ViTs hinder their widespread adoption. To address this issue, researchers have proposed efficient hybrid transformer architectures that combine convolutional and transformer layers with optimized, linear-complexity attention computation. Additionally, post-training quantization (PTQ) has been proposed as a means of mitigating computational demands. For mobile devices, achieving optimal acceleration for ViTs necessitates the strategic integration of quantization techniques and efficient hybrid transformer structures. However, no prior investigation has applied quantization to efficient hybrid transformers. In this paper, we discover that applying existing PTQ methods for ViTs to efficient hybrid transformers leads to a drastic accuracy drop, attributable to four challenges: (i) highly dynamic ranges, (ii) zero-point overflow, (iii) diverse normalization, and (iv) limited model parameters (≤5M). To overcome these challenges, we propose a new post-training quantization method, the first to quantize efficient hybrid ViTs (MobileViTv1, MobileViTv2, Mobile-Former, EfficientFormerV1, EfficientFormerV2). Compared with existing PTQ methods (EasyQuant, FQ-ViT, PTQ4ViT, and RepQ-ViT), we achieve average accuracy improvements of 17.73% for 8-bit and 29.75% for 6-bit quantization. We plan to release our code at https://gitlab.com/ones-ai/q-hyvit.
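The first two failure modes listed in the abstract are easiest to see in a plain uniform affine quantizer. Below is a minimal NumPy sketch, a hypothetical illustration rather than the Q-HyViT method or its released code; the helper quantize_asymmetric and both sample tensors are assumptions for demonstration. The first tensor contains a few outliers (a highly dynamic range), so the scale stretches and most values collapse onto a handful of integer levels; the second tensor excludes zero, so the raw zero-point falls outside the 8-bit grid before clamping (zero-point overflow).

import numpy as np

def quantize_asymmetric(x, num_bits=8):
    # Uniform affine quantization: q = round(x / scale) + zero_point.
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin)
    # The zero-point maps real 0.0 onto the integer grid; when the tensor's
    # range excludes zero, the raw value lands outside [qmin, qmax] and must
    # be clamped, which skews every dequantized value (zero-point overflow).
    zp_raw = qmin - x_min / scale
    zp = int(np.clip(round(zp_raw), qmin, qmax))
    q = np.clip(np.round(x / scale) + zp, qmin, qmax).astype(np.uint8)
    # Mean absolute reconstruction error after dequantization.
    err = np.abs((q.astype(np.float64) - zp) * scale - x).mean()
    return zp_raw, zp, scale, err

rng = np.random.default_rng(0)

# (i) Highly dynamic range: two outliers stretch the scale, so the bulk of
# the distribution is squeezed into very few integer levels.
outliers = np.concatenate([rng.normal(0.0, 0.1, 1000), [40.0, -35.0]])
print(quantize_asymmetric(outliers))

# (ii) Zero-point overflow: a strictly positive tensor puts the raw
# zero-point far below qmin (about -1275 here), forcing a clamp to 0 and
# saturating the quantized values.
positive = rng.uniform(5.0, 6.0, 1000)
print(quantize_asymmetric(positive))

Reproducing these effects on real hybrid-ViT activations would require the released code at the GitLab link above; this sketch only illustrates the arithmetic behind challenges (i) and (ii).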
KSP Keywords
Computational requirements, Convolutional neural network (CNN), Highly dynamic, Hybrid transformer, Hybrid vision, IoT systems, Mobile devices, Model parameter, Linear complexity, Neural network (NN), Quantization method