ETRI Knowledge Sharing Platform

DiT-Pruner: Pruning Diffusion Transformer Models for Text-to-Image Synthesis Using Human Preference Scores
Authors
Youngwan Lee, Yong-Ju Lee, Sung Ju Hwang
Issue Date
2024-09
Citation
European Conference on Computer Vision (ECCV) 2024, pp.1-9
Language
English
Type
Conference Paper
Abstract
Despite their remarkable performance compared to U-Net-based text-to-image (T2I) models, Diffusion Transformer (DiT)-based T2I models incur substantial inference costs due to their large model size and computational requirements. While recent layer-pruning efforts for large language models (LLMs) have found redundancy in Transformers, layer pruning has not yet been explored for DiT models. In this work, we propose a simple layer-pruning method specifically for DiT-based T2I models. Unlike pruning methods for LLMs, which identify unimportant layers based on the similarity across layers or between the input and output features of each layer, our approach prunes layers using a direct quality metric based on human preference scores, which more precisely reflects overall generated-image quality. In experiments on the PixArt-Σ model, our method outperforms similarity-based methods across different pruning ratios. Additionally, we find that fine-tuning with a knowledge distillation objective can further restore performance.
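
The abstract does not include code; the following is a minimal sketch of what preference-score-driven layer pruning could look like, assuming a greedy drop-one-layer search. The names `prune_by_preference` and `score_fn` are hypothetical, not the authors' implementation; `score_fn` is assumed to generate images from the model built from the kept blocks over a fixed prompt set and return a mean human preference score (e.g., from an HPS-style reward model).

```python
# Hypothetical sketch, not the authors' released code: greedy layer pruning
# driven by a human preference score rather than layer similarity.
from typing import Callable, List

import torch.nn as nn


def prune_by_preference(
    blocks: nn.ModuleList,
    score_fn: Callable[[nn.ModuleList], float],
    num_to_prune: int,
) -> List[int]:
    """Return indices of blocks kept after greedily dropping `num_to_prune` blocks."""
    keep = list(range(len(blocks)))
    for _ in range(num_to_prune):
        best_idx, best_score = None, float("-inf")
        for i in keep:
            # Try removing block i and score the resulting smaller model.
            trial = nn.ModuleList([blocks[j] for j in keep if j != i])
            score = score_fn(trial)  # assumed: mean preference score on a prompt set
            if score > best_score:
                best_idx, best_score = i, score
        keep.remove(best_idx)  # drop the block whose removal hurts quality least
    return keep
```

The key design point the abstract highlights is that the pruning criterion scores the generated images directly, instead of using a proxy such as feature similarity between a layer's input and output.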
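For the distillation fine-tuning mentioned at the end of the abstract, a common formulation (an assumption here; the abstract names the objective but not its exact form) is to have the pruned student regress the unpruned teacher's denoising prediction on the same noised latents. The forward signature `(latents, timesteps, text_emb)` is likewise an assumption about the DiT interface.

```python
# Minimal sketch, assuming a standard output-matching distillation objective;
# not necessarily the paper's exact loss.
import torch
import torch.nn.functional as F


def distill_step(student, teacher, latents, timesteps, text_emb, optimizer):
    with torch.no_grad():
        target = teacher(latents, timesteps, text_emb)  # teacher's noise prediction
    pred = student(latents, timesteps, text_emb)
    loss = F.mse_loss(pred, target)  # match pruned student to unpruned teacher
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```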
KSP Keywords
Computational requirements, Fine-tuning, Knowledge Distillation, Language Model, Pruning method, Quality Metrics, Restore Performance, Similarity-based methods, image quality, image synthesis