ETRI Knowledge Sharing Platform

DiT-Pruner: Pruning Diffusion Transformer Models for Text-to-Image Synthesis Using Human Preference Scores
Authors
Youngwan Lee, Yong-Ju Lee, Sung Ju Hwang
Issue Date
2024-09
Citation
European Conference on Computer Vision (ECCV) 2024, pp.1-9
Language
English
Type
Conference Paper
Abstract
Despite their remarkable performance compared to U-Net-based text-to-image (T2I) models, Diffusion Transformer (DiT)-based T2I models incur substantial inference costs due to their large model size and computational requirements. While recent layer-pruning efforts for large language models (LLMs) have found redundancy in Transformers, layer pruning has not yet been explored for DiT models. In this work, we propose a simple layer-pruning method specifically for DiT-based T2I models. Unlike pruning methods for LLMs, which identify unimportant layers based on the similarity across layers or between the input and output features of each layer, our approach prunes layers using a direct quality metric based on human preference scores, which more precisely reflects overall generated-image quality. In experiments on the PixArt-Σ model, our method outperforms similarity-based methods across different pruning ratios. Additionally, we find that fine-tuning with a knowledge distillation objective can further restore performance.
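
The abstract does not include code; the following is a minimal sketch of what preference-score-driven layer pruning could look like, assuming a greedy drop-one-layer search. The names `prune_by_preference` and `score_fn` are hypothetical, not the authors' implementation; `score_fn` is assumed to generate images from the model built from the kept blocks over a fixed prompt set and return a mean human preference score (e.g., from an HPS-style reward model).

```python
# Hypothetical sketch, not the authors' released code: greedy layer pruning
# driven by a human preference score rather than layer similarity.
from typing import Callable, List

import torch.nn as nn


def prune_by_preference(
    blocks: nn.ModuleList,
    score_fn: Callable[[nn.ModuleList], float],
    num_to_prune: int,
) -> List[int]:
    """Return indices of blocks kept after greedily dropping `num_to_prune` blocks."""
    keep = list(range(len(blocks)))
    for _ in range(num_to_prune):
        best_idx, best_score = None, float("-inf")
        for i in keep:
            # Try removing block i and score the resulting smaller model.
            trial = nn.ModuleList([blocks[j] for j in keep if j != i])
            score = score_fn(trial)  # assumed: mean preference score on a prompt set
            if score > best_score:
                best_idx, best_score = i, score
        keep.remove(best_idx)  # drop the block whose removal hurts quality least
    return keep
```

The key design point the abstract highlights is that the pruning criterion scores the generated images directly, instead of using a proxy such as feature similarity between a layer's input and output.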
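For the distillation fine-tuning mentioned at the end of the abstract, a common formulation (an assumption here; the abstract names the objective but not its exact form) is to have the pruned student regress the unpruned teacher's denoising prediction on the same noised latents. The forward signature `(latents, timesteps, text_emb)` is likewise an assumption about the DiT interface.

```python
# Minimal sketch, assuming a standard output-matching distillation objective;
# not necessarily the paper's exact loss.
import torch
import torch.nn.functional as F


def distill_step(student, teacher, latents, timesteps, text_emb, optimizer):
    with torch.no_grad():
        target = teacher(latents, timesteps, text_emb)  # teacher's noise prediction
    pred = student(latents, timesteps, text_emb)
    loss = F.mse_loss(pred, target)  # match pruned student to unpruned teacher
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```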
KSP Keywords
Computational requirements, Fine-tuning, Knowledge Distillation, Language Model, Pruning method, Quality Metrics, Restore Performance, Similarity-based methods, image quality, image synthesis