ETRI Knowledge Sharing Platform : Generating Cartoon Scene Images with Latent Diffusion Models

Conference Paper: Generating Cartoon Scene Images with Latent Diffusion Models

Citation: International Conference on Multimedia Information Technology and Applications (MITA) 2023, pp.11-14

Abstract: Recent text-to-image models have shown superior performance in generating realistic images. In this paper, we propose a method for generating cartoon scene images by fine-tuning a text-to-image diffusion model. To enable the model to generate cartoon scenes featuring a character drawn in a specific style, we fine-tuned it with DreamBooth so that it learns the cartoon characters. At the inference stage, we focused on the image-to-image method, which translates a given reference image into a new image under the guidance of a text prompt. We additionally adopted ControlNet and latent couple at the inference stage: ControlNet enables the model to generate the cartoon character in the same pose as the human in the reference image, while the latent couple technique allows users to designate the position of each character in the image. The results demonstrate that users can generate not only cartoon scene images of a single character in various contexts, including different facial expressions and poses, but also cartoon scene images containing multiple characters.
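
The per-character placement that the abstract attributes to latent couple can be illustrated with a toy sketch. It assumes the technique blends the noise predictions obtained under each character's prompt using spatial region masks at every denoising step; the function names, the left/right split, and the toy values below are hypothetical and are not taken from the paper.

```python
from typing import List

Grid = List[List[float]]


def left_right_masks(height: int, width: int, split: float = 0.5) -> List[Grid]:
    """Return two binary masks dividing a latent grid into left and right regions."""
    cut = int(width * split)
    left = [[1.0 if x < cut else 0.0 for x in range(width)] for _ in range(height)]
    right = [[1.0 - v for v in row] for row in left]
    return [left, right]


def blend_predictions(predictions: List[Grid], masks: List[Grid]) -> Grid:
    """Combine per-prompt noise predictions, weighting each by its region mask."""
    height, width = len(masks[0]), len(masks[0][0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = sum(m[y][x] for m in masks)
            if total == 0:
                raise ValueError(f"no region covers cell ({y}, {x})")
            weighted = sum(p[y][x] * m[y][x] for p, m in zip(predictions, masks))
            out[y][x] = weighted / total  # normalise where masks overlap
    return out


# Toy stand-ins for the denoiser output under two character prompts (assumed values).
pred_a = [[1.0] * 4 for _ in range(2)]   # prompt for the character on the left
pred_b = [[-1.0] * 4 for _ in range(2)]  # prompt for the character on the right
blended = blend_predictions([pred_a, pred_b], left_right_masks(2, 4))
```

In this sketch each cell of the blended grid follows the prediction of whichever prompt owns that region, which is how a user-specified layout could steer where each character appears.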

KSP Keywords: Cartoon Character, Cartoon Scene, Diffusion Model, Facial expression, Fine-tuning, Image diffusion, Image method, Reference Image, Scene images, image models, superior performance

218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, KOREA, Contact: sh.kim@etri.re.kr
