
멀티모달 AI를 적용한 웹툰 생성 연구 (A Study on Webtoon Generation Using Multimodal AI)

Author(s)
유경호
Issued Date
2023
Keyword
Multimodal, contrastive learning, diffusion model, image generation
Abstract
In this thesis, I conducted research on generating webtoons with deep learning-based text-to-image generation technology, to assist webtoon creators in their creative work. The research began by constructing a multimodal webtoon dataset from publicly available datasets such as MSCOCO. The resulting dataset pairs treatments (i.e., text descriptions) with their corresponding webtoon images, forming a treatment-webtoon dataset. In addition, continuous (multi-sentence) text data was collected using ChatGPT. To generate webtoons, this thesis first proposed extracting features from the treatments with a multilingual BERT model, adding noise to those features, and feeding the noisy features into a DCGAN (Deep Convolutional Generative Adversarial Network). The experimental results showed relatively low performance, with an Inception Score of 4.9 and an FID (Fréchet Inception Distance) of 22.21.
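The DCGAN conditioning step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual implementation: the 768-dimensional vector stands in for a multilingual BERT sentence embedding of one treatment, and the 100-dimensional noise size is an assumed value.

```python
import numpy as np

def make_generator_input(text_feature, noise_dim=100, seed=0):
    """Combine a treatment's text feature with Gaussian noise.

    `text_feature` stands in for the sentence-level embedding that a
    multilingual BERT encoder would produce (768-d for BERT-base).
    The concatenated vector would feed the DCGAN generator's first layer.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(noise_dim).astype(np.float32)
    return np.concatenate([np.asarray(text_feature, dtype=np.float32), noise])

# Dummy 768-d feature in place of a real mBERT embedding.
feat = np.zeros(768, dtype=np.float32)
z = make_generator_input(feat)
print(z.shape)  # (868,)
```

In a real pipeline the generator upsamples this vector through transposed convolutions to an image, and the discriminator scores image-text pairs; only the conditioning step is shown here.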

To overcome the limitations of DCGAN, this thesis proposed training a CLIP (Contrastive Language-Image Pretraining) model on the treatment-webtoon dataset to measure the similarity between text and images, and then using a diffusion model to generate the webtoons. CLIP learns the relationship between multimodal data (such as the treatment-webtoon pairs) by extracting features from each modality; through contrastive learning, it pulls matching pairs closer together and pushes non-matching pairs farther apart in a shared feature space. The performance of CLIP trained on the treatment-webtoon dataset was evaluated with quantitative experiments: measuring the similarity between bilingual treatments and images, retrieving similar images from treatment-based queries, and zero-shot classification.
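The contrastive objective behind CLIP can be sketched as a symmetric cross-entropy over a batch similarity matrix. This numpy sketch is illustrative only (the temperature value and function names are assumptions, not the thesis's code): rows are L2-normalized features of paired webtoon cuts and treatment sentences, and the diagonal entries are the positive pairs.

```python
import numpy as np

def clip_contrastive_loss(img_feats, txt_feats, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of paired features.

    Matching image/text pairs sit on the diagonal of the similarity
    matrix; the loss pulls them together and pushes mismatches apart.
    """
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N) cosine similarities
    labels = np.arange(len(logits))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()     # diagonal = positives

    # Average the image-to-text and text-to-image directions.
    return (xent(logits) + xent(logits.T)) / 2

aligned = clip_contrastive_loss(np.eye(4), np.eye(4))      # near zero
shuffled = clip_contrastive_loss(np.eye(4), np.eye(4)[::-1])
print(aligned < shuffled)  # True: mismatched pairs are penalized
```

In training, the two feature sets come from CLIP's image and text encoders and the loss gradient updates both; here fixed vectors merely show the loss's behavior.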

To generate webtoons with the diffusion model, a desired text, its CLIP features, and the stored image with the most similar CLIP features were input into a pretrained depth-to-image model. In the experiments, webtoons were generated from both single-sentence and continuous text inputs. With continuous text inputs, the Inception Score improved from DCGAN's 4.9 to 7.14, and the generated images were of higher quality.
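The retrieval step in this pipeline (finding the stored webtoon whose CLIP feature best matches the query text's CLIP feature) amounts to a cosine-similarity argmax. A minimal sketch, with small made-up 2-d vectors standing in for real CLIP embeddings:

```python
import numpy as np

def most_similar_image(text_feat, image_feats):
    """Index of the stored webtoon whose CLIP feature is closest
    (by cosine similarity) to the query text's CLIP feature.

    The selected image would then be passed, together with the text
    prompt, to a pretrained depth-to-image diffusion model.
    """
    t = text_feat / np.linalg.norm(text_feat)
    imgs = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    return int(np.argmax(imgs @ t))

# Toy feature bank; real CLIP features are 512-d or larger.
bank = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.7, 0.7]])
idx = most_similar_image(np.array([0.9, 0.1]), bank)
print(idx)  # 0
```

For a large dataset this linear scan would be replaced by an approximate nearest-neighbor index, but the ranking criterion is the same.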

The technology developed in this thesis lets webtoon creators generate webtoons more efficiently and quickly by simply entering the desired text. However, a main limitation of this work is that it cannot yet generate webtoons from multiple sentences and images together, nor maintain a consistent artistic style across the generated images. Further research is therefore needed on diffusion models that can take multiple sentences as input and generate images in a consistent artistic style from continuous text.
Alternative Title
A Study on Webtoon Generation Using Multimodal AI
Alternative Author(s)
Kyungho Yu
Affiliation
Chosun University Graduate School
Department
Department of Computer Engineering, Graduate School
Advisor
김판구
Awarded Date
2023-08
Table Of Contents
Ⅰ. Introduction
A. Background and Purpose of the Study
B. Research Content

Ⅱ. Related Work
A. Deep Learning-Based Text-to-Image Generation
1. Generative Adversarial Networks
2. Diffusion Models
B. Multimodal Learning

Ⅲ. Webtoon Generation from Multimodal Data Based on Generative Adversarial Networks
A. Method for Generating Webtoons from Text Using a GAN
B. Results of Generating Webtoons from Text Using a GAN

Ⅳ. Multimodal Learning Method for the Treatment-Webtoon Dataset
A. Multimodal Learning Method Using the CLIP Model
B. Experimental Results of the CLIP Model
1. Measuring Similarity Between Text and Webtoons
2. Retrieving Webtoons Similar to a Text
3. Zero-Shot Classification

Ⅴ. Webtoon Generation from Multimodal Data Based on a Diffusion Model
A. Method for Generating Webtoons from Text Using a Diffusion Model
B. Results of Generating Webtoons from Text Using a Diffusion Model
1. Webtoon Generation from a Single Text Input
2. Webtoon Generation from Continuous Text Input

Ⅵ. Conclusion

References
Degree
Doctor
Publisher
Chosun University Graduate School
Citation
유경호. (2023). 멀티모달 AI를 적용한 웹툰 생성 연구 [A Study on Webtoon Generation Using Multimodal AI].
Type
Dissertation
URI
https://oak.chosun.ac.kr/handle/2020.oak/17791
http://chosun.dcollection.net/common/orgView/200000693567
Appears in Collections:
General Graduate School > 4. Theses(Ph.D)
Authorize & License
  • Authorize: Open
  • Embargo: 2023-08-25