Depression Diagnosis Based on 4-stream of Bi-LSTM and CNN from Audio and Text Data

Author(s)
조아현
Issued Date
2022
Abstract
Depression is a disorder that alters emotions, thoughts, and behavior; left untreated, it can progress to severe depression and lead to various problems. Currently, diagnosis relies on clinicians' subjective and often inconsistent judgment, and on questionnaires in which patients self-report their condition. These methods are limited and make accurate diagnosis difficult because they lack objective evidence. A system is therefore needed that diagnoses depression objectively and accurately by considering diverse data and features. Recently, interest in automated systems for efficient depression diagnosis has been growing in the affective computing and artificial intelligence communities. In particular, deep-learning-based research on depression diagnosis increasingly uses multimodal approaches, which fuse several data sources and thus exploit more information than unimodal approaches that rely on a single data source.
In this paper, we propose a depression diagnosis model based on a 4-stream of Bidirectional Long Short-Term Memory (Bi-LSTM) and Convolutional Neural Network (CNN) models applied to audio and text data. One-dimensional features of the speech signals were extracted using Mel-Frequency Cepstral Coefficients (MFCC) and Gammatone Cepstral Coefficients (GTCC). Two-dimensional features were extracted from the Bark spectrogram, ERB spectrogram, and Log-Mel spectrogram, which are based on time-frequency transformation. The one-dimensional features were fed to a Bi-LSTM, and the two-dimensional features to CNN-based transfer learning models such as VGGish, YAMNet, and OpenL3. For the text data, word encoding was used to map the text into sequences of numeric indices, and word embedding was used to represent each word as a dense numeric vector. The encoded text was fed to a Bi-LSTM and to an n-gram-based CNN model. Finally, the softmax values of the four deep learning models were ensembled using the late score fusion method to diagnose depression based on the 4-stream. The data used in the experiments is the Extended Distress Analysis Interview Corpus Wizard of Oz (EDAIC-WOZ) depression database, which was designed to support the diagnosis of psychological distress. To improve data quality, noise was removed from the speech data and unnecessary words were removed from the text data during preprocessing. The class imbalance problem was addressed by extending the depression-class data. The experimental results showed that the 4-stream model improved performance by between 1.22% and 2.44% over the single models. In addition, the proposed model was more competitive than the 2-stream-based model of the previous study under the same data conditions, and it also performed well when evaluated on the EDAIC-WOZ database. These results indicate that the proposed model is effective in diagnosing depression.
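The late score fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion rule or weights were used, so the weighted average below, the function names `late_score_fusion` and `predict`, the class order, and all score values are assumptions made purely for illustration; they are not the thesis's implementation or results.

```python
from typing import List, Optional, Sequence


def late_score_fusion(
    stream_scores: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse per-stream softmax scores with a weighted average.

    Assumption: the fusion rule is a weighted mean of class scores;
    with no weights given, every stream contributes equally.
    """
    n_streams = len(stream_scores)
    if weights is None:
        weights = [1.0 / n_streams] * n_streams
    n_classes = len(stream_scores[0])
    fused = [0.0] * n_classes
    for weight, scores in zip(weights, stream_scores):
        for c in range(n_classes):
            fused[c] += weight * scores[c]
    return fused


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda c: fused[c])


# Hypothetical softmax outputs [non-depressed, depressed] for one sample,
# one list per stream. These numbers are made up for the example.
audio_bilstm = [0.40, 0.60]  # Bi-LSTM on one-dimensional speech features
audio_cnn = [0.55, 0.45]     # CNN transfer learning on spectrograms
text_bilstm = [0.30, 0.70]   # Bi-LSTM on encoded text
text_cnn = [0.45, 0.55]      # n-gram-based CNN on encoded text

fused = late_score_fusion([audio_bilstm, audio_cnn, text_bilstm, text_cnn])
label = predict(fused)
```

In this toy case the audio CNN stream alone would favor the first class, while the fused score favors the second, which is the kind of disagreement a score-level ensemble is meant to resolve. Unequal `weights` could be passed to favor stronger streams, but whether the thesis does so is not stated in the abstract.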
Alternative Title
Depression Diagnosis Based on 4-stream of Bi-LSTM and CNN from Audio and Text Data
Alternative Author(s)
Jo, A-Hyeon
Affiliation
Chosun University Graduate School
Department
Graduate School, Department of Electronic Engineering
Advisor
곽근창
Awarded Date
2022-08
Table Of Contents
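Chapter 1 Introduction
Section 1 Research Background and Purpose
Section 2 Research Content and Organization

Chapter 2 Related Work
Section 1 Deep Learning-Based Depression Diagnosis Using Speech Data
Section 2 Deep Learning-Based Depression Diagnosis Using Text Data
Section 3 Ensemble-Based Depression Diagnosis Using Multimodal Data

Chapter 3 Depression Diagnosis Based on 4-stream of Bi-LSTM and CNN from Speech and Text Data
Section 1 Deep Learning Model Design Using Speech Signals
1. One-Dimensional Speech Signal Feature Extraction Methods
2. Two-Dimensional Time-Frequency Transform-Based Feature Extraction Methods
3. Bi-LSTM Model Based on One-Dimensional Speech Signals
4. CNN Transfer Learning Models Based on Two-Dimensional Time-Frequency Transforms
Section 2 Deep Learning Model Design Using Text Data
1. Word Embedding
2. Deep Learning Models Using Text Data
Section 3 4-stream-Based Deep Learning Model Design and Depression Diagnosis Using Speech and Text Data
1. Multimodal and Late Score Fusion Methods
2. 4-stream-Based Deep Learning Model for Depression Diagnosis from Speech and Text Data

Chapter 4 Experiments and Result Analysis
Section 1 Extended DAIC-WOZ Depression Database
Section 2 Data Preprocessing Methods
1. Speech Data Preprocessing and Expansion
2. Text Data Preprocessing and Expansion
Section 3 Experiments and Result Analysis
1. Depression Diagnosis Experiments and Results Using Speech Data
2. Depression Diagnosis Experiments and Results Using Text Data
3. Depression Diagnosis Experiments and Results Based on the 4-stream Model Using Speech and Text Data

Chapter 5 Conclusion

References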
Degree
Master
Publisher
Chosun University Graduate School
Citation
Jo, A-Hyeon. (2022). Depression Diagnosis Based on 4-stream of Bi-LSTM and CNN from Audio and Text Data.
Type
Dissertation
URI
https://oak.chosun.ac.kr/handle/2020.oak/17491
http://chosun.dcollection.net/common/orgView/200000623839
Appears in Collections:
General Graduate School > 3. Theses(Master)
Authorize & License
  • Authorize: Open
  • Embargo: 2022-08-26
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.