Weibo Sentiment Analysis Based on Word2vec and CNN

Author(s)
사옥결
Issued Date
2018
Abstract
With the rapid development of the Internet, netizens are participating online with growing enthusiasm. Micro-blogs have become an important platform for netizens to express their emotions[8]. The social network of a micro-blog reflects the emotional tendency of netizens to a great extent. How to quickly find hidden emotional information in micro-blogs has drawn the attention of many researchers. As social networks develop, sentiment analysis on social media such as Facebook, Twitter and Weibo has become a new trend in recent years. Many different methods have been proposed for sentiment analysis, including traditional methods (SVM and NB) and deep learning methods (RNN and CNN)[3]. In addition, the latter generally outperform the former. However, many existing methods focus only on local text information and ignore user personality and content characteristics.
Traditional sentiment analysis takes a lot of time to extract features from the data, and it needs to be combined with relevant rules to get better results. In the big data era, the amount of data keeps increasing, which also increases the difficulty of feature extraction. In this paper, I use a deep learning method, the Convolutional Neural Network (CNN), to determine the emotional information in micro-blogs and to extract features from the word vectors[5]. A framework called Word2vec + CNN is proposed to perform Weibo sentiment analysis. Firstly, I use the pre-trained word2vec model provided by the website (https://spaces.ac.cn/archives/4304) to compute vector representations of words, which are the input to the CNN. The purpose of using word2vec is to obtain vector representations of words that reflect the distance between words[2]. This initializes the parameters of the CNN at a good starting point, which can efficiently improve the performance of the network on this problem. Secondly, I design a suitable CNN architecture for the sentiment analysis task. I use 2 convolutional layers and 3 fully-connected layers in this architecture. Experiments verify that this model, which applies word2vec and CNN to analyze the sentiment of sentences, achieves strong results. I also use the Rectified Linear Unit (ReLU), Normalization and Dropout techniques to improve the accuracy and generalizability of the model[7]. I test my framework on a public dataset, a corpus of Weibo comments with 2 labels: negative and positive. My network achieves a test accuracy of 90.20% on this dataset, which is better than some other traditional neural network models and other multi-layer CNNs based on my CNN structure[1].
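The pipeline described in the abstract (word vectors, 2 convolutional layers, 3 fully-connected layers, ReLU) can be illustrated with a minimal forward-pass sketch in plain Python. This is not the thesis implementation, which the table of contents suggests was built with Keras. The vocabulary, embedding values, layer sizes, kernel width, the global max pooling step and the random weights are all assumptions chosen for illustration, and Normalization and Dropout are omitted. The words are assumed to be already segmented.

```python
import math
import random

random.seed(0)  # fixed seed so the random (untrained) weights are reproducible

EMBED_DIM = 4   # assumed toy size; real word2vec vectors are much larger
FILTERS = 3     # assumed number of filters per convolutional layer
KERNEL = 2      # assumed window width, in words


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def relu(values):
    return [max(0.0, v) for v in values]


def conv1d(sequence, filters, width):
    """Slide each filter over windows of `width` consecutive vectors, then apply ReLU."""
    out = []
    for start in range(len(sequence) - width + 1):
        window = [x for vec in sequence[start:start + width] for x in vec]
        out.append(relu([sum(w * x for w, x in zip(f, window)) for f in filters]))
    return out


def global_max_pool(feature_maps):
    """Keep the strongest response of each filter across all positions."""
    return [max(column) for column in zip(*feature_maps)]


def dense(inputs, weights):
    """Fully-connected layer without bias: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical vocabulary of segmented words with made-up embeddings
# (a stand-in for vectors looked up in a pre-trained word2vec model).
embeddings = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)]
              for w in ["今天", "天气", "很", "好", "不"]}

conv1 = rand_matrix(FILTERS, KERNEL * EMBED_DIM)
conv2 = rand_matrix(FILTERS, KERNEL * FILTERS)
fc1 = rand_matrix(4, FILTERS)
fc2 = rand_matrix(2, 4)
fc3 = rand_matrix(1, 2)


def predict_positive(words):
    """Forward pass: embeddings -> 2 conv layers -> pooling -> 3 dense layers."""
    x = [embeddings[w] for w in words]
    x = conv1d(x, conv1, KERNEL)
    x = conv1d(x, conv2, KERNEL)
    x = global_max_pool(x)
    x = relu(dense(x, fc1))
    x = relu(dense(x, fc2))
    return sigmoid(dense(x, fc3)[0])


score = predict_positive(["今天", "天气", "很", "好"])
print(f"P(positive) = {score:.3f}")
```

Because the weights are random, the printed score carries no meaning. The sketch only shows how a sequence of word vectors flows through the layers to a single positive/negative probability.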
Alternative Title
Word2vec과 CNN기반의 웨이보 감정분석 (Weibo Sentiment Analysis Based on Word2vec and CNN)
Alternative Author(s)
Xie Yujie
Department
General Graduate School, Department of Computer Engineering
Advisor
Kim Pankoo
Awarded Date
2018-08
Table Of Contents
Ⅰ. Introduction
A. Motivation
B. Outline

Ⅱ. Background Concepts
A. Weibo Introduction
B. CNN
a. Convolutional Layer
b. Pooling Layer
c. Fully-connected Layer
C. Word2vec
a. CBOW
b. Skip-gram
D. GloVe
E. Jieba (Chinese segmentation)
F. Keras

Ⅲ. Proposed Method
A. Data Pre-processing
a. Pre-trained Word2vec
b. Word Segmentation (Jieba)
c. Training Data and Test Data
B. Architecture of CNN Model
a. CNN Model Structure
b. Different Batch-size Trade-off
c. Comparison of Multiple CNN Layers
a. CNN 2
b. CNN 3

Ⅳ. Experimental Evaluation

Ⅴ. Conclusions

References
Degree
Master
Publisher
Chosun University
Citation
사옥결. (2018). Weibo Sentiment Analysis Based on Word2vec and CNN.
Type
Dissertation
URI
https://oak.chosun.ac.kr/handle/2020.oak/13596
http://chosun.dcollection.net/common/orgView/200000266864
Appears in Collections:
General Graduate School > 3. Theses(Master)
Authorize & License
  • Authorize: Open
  • Embargo: 2018-08-24
