Weibo Sentiment Analysis Based on Word2vec and CNN
- Author(s)
- 사옥결
- Issued Date
- 2018
- Abstract
- With the rapid development of the Internet, netizens are becoming increasingly active online, and micro-blogs have become an important platform for them to express their emotions[8]. The micro-blog social network reflects the emotional tendencies of netizens to a great extent, and how to quickly find hidden emotional information in micro-blog posts has drawn the attention of many researchers. As social networks develop, sentiment analysis on social media such as Facebook, Twitter and Weibo has become a new trend in recent years. Many methods have been proposed for sentiment analysis, including traditional methods (SVM and NB) and deep learning methods (RNN and CNN)[3], and the latter generally outperform the former. However, many existing methods focus only on local text information and ignore user personality and content characteristics.
Traditional sentiment analysis spends a lot of time extracting features from the data and needs to be combined with relevant rules to get better results. In the big data era, the growing amount of data also increases the difficulty of feature extraction. In this paper, I use a deep learning method, the convolutional neural network (CNN), to determine the emotional information in micro-blog posts and to extract features from the word vectors[5]. A framework called Word2vec + CNN is proposed to perform Weibo sentiment analysis. First, I use the pre-trained word2vec model provided by the website (https://spaces.ac.cn/archives/4304) to compute vector representations of words, which serve as the input to the CNN. The purpose of using word2vec is to obtain vector representations of words that reflect the distances between words[2]. This initializes the CNN's parameters at a good starting point, which can efficiently improve the network's performance on this problem. Second, I design a CNN architecture suited to the sentiment analysis task, with 2 convolutional layers and 3 fully connected layers. The experiments verify that this model, which combines word2vec and CNN to analyze the sentiment of sentences, achieves good results. I also use the Rectified Linear Unit (ReLU), normalization and dropout to improve the accuracy and generalizability of my model[7]. I test my framework on a public dataset, a corpus of Weibo comments with 2 labels: negative and positive. My network achieves a test accuracy of 90.20% on this dataset, which is better than some traditional neural network models and than other multi-layer CNN variants based on my CNN structure[1].
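The pipeline described in the abstract (word vectors fed into convolution, ReLU, pooling and a final classifier) can be illustrated with a minimal forward-pass sketch. This is not the thesis model: it uses a single convolutional layer and a single output unit instead of 2 convolutional and 3 fully connected layers, omits normalization, dropout and training, and uses only the Python standard library rather than Keras. The toy vocabulary, the random vectors standing in for pre-trained word2vec embeddings, the global max pooling step, and all names and sizes (`EMBED_DIM`, `KERNEL_SIZE`, `NUM_FILTERS`, `predict`) are assumptions made for illustration. The input is assumed to be already segmented into tokens, a step for which the thesis uses Jieba.

```python
import math
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

EMBED_DIM = 4    # toy size; real word2vec vectors are much larger
KERNEL_SIZE = 2  # number of consecutive words each filter covers
NUM_FILTERS = 3  # number of feature detectors in the conv layer

# Hypothetical embedding table standing in for pre-trained word2vec vectors.
VOCAB = ["我", "很", "喜欢", "讨厌", "这个", "电影"]
EMBEDDINGS = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in VOCAB}
UNKNOWN = [0.0] * EMBED_DIM  # out-of-vocabulary words map to a zero vector


def embed(tokens):
    """Turn a list of tokens into a sentence matrix (one vector per token)."""
    return [EMBEDDINGS.get(t, UNKNOWN) for t in tokens]


def conv1d_relu(matrix, filters, biases):
    """Slide each filter over windows of KERNEL_SIZE words and apply ReLU."""
    steps = len(matrix) - KERNEL_SIZE + 1
    feature_maps = []
    for i in range(steps):
        # Flatten the window of word vectors into one long vector.
        window = [x for row in matrix[i:i + KERNEL_SIZE] for x in row]
        feature_maps.append([
            max(0.0, sum(w * x for w, x in zip(f, window)) + b)
            for f, b in zip(filters, biases)
        ])
    return feature_maps  # steps rows, NUM_FILTERS columns


def global_max_pool(feature_maps):
    """Keep each filter's strongest activation over all window positions."""
    return [max(column) for column in zip(*feature_maps)]


def dense_sigmoid(features, weights, bias):
    """Single output unit: a score in (0, 1) read as P(positive)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Randomly initialised (untrained) parameters, for illustration only.
filters = [[random.uniform(-0.5, 0.5) for _ in range(KERNEL_SIZE * EMBED_DIM)]
           for _ in range(NUM_FILTERS)]
biases = [0.0] * NUM_FILTERS
out_weights = [random.uniform(-0.5, 0.5) for _ in range(NUM_FILTERS)]


def predict(tokens):
    """Forward pass: embed, convolve with ReLU, pool, then classify."""
    pooled = global_max_pool(conv1d_relu(embed(tokens), filters, biases))
    return dense_sigmoid(pooled, out_weights, 0.0)


tokens = ["我", "很", "喜欢", "这个", "电影"]  # "I really like this movie"
prob = predict(tokens)  # untrained, so the value itself is not meaningful
```

Because the weights are untrained, the score only shows the data flow: a sentence of five tokens becomes a 5 x 4 matrix, the convolution yields four window positions with three filter responses each, and pooling reduces that to three features for the classifier. In a trained model these parameters would be learned from the labelled Weibo comments.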
- Alternative Title
- Word2vec과 CNN기반의 웨이보 감정분석 (Weibo Sentiment Analysis Based on Word2vec and CNN)
- Alternative Author(s)
- Xie Yujie
- Department
- General Graduate School, Department of Computer Engineering
- Advisor
- Kim Pankoo
- Awarded Date
- 2018-08
- Table Of Contents
- Ⅰ. Introduction
A. Motivation
B. Outline
Ⅱ. Background Concepts
A. Weibo Introduction
B. CNN
a. Convolutional Layer
b. Pooling Layer
c. Fully Connected Layer
C. Word2vec
a. CBOW
b. Skip-gram
D. GloVe
E. Jieba (Chinese segmentation)
F. Keras
Ⅲ. Proposed Method
A. Data Pre-processing
a. Pre-trained Word2vec
b. Word Segmentation (Jieba)
c. Training Data and Test Data
B. Architecture of CNN Model
a. CNN Model Structure
b. Different Batch-size Trade-off
c. Comparison of Multiple CNN Layers
a. CNN 2
b. CNN 3
Ⅳ. Experimental Evaluation
Ⅴ. Conclusions
References
- Degree
- Master
- Publisher
- Chosun University
- Citation
- 사옥결. (2018). Weibo Sentiment Analysis Based on Word2vec and CNN.
- Type
- Dissertation
- URI
- https://oak.chosun.ac.kr/handle/2020.oak/13596
http://chosun.dcollection.net/common/orgView/200000266864
- Appears in Collections
- General Graduate School > 3. Theses (Master)
- Authorize & License
- Authorize: Open
- Embargo: 2018-08-24