Research Area:  Machine Learning
Traditional text emotion analysis methods are primarily devoted to studying extended texts, such as news reports and full-length documents. Microblogs are considered short texts that are often characterized by large noises, new words, and abbreviations. Previous emotion classification methods usually fail to extract significant features and achieve poor classification effect when applied to processing of short texts or micro-texts. This study proposes a microblog emotion classification model, namely, CNN_Text_Word2vec, on the basis of convolutional neural network (CNN) to solve the above-mentioned problems. CNN_Text_Word2vec introduces a word2vec neural network model to train distributed word embeddings on every single word. The trained word vectors are used as input features for the model to learn microblog text features through parallel convolution layers with multiple convolution kernels of different sizes. Experiment results show that the overall accuracy rate of CNN_Text_Word2vec is 7.0% higher than that achieved by current mainstream methods, such as SVM, LSTM and RNN. Moreover, this study explores the impact of different semantic units on the accuracy of CNN_Text_Word2vec, specifically in processing of Chinese texts. The experimental results show that comparing to using feature vectors obtained from training words, feature vector obtained from training Chinese characters yields a better performance.
Author(s) Name:  Dongliang Xu, Zhihong Tian, Rufeng Lai, Xiangtao Kong, Zhiyuan Tand Wei Shi
Journal name:  Information Fusion
Publisher name:  ELSEVIER
Volume Information:  Volume 64, December 2020, Pages 1-11
Paper Link:   https://www.sciencedirect.com/science/article/abs/pii/S156625352030302X