Research Area:  Machine Learning
Learning low-dimensional vector representations of words from a large corpus is one of the basic tasks in natural language processing (NLP). Existing general-purpose word embedding models learn word vectors mainly from the syntactic and semantic information in the context, while ignoring the sentiment information that words carry. Some approaches do model the sentiment information in reviews, but they do not account for the fact that certain words express different sentiment in different domains. When a word's sentiment changes across domains, directly applying general word vectors to review sentiment analysis inevitably hurts sentiment classification performance. To solve this problem, this paper extends the CBoW (continuous bag-of-words) word vector model and proposes a cross-domain sentiment-aware word embedding learning model that captures the sentiment information and the domain relevance of a word at the same time. The paper reports several experiments on Amazon user review data from different domains to evaluate the model. The experimental results show that, when modeling only the sentiment information of the context, the proposed model obtains an accuracy improvement of nearly 2% over general word vectors. When both the domain information and the sentiment information are included, the accuracy and Macro-F1 of the sentiment classification tasks improve significantly compared with existing sentiment word embeddings.
Author(s) Name:  Jun Liu, Shuang Zheng, Guangxia Xu & Mingwei Lin
Journal name:  International Journal of Machine Learning and Cybernetics
Publisher name:  Springer
Volume Information:  volume 12, pages 343–354 (2021)
Paper Link:   https://link.springer.com/article/10.1007/s13042-020-01175-7
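
The abstract describes extending CBoW so that the learned vectors also carry sentiment information. As a rough illustration of that general idea only, the sketch below averages context embeddings as CBoW does and adds a second, sentiment-prediction loss on the same hidden vector. It is not the paper's actual objective: the function name `cbow_sentiment_loss`, the logistic sentiment head `w_sent`, the mixing weight `alpha`, and all sizes are assumptions made for this example, and the domain-relevance component mentioned in the abstract is omitted.

```python
import numpy as np


def cbow_sentiment_loss(E, W_out, w_sent, context_ids, target_id,
                        sentiment_label, alpha=0.5):
    """Joint loss for one training example (illustrative, not the paper's objective)."""
    # CBoW step: average the context word embeddings into one hidden vector.
    h = E[context_ids].mean(axis=0)

    # Word-prediction head: softmax over the vocabulary, cross-entropy on the target word.
    logits = W_out @ h
    logits = logits - logits.max()  # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    word_loss = -log_probs[target_id]

    # Sentiment head (assumed): logistic regression on the same hidden vector,
    # with sentiment_label 1 for positive and 0 for negative.
    p_pos = 1.0 / (1.0 + np.exp(-(w_sent @ h)))
    eps = 1e-12
    sent_loss = -(sentiment_label * np.log(p_pos + eps)
                  + (1 - sentiment_label) * np.log(1.0 - p_pos + eps))

    # alpha trades off context prediction against sentiment prediction.
    return alpha * word_loss + (1.0 - alpha) * sent_loss


# Toy setup: hypothetical vocabulary of 10 words with 4-dimensional embeddings.
rng = np.random.default_rng(0)
vocab_size, dim = 10, 4
E = rng.normal(scale=0.1, size=(vocab_size, dim))      # input embeddings
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))  # output word weights
w_sent = rng.normal(scale=0.1, size=dim)               # sentiment head weights

loss = cbow_sentiment_loss(E, W_out, w_sent, context_ids=[1, 2, 4, 5],
                           target_id=3, sentiment_label=1)
```

Setting `alpha=1.0` reduces this to a plain CBoW loss, while smaller values push the embeddings toward separating positive from negative contexts. In practice the loss would be minimized with gradient descent over many (context, target, label) examples.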