Research Area:  Machine Learning
In the literature, machine learning-based studies of sentiment analysis usually rely on supervised learning, which requires sufficiently large pre-labeled datasets in specific domains. Such datasets are tedious, expensive, and time-consuming to build, and the approach handles unseen data poorly. This paper takes a semi-supervised learning approach to Vietnamese sentiment analysis, for which datasets are limited. We summarize the preprocessing techniques used to clean and normalize the data, together with negation handling and intensification handling, to improve performance. We also present data augmentation techniques, which generate new data from the original data to enrich the training data without user intervention. In the experiments, we evaluate various aspects of the approach and obtain competitive results that may motivate future work.
Keywords:  
Preprocessing techniques
Data augmentation
Sentiment analysis
Support vector machine
Feature extraction
Author(s) Name:  Huu-Thanh Duong and Tram-Anh Nguyen-Thi
Journal name:  Computational Social Networks
Conference name:  
Publisher name:  Springer
DOI:  https://doi.org/10.1186/s40649-020-00080-x
Volume Information:   volume 8, Article number: 1
Paper Link:   https://computationalsocialnetworks.springeropen.com/articles/10.1186/s40649-020-00080-x