Research Area:  Data Mining
Sentiment analysis helps evaluate the performance of products or services from user-generated content. Lexicon-based sentiment analysis approaches are preferred over learning-based ones when training data is inadequate. Existing lexicons contain only unigrams along with their sentiment scores. It has been observed that sentiment n-grams formed by combining unigrams with intensifiers or negations show improved results. Such sentiment n-gram lexicons are not publicly available. This paper presents a methodology to create such a lexicon, called Senti-N-Gram. The proposed rule-based approach extracts the n-grams' sentiment scores from a random corpus containing product reviews and their corresponding numeric ratings on a five-point scale. The scores from this automated procedure are compared with those of human annotators using a t-test and found to be statistically equivalent. The paper also proposes a sentiment classification methodology that uses a ratio-based approach built on the counts of positive and negative sentences in a document. When used with the Senti-N-Gram lexicon, the proposed method outperforms a well-known unigram-lexicon-based approach using VADER and an n-gram sentiment analysis approach, SO-CAL.
Keywords:  
Author(s) Name:  Atanu Dey, Mamata Jenamani and Jitesh J. Thakkar
Journal name:  Expert Systems with Applications
Conference name:  
Publisher name:  ELSEVIER
DOI:  10.1016/j.eswa.2018.03.004
Volume Information:  Volume 103, 1 August 2018, Pages 92-105
Paper Link:   https://www.sciencedirect.com/science/article/abs/pii/S095741741830143X