Research Area:  Machine Learning
The Latent Dirichlet Allocation (LDA) topic model is a popular research topic in the field of text mining. This paper proposes SKP-LDA, an LDA short text clustering algorithm based on sentiment word co-occurrence and knowledge pair feature extraction. First, a word bag based on sentiment word co-occurrence is defined. The co-occurrence of emotional words is considered across different short texts, and the short texts of a microblog are then assigned an emotional polarity. Next, knowledge pairs of topic special words and topic relation words are extracted and inserted into the LDA model for clustering, so that semantic information can be found more accurately. The n hidden topics and the Top30 special word set of each topic are then extracted from the knowledge pair set. Finally, the Top30 topic special word sets obtained from the primary LDA clustering are clustered again by K-means secondary clustering, with the cluster centers optimized iteratively. Compared with JST, LSM, LTM and ELDA, SKP-LDA performs better in terms of Accuracy, Precision, Recall and F-measure. The experimental results show that SKP-LDA offers better semantic analysis ability and emotional topic clustering performance. It can be applied to microblogs to improve the accuracy of network public opinion analysis.
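The abstract does not spell out how the sentiment word co-occurrence bag is built, so the following is only a rough sketch of the general idea, not the authors' method. It assumes whitespace tokenization, a tiny hypothetical lexicon (SENTIMENT_WORDS), and a hypothetical helper sentiment_cooccurrence that counts how often two sentiment words appear in the same short text.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sentiment lexicon for illustration only;
# the lexicon used in the paper is not given in this record.
SENTIMENT_WORDS = {"good", "great", "bad", "awful", "happy", "sad"}


def sentiment_cooccurrence(texts, lexicon=SENTIMENT_WORDS):
    """Count pairs of sentiment words that occur in the same short text."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()  # assumption: simple whitespace tokenization
        # Keep each sentiment word once per text, in a stable order
        present = sorted(set(tokens) & lexicon)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


posts = [
    "great phone and happy with the battery",
    "happy customer great service",
    "awful screen and bad support",
]
pairs = sentiment_cooccurrence(posts)
```

In a fuller pipeline, counts like these could feed a feature representation for the LDA and K-means stages; how SKP-LDA actually does this is described in the paper itself.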
Keywords:  
Sentiment Word Co-Occurrence
Knowledge Pair Feature Extraction
Latent Dirichlet Allocation (LDA)
Short Text Clustering Algorithm
Deep Learning
Machine Learning
Author(s) Name:  Di Wu, Ruixin Yang & Chao Shen
Journal name:  Journal of Intelligent Information Systems
Conference name:  
Publisher name:  Springer
DOI:  10.1007/s10844-020-00597-7
Volume Information:  volume 56, pages 1–23 (2021)
Paper Link:   https://link.springer.com/article/10.1007%2Fs10844-020-00597-7