Research Area:  Machine Learning
Abstract:  Social network sites (SNSs) have become significant platforms for content sharing in daily life. With the emergence of different kinds of SNSs and users' diverse needs for content sharing, content sharing practices generally take place across multiple SNSs. Constructing models that characterize users' content sharing practices in a composite context constituted by multiple SNSs (cross-site user generated content modeling) has become an emerging research topic in web data mining and human behavior research. However, previous methods such as the Dirichlet Multinomial Mixture model (DMM), the Biterm Topic Model (BTM), Twitter-LDA and MultiLDA either have limited representation ability or rest on unreliable assumptions, and so cannot accurately characterize user generated content (UGC) from the perspective of multiple SNSs. In this paper, we first conduct an empirical study to investigate the characteristics of users' content sharing practices in a cross-site context, based on which we propose a more reliable cross-site UGC model named CrossSite-LDA (C-LDA). We then compare the performance of C-LDA with four state-of-the-art models on two data sets sampled from Weibo–Douban and Facebook–Twitter. Results show that C-LDA performs better than existing models on perplexity, word coherence, topic KL divergence, UCI and UMass metrics, which suggests superior accuracy in modeling users' content characteristics in a cross-site context.
Keywords:  
Author(s) Name:  Baoxi Liu, Peng Zhang, T. Lu, Ning Gu
Journal name:  Knowledge-Based Systems
Conference name:  
Publisher name:  ELSEVIER
DOI:  https://doi.org/10.1016/j.knosys.2020.106435
Volume Information:  Volume 209, 17 December 2020, 106435
Paper Link:   https://www.sciencedirect.com/science/article/abs/pii/S0950705120305645?via%3Dihub