Reliable cross-site user generated content modeling method

Research Area: Machine Learning

Abstract:

Social network sites (SNSs) have become significant platforms for content sharing in daily life. With the emergence of different kinds of SNSs and users' diverse needs for content sharing, users' content sharing practices generally take place across multiple SNSs. Constructing models that characterize users' content sharing practices in a composite context constituted by multiple SNSs (cross-site user generated content modeling) has become an emerging research topic in web data mining and human behavior research. However, previous methods such as the Dirichlet Multinomial Mixture model (DMM), Biterm Topic Model (BTM), Twitter-LDA and MultiLDA either have limited representation ability or rest on unreliable assumptions, so they cannot accurately characterize user generated content (UGC) from the perspective of multiple SNSs. In this paper, we first conduct an empirical study to investigate the characteristics of users' content sharing practices in the cross-site context, based on which we propose a more reliable cross-site UGC model named CrossSite-LDA (C-LDA). We then evaluate the performance of C-LDA against four state-of-the-art models on two data sets sampled from Weibo–Douban and Facebook–Twitter. Results show that C-LDA performs better than existing models on perplexity, word coherence, topic KL divergence, UCI and UMass metrics, which suggests its superior accuracy in modeling users' content characteristics in the cross-site context.
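Two of the evaluation metrics named in the abstract, perplexity and topic KL divergence, can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the code below uses the common textbook definitions as an assumption: perplexity as the exponential of the negative mean per-token log-likelihood, and topic KL divergence as the average symmetric KL divergence between pairs of topic-word distributions. All function names are hypothetical and are not taken from the paper.

```python
import math


def perplexity(token_log_probs):
    """Exponential of the negative mean natural-log likelihood per held-out token.

    Assumed textbook definition; lower values indicate a better fit.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary.

    eps guards against division by zero when q assigns zero mass to a word.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def mean_pairwise_topic_kl(topics):
    """Average symmetric KL divergence over all pairs of topic-word distributions.

    Larger values mean the topics are more distinct from one another.
    """
    pairs = [(a, b) for i, a in enumerate(topics) for b in topics[i + 1:]]
    if not pairs:
        raise ValueError("need at least two topics")
    total = sum(0.5 * (kl_divergence(a, b) + kl_divergence(b, a)) for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Toy inputs chosen for illustration only; they are not data from the paper.
    held_out = [math.log(0.25)] * 4
    print(perplexity(held_out))
    toy_topics = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
    print(mean_pairwise_topic_kl(toy_topics))
```

Under these assumed definitions, a model with lower perplexity and higher topic KL divergence would be read as fitting held-out text better while producing more clearly separated topics.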

Keywords:

Author(s) Name: Baoxi Liu, Peng Zhang, T. Lu, Ning Gu

Journal name: Knowledge-Based Systems

Conference name:

Publisher name: ELSEVIER

DOI: https://doi.org/10.1016/j.knosys.2020.106435

Volume Information: Volume 209, 17 December 2020, 106435

Paper Link: https://www.sciencedirect.com/science/article/abs/pii/S0950705120305645?via%3Dihub

A reliable cross-site user generated content modeling method based on topic model - 2020
