Research Area:  Machine Learning
It has been reported that embedded URLs and multimodal content (images, video, and sound recordings) in tweets are increasingly used to seduce users into a “wrong click,” leading to malware infection. In this paper, we predict whether a tweet is malicious by examining five classes of features. Four of them are textual content (including sentiment), paths emanating from a URL mentioned in the tweet, attributes associated with URLs, and multimodal content in the tweet. The fifth class first constructs a novel “tweet graph” and then defines features by analyzing “metapaths” contained in that graph. Next, we propose the MALicious Tweets in Parallel (MALTP) collective classification algorithm, which merges tweet graphs, metapaths, and collective classification methods proposed previously in the literature. We conduct detailed experiments using two data sets: Warningbird (WB) and KBA. We show that our metapath-based approach outperforms past efforts at identifying malicious tweets, and further show that metapath-based features in conjunction with Alexa ranks and features from KBA yield very high predictive accuracy (over 0.98 on KBA and over 0.94 on WB). More significantly, metapath features alone generate a predictive accuracy of 0.977 and 0.923, respectively, on the KBA and WB data sets, significantly outperforming the other feature classes in isolation. We conduct a further analysis to identify the most important features; surprisingly, our results show that the presence of multimodal content is not a major factor and that metapath-based features dominate in separating malicious from benign tweets.
Author(s) Name:  Eric Lancaster; Tanmoy Chakraborty; V. S. Subrahmanian
Journal name:  IEEE Transactions on Computational Social Systems
Publisher name:  IEEE
Volume Information:  Volume: 5, Issue: 4, Dec. 2018, Page(s): 1096-1108
Paper Link:  https://ieeexplore.ieee.org/document/8472279
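
Illustrative Sketch:  The abstract describes features defined by analyzing “metapaths” in a “tweet graph” but does not give the graph schema. The Python sketch below shows one common way such features might be computed: counting the walks from a tweet whose node types follow a given type sequence. The node types (tweet, user, url), the toy edges, and all function names are assumptions made for illustration and are not taken from the paper; this is not the MALTP algorithm itself.

```python
from collections import defaultdict

# Hypothetical typed nodes; the paper's actual tweet-graph schema is not
# given in the abstract, so these types and edges are illustrative only.
NODE_TYPE = {
    "t1": "tweet", "t2": "tweet", "t3": "tweet",
    "u1": "user", "u2": "user",
    "url1": "url", "url2": "url",
}

# Undirected toy edges: who posted each tweet and which URL it mentions.
EDGES = [
    ("t1", "u1"), ("t2", "u1"), ("t3", "u2"),
    ("t1", "url1"), ("t2", "url1"), ("t3", "url2"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def count_metapath_instances(adj, node_type, start, metapath):
    """Count walks from `start` whose node types follow `metapath`.

    Walks may revisit nodes, so a tweet can reach itself again.
    """
    if node_type[start] != metapath[0]:
        return 0
    frontier = {start: 1}  # node -> number of matching walks ending there
    for wanted in metapath[1:]:
        nxt = defaultdict(int)
        for node, n_walks in frontier.items():
            for neighbor in adj[node]:
                if node_type[neighbor] == wanted:
                    nxt[neighbor] += n_walks
        frontier = nxt
    return sum(frontier.values())


def metapath_features(adj, node_type, tweet, metapaths):
    """Return one walk count per metapath as a feature vector for `tweet`."""
    return [count_metapath_instances(adj, node_type, tweet, mp)
            for mp in metapaths]


ADJ = build_adjacency(EDGES)
METAPATHS = [
    ("tweet", "user", "tweet"),
    ("tweet", "url", "tweet"),
    ("tweet", "user", "tweet", "url"),
]
FEATURES_T1 = metapath_features(ADJ, NODE_TYPE, "t1", METAPATHS)
```

In this toy graph, FEATURES_T1 is a three-element count vector that could be passed to any standard classifier alongside the other feature classes. Counting walks rather than simple paths is a design choice of this sketch, made to keep it short.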