Research Area:  Machine Learning
As an emerging research paradigm, big data analytics has been gaining currency in various fields. However, in existing hospitality and tourism literature there is a scarcity of discussion on the quality of data, which may impact the validity and generalizability of research findings. This study examines the reliability of online hotel reviews in TripAdvisor by developing a text classifier to predict travel purpose (i.e., business vs. leisure) based upon review textual contents. The classifier is tested over a range of cities and data sizes to examine its sensitivity to data samples. The findings show that, while the classifier's performance is consistent across different cities, there are variations in response to data sizes and sampling methods. More importantly, a considerable amount of noise is found in the data, which leads to misclassification. Furthermore, a novel approach is developed to address the misclassification problem resulting from data noise. This study reveals important data quality issues and contributes to the theoretical development of social media analytics in hospitality and tourism.
Keywords:  
Author(s) Name:  Zheng Xiang, Qianzhou Du, Yufeng Ma & Weiguo Fan
Journal name:  Information Technology & Tourism
Conference name:  
Publisher name:  Springer
DOI:  https://doi.org/10.1007/s40558-017-0098-z
Volume Information:  volume 18, pages 43–59
Paper Link:   https://link.springer.com/article/10.1007/s40558-017-0098-z