Research Area:  Machine Learning
Short text similarity plays an important role in natural language processing (NLP) and has been applied in many fields. Because short texts lack sufficient context, their similarity is difficult to measure. Using semantic similarity to calculate textual similarity has attracted attention from academia and industry and has achieved better results. In this survey, we conduct a comprehensive and systematic analysis of semantic similarity. We first propose three categories of semantic similarity: corpus-based, knowledge-based, and deep learning (DL)-based. We analyze the pros and cons of representative and novel algorithms in each category. Our analysis also covers the applications of these similarity measurement methods in other areas of NLP. We then evaluate state-of-the-art DL methods on four common datasets; the results show that DL-based methods can better address the challenges of short text similarity, such as sparsity and complexity. In particular, the Bidirectional Encoder Representations from Transformers (BERT) model can make full use of the scarce information and semantic information in short texts and achieve higher accuracy and F1 scores. Finally, we put forward some future directions.
Author(s) Name:  Mengting Han, Xuan Zhang, Xin Yuan, Jiahao Jiang, Wei Yun, Chen Gao
Journal name:  Concurrency and Computation: Practice and Experience
Publisher name:  Wiley
Volume Information:  Volume 33, Issue 5, 10 March 2021
Paper Link:   https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.5971