Research Area:  Machine Learning
We present a novel method for determining the degree of semantic similarity between short texts, based on interdependent representations of those texts. The method represents each short text as two dense vectors: one is built from word-to-word similarity computed with pre-trained word vectors, and the other from word-to-word similarity computed with external sources of knowledge. We also developed a preprocessing algorithm that chains coreferential named entities together and performs word segmentation to preserve the meaning of phrasal verbs and idioms. We evaluated the proposed method on three popular datasets, namely the Microsoft Research Paraphrase Corpus, STS2015 and P4PIN, and obtained state-of-the-art results on all three without using prior linguistic knowledge such as part-of-speech tags or parse trees. This indicates that interdependent representations of short text pairs are effective and efficient for semantic textual similarity tasks.
Author(s) Name:  Hien T. Nguyen, Phuc H. Duong, Erik Cambria
Journal name:  Knowledge-Based Systems
Publisher name:  Elsevier
Volume Information:  Volume 182, 15 October 2019, 104842
Paper Link:   https://www.sciencedirect.com/science/article/abs/pii/S095070511930317X
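The abstract describes building each text's representation from word-to-word similarities against the other text in the pair. The sketch below is one plausible reading of that idea, not the authors' algorithm: the toy vectors, the function names (`word_similarity`, `interdependent_vector`, `text_similarity`), the max-then-average scoring and the one-entry-per-word vector layout are all assumptions made for illustration. A second similarity function backed by an external knowledge source could be passed through the `sim` parameter to obtain the second vector mentioned in the abstract.

```python
import math

# Toy 3-dimensional stand-ins for pre-trained word vectors.
# The values are hypothetical and chosen only for illustration.
TOY_VECTORS = {
    "cat": (0.9, 0.1, 0.0),
    "kitten": (0.85, 0.2, 0.05),
    "sits": (0.1, 0.9, 0.1),
    "rests": (0.15, 0.85, 0.2),
    "car": (0.0, 0.1, 0.95),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def word_similarity(w1, w2):
    """Embedding-based word-to-word similarity (assumed form)."""
    if w1 == w2:
        return 1.0
    if w1 in TOY_VECTORS and w2 in TOY_VECTORS:
        return cosine(TOY_VECTORS[w1], TOY_VECTORS[w2])
    return 0.0  # unknown words contribute nothing in this sketch


def interdependent_vector(text, other, sim=word_similarity):
    """One entry per word of `text`: its best similarity to any word in `other`.

    The vector for `text` therefore depends on the text it is compared with.
    """
    return [max((sim(w, o) for o in other), default=0.0) for w in text]


def text_similarity(a, b, sim=word_similarity):
    """Average of both interdependent vectors (an assumed aggregation)."""
    scores = interdependent_vector(a, b, sim) + interdependent_vector(b, a, sim)
    return sum(scores) / len(scores) if scores else 0.0


paraphrase = text_similarity(["cat", "sits"], ["kitten", "rests"])
unrelated = text_similarity(["cat", "sits"], ["car"])
print(paraphrase > unrelated)
```

In this reading, swapping the partner text changes the vector for the same input text, which is what makes the representation interdependent rather than a fixed per-sentence embedding.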