Learning short-text semantic similarity with word embeddings and external knowledge sources - 2019

Research Area: Machine Learning

Abstract:

We present a novel method based on interdependent representations of short texts for determining their degree of semantic similarity. The method represents each short text as two dense vectors: the first is built using word-to-word similarity based on pre-trained word vectors, and the second is built using word-to-word similarity based on external sources of knowledge. We also developed a preprocessing algorithm that chains coreferential named entities together and performs word segmentation to preserve the meaning of phrasal verbs and idioms. We evaluated the proposed method on three popular datasets, namely the Microsoft Research Paraphrase Corpus, STS2015 and P4PIN, and obtained state-of-the-art results on all three without using prior knowledge of natural language (e.g., part-of-speech tags or parse trees), which indicates that the interdependent representations of short-text pairs are effective and efficient for semantic textual similarity tasks.
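The abstract does not spell out how the two vectors are constructed. As a rough illustration only, the sketch below assumes one common interdependent scheme: each word of one text is scored by its best word-to-word similarity against the words of the other text, so each text's vector depends on its partner. The toy embeddings, the helpers `embedding_sim`, `interdependent_vectors` and `pair_score`, and the averaging step are all hypothetical and are not taken from the paper. The knowledge-based vector would be built the same way by passing a different `sim` function (for example, one derived from a lexical resource), which is not shown here.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A word-to-word similarity function: (word1, word2) -> score.
WordSim = Callable[[str, str], float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_sim(embeddings: Dict[str, List[float]]) -> WordSim:
    """Build a word similarity from pre-trained vectors (here a toy dict).

    Identical tokens score 1.0; an out-of-vocabulary word scores 0.0
    against any different word (an assumption of this sketch).
    """
    def sim(w1: str, w2: str) -> float:
        if w1 == w2:
            return 1.0
        if w1 not in embeddings or w2 not in embeddings:
            return 0.0
        return cosine(embeddings[w1], embeddings[w2])
    return sim


def interdependent_vectors(
    a: List[str], b: List[str], sim: WordSim
) -> Tuple[List[float], List[float]]:
    """Hypothetical interdependent scheme: entry i of a's vector is the best
    match of a[i] among the words of b, and vice versa for b."""
    vec_a = [max((sim(w, v) for v in b), default=0.0) for w in a]
    vec_b = [max((sim(w, v) for v in a), default=0.0) for w in b]
    return vec_a, vec_b


def pair_score(vec_a: List[float], vec_b: List[float]) -> float:
    """Hypothetical scalar summary: average of the two vectors' means."""
    if not vec_a or not vec_b:
        return 0.0
    return 0.5 * (sum(vec_a) / len(vec_a) + sum(vec_b) / len(vec_b))


if __name__ == "__main__":
    # Toy 2-d "embeddings"; real pre-trained vectors have hundreds of dimensions.
    toy = {"cat": [1.0, 0.0], "kitten": [0.8, 0.6],
           "sat": [0.0, 1.0], "sits": [0.0, 1.0]}
    va, vb = interdependent_vectors(["cat", "sat"], ["kitten", "sits"], embedding_sim(toy))
    print(va, vb, pair_score(va, vb))
```

Because each vector is computed relative to a partner text, the same sentence gets a different representation in every pair, which is what makes the representations interdependent in this sketch. How the paper turns the two vectors into a final similarity judgment is not stated in the abstract; `pair_score` is only a placeholder.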

Keywords:

Author(s) Name: Hien T. Nguyen, Phuc H. Duong, Erik Cambria

Journal name: Knowledge-Based Systems

Conference name:

Publisher name: Elsevier

DOI: 10.1016/j.knosys.2019.07.013

Volume Information: Volume 182, 15 October 2019, Article 104842

Paper Link: https://www.sciencedirect.com/science/article/abs/pii/S095070511930317X
