Sentence Modeling via Multiple Word Embeddings and Multi-Level Comparison for Semantic Textual Similarity - 2019

Research Area: Machine Learning

Abstract:

Recently, using pretrained word embeddings to represent words has achieved success in many natural language processing tasks. Depending on their objective functions, different word embedding models capture different aspects of linguistic properties. However, the Semantic Textual Similarity task, which evaluates the similarity/relation between two sentences, requires taking these linguistic aspects into account. Therefore, this research aims to encode various characteristics from multiple sets of word embeddings into one embedding and then learn the similarity/relation between sentences via this novel embedding. Representing each word by multiple word embeddings, the proposed MaxLSTM-CNN encoder generates a novel sentence embedding. We then learn the similarity/relation between our sentence embeddings via multi-level comparison. Our method, M-MaxLSTM-CNN, consistently shows strong performance in several tasks (measuring textual similarity, identifying paraphrases, and recognizing textual entailment). Our model does not use hand-crafted features (e.g., alignment features, n-gram overlaps, dependency features) and does not require the pretrained word embeddings to have the same dimension.
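
The abstract notes that the pretrained embeddings need not share a dimension. One way this could work, shown here only as a minimal sketch and not as the paper's actual MaxLSTM-CNN architecture, is to project each embedding into a shared size and then take an element-wise maximum. The function names, the dimensions, and the random projection matrices (standing in for learned weights) are all assumptions made for illustration.

```python
import random


def make_projection(out_dim, in_dim, rng):
    """Random (out_dim x in_dim) matrix; a stand-in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, vector):
    """Matrix-vector product using plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def fuse_word(vectors, projections):
    """Project each source embedding to a shared size, then take the element-wise max."""
    projected = [project(m, v) for m, v in zip(projections, vectors)]
    return [max(values) for values in zip(*projected)]


rng = random.Random(0)
source_dims = [5, 3, 4]  # hypothetical sizes of three pretrained embeddings
shared_dim = 6           # hypothetical size of the fused word embedding
projections = [make_projection(shared_dim, d, rng) for d in source_dims]

# One word represented by three vectors of different lengths (toy values).
word_vectors = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for d in source_dims]
fused = fuse_word(word_vectors, projections)
```

Here `fused` has `shared_dim` entries regardless of the source dimensions. A sequence of such fused vectors could then be fed to a sentence encoder; that step is omitted because the abstract does not specify it.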

Keywords:

Author(s) Name: Nguyen Huy Tien, Nguyen Minh Le, Yamasaki Tomohiro, Izuha Tatsuya

Journal name: Information Processing & Management

Conference name:

Publisher name: Elsevier

DOI: 10.1016/j.ipm.2019.102090

Volume Information: Volume 56, Issue 6, November 2019, 102090

Paper Link: https://www.sciencedirect.com/science/article/abs/pii/S0306457319301335
