Research Area:  Machine Learning
Semantic Textual Similarity and Natural Language Inference are two popular natural language understanding tasks, both defined over sentence pairs, that are used to benchmark sentence representation models. In such tasks, sentences are represented as bags of words, sequences, trees or convolutions, but the attention model is based on word pairs. In this article we introduce the use of word n-grams in the attention model. Our results on five datasets show an error reduction of up to 41% with respect to the word-based attention model. The improvements are especially relevant in low-data regimes and, in the case of natural language inference, on the recently released hard subset of Natural Language Inference datasets.
Keywords:  
Author(s) Name:  I. Lopez-Gazpio, M. Maritxalar, E. Agirre
Journal name:  Expert Systems with Applications
Conference name:  
Publisher name:  Elsevier
DOI:  10.1016/j.eswa.2019.04.054
Volume Information:  Volume 132, 15 October 2019, Pages 1-11
Paper Link:   https://www.sciencedirect.com/science/article/abs/pii/S0957417419302842