Jointly Learning Word Embeddings using a Corpus and a Knowledge Base - 2018

Research Area: Machine Learning

Abstract:

Methods for representing the meaning of words in vector spaces purely using the information distributed in text corpora have proved to be very valuable in various text mining and natural language processing (NLP) tasks. However, these methods still disregard the valuable semantic relational structure between words in co-occurring contexts. These beneficial semantic relational structures are contained in manually created knowledge bases (KBs) such as ontologies and semantic lexicons, where the meanings of words are represented by defining the various relationships that exist among those words. We combine the knowledge in both a corpus and a KB to learn better word embeddings. Specifically, we propose a joint word representation learning method that uses the knowledge in the KBs and simultaneously predicts the co-occurrences of two words in a corpus context. In particular, we use the corpus to define our objective function subject to the relational constraints derived from the KB. We further utilise the corpus co-occurrence statistics to propose two novel approaches, Nearest Neighbour Expansion (NNE) and Hedged Nearest Neighbour Expansion (HNE), that dynamically expand the KB and therefore derive more constraints that guide the optimisation process. Our experimental results over a wide range of benchmark tasks demonstrate that the proposed method statistically significantly improves the accuracy of the word embeddings learnt. It outperforms a corpus-only baseline and improves on a number of previously proposed methods that incorporate corpora and KBs, in both semantic similarity prediction and word analogy detection tasks.

Keywords:
Jointly Learning
Word Embeddings
Corpus
Knowledge Base
Machine Learning
Deep Learning

Author(s) Name: Mohammed Alsuhaibani, Danushka Bollegala, Takanori Maehara, Ken-ichi Kawarabayashi

Journal name: PLOS ONE

Conference name:

Publisher name: PLOS

DOI: 10.1371/journal.pone.0193094

Volume Information:

Paper Link: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0193094
