Subword-level Composition Functions for Learning Word Embeddings - 2018

Research Paper on Subword-level Composition Functions for Learning Word Embeddings

Research Area: Machine Learning

Abstract:

Subword-level information is crucial for capturing the meaning and morphology of words, especially for out-of-vocabulary (OOV) entries. We propose CNN- and RNN-based subword-level composition functions for learning word embeddings, and systematically compare them with popular word-level and subword-level models (Skip-Gram and FastText). Additionally, we propose a hybrid training scheme in which a pure subword-level model is trained jointly with a conventional word-level embedding model based on lookup tables. This increases the fitness of all types of subword-level word embeddings; the word-level embeddings can be discarded after training, leaving only a compact subword-level representation with a much smaller data volume. We evaluate these embeddings on a set of intrinsic and extrinsic tasks, showing that subword-level models have an advantage on tasks related to morphology and on datasets with a high OOV rate, and that they can be combined with other types of embeddings.
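
To make the idea of a subword-level composition function concrete, the following is a minimal sketch and not the authors' implementation. It swaps in a simple additive composition (averaging character n-gram vectors, in the spirit of FastText) for the CNN- and RNN-based functions the paper proposes. The names `char_ngrams`, `SubwordEmbedder`, `compose`, and `hybrid` are hypothetical, the vectors are random rather than trained, and combining the word-level and subword-level vectors by summation is an assumption made for illustration only.

```python
import random


def char_ngrams(word, n_min=3, n_max=4):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class SubwordEmbedder:
    """Toy embedder with untrained (random) vectors; illustration only."""

    def __init__(self, dim=8, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.ngram_vectors = {}  # subword-level parameters
        self.word_vectors = {}   # word-level lookup table (hybrid scheme)

    def _vector(self, table, key):
        # Lazily create a random vector. A real model would learn these,
        # and would skip or hash n-grams that never occurred in training.
        if key not in table:
            table[key] = [self.rng.uniform(-0.5, 0.5) for _ in range(self.dim)]
        return table[key]

    def compose(self, word):
        """Additive composition: average the vectors of the word's n-grams."""
        grams = char_ngrams(word)
        if not grams:
            return [0.0] * self.dim
        vecs = [self._vector(self.ngram_vectors, g) for g in grams]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def hybrid(self, word):
        """Assumed hybrid form: subword vector plus word-level lookup vector."""
        sub = self.compose(word)
        lookup = self._vector(self.word_vectors, word)
        return [s + w for s, w in zip(sub, lookup)]


embedder = SubwordEmbedder(dim=4)
train_time_vector = embedder.hybrid("embedding")  # uses both tables
embedder.word_vectors.clear()                     # discard the word-level table
oov_vector = embedder.compose("embeddingz")       # subword-only, works for OOV
```

In this sketch, clearing `word_vectors` after training mirrors the abstract's point that the word-level table can be dropped, leaving only the more compact subword parameters, which can still produce a vector for a word never seen as a whole.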

Keywords:
Subword-Level Composition Functions
Learning Word Embeddings
Machine Learning
Deep Learning

Author(s) Name: Bofang Li, Aleksandr Drozd, Tao Liu, Xiaoyong Du

Journal name: Proceedings of the Second Workshop on Subword/Character LEvel Models

Conference name:

Publisher name: Association for Computational Linguistics

DOI: 10.18653/v1/W18-1205

Volume Information: pages: 38–48

Paper Link: https://aclanthology.org/W18-1205/
