
Office Address

  • #5, First Floor, 4th Street, Dr. Subbarayan Nagar, Kodambakkam, Chennai-600 024 (Landmark: Samiyar Madam)
  • pro@slogix.in
  • +91- 81240 01111


Subword-level Composition Functions for Learning Word Embeddings - 2018

Research Area:  Machine Learning

Abstract:

Subword-level information is crucial for capturing the meaning and morphology of words, especially for out-of-vocabulary (OOV) entries. We propose CNN- and RNN-based subword-level composition functions for learning word embeddings, and systematically compare them with popular word-level and subword-level models (Skip-Gram and FastText). Additionally, we propose a hybrid training scheme in which a pure subword-level model is trained jointly with a conventional word-level embedding model based on lookup tables. This increases the fitness of all types of subword-level word embeddings; the word-level embeddings can be discarded after training, leaving only a compact subword-level representation with a much smaller data volume. We evaluate these embeddings on a set of intrinsic and extrinsic tasks, showing that subword-level models have an advantage on tasks related to morphology and on datasets with a high OOV rate, and that they can be combined with other types of embeddings.
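The abstract contrasts subword-level composition functions with word-level lookup tables. Below is a minimal plain-Python sketch of two generic composition functions. The first is additive: it averages hashed character n-gram vectors, similar in spirit to FastText, and uses FastText's default 3 to 6 n-gram range with '<' and '>' boundary markers. The second is a CNN: a width-3 convolution over character vectors, then tanh and max-pooling over positions. Assumptions: all tables are deterministic random stand-ins for trained parameters, and the sizes and function names (`char_ngrams`, `compose_additive`, `compose_cnn`) are hypothetical. This is not the authors' architecture or training code, and it omits both the RNN variant and the hybrid training scheme.

```python
import math
import random
import zlib

# Hypothetical sizes, chosen small for illustration only.
DIM = 8             # output word-vector size
CHAR_DIM = 4        # character-embedding size for the CNN variant
WIDTH = 3           # convolution window, in characters
NUM_BUCKETS = 1000  # hash buckets for n-grams


def _vector(seed, dim):
    """Deterministic pseudo-random vector standing in for a trained parameter row."""
    rng = random.Random(seed)
    return [rng.uniform(-0.5, 0.5) for _ in range(dim)]


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of the word wrapped in '<' and '>' boundary markers."""
    token = f"<{word}>"
    return [token[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(token) - n + 1)]


# Random stand-in for a trained n-gram embedding table, indexed by hash bucket.
NGRAM_TABLE = [_vector(b, DIM) for b in range(NUM_BUCKETS)]


def compose_additive(word):
    """Additive composition: average the vectors of the word's hashed n-grams."""
    grams = char_ngrams(word)
    if not grams:  # e.g. the empty string has no n-grams of length >= 3
        return [0.0] * DIM
    total = [0.0] * DIM
    for g in grams:
        row = NGRAM_TABLE[zlib.crc32(g.encode("utf-8")) % NUM_BUCKETS]
        total = [t + r for t, r in zip(total, row)]
    return [t / len(grams) for t in total]


def char_vec(c):
    # Random stand-in for a trained character embedding.
    return _vector(100_000 + ord(c), CHAR_DIM)


# DIM filters, each a WIDTH x CHAR_DIM weight matrix (random stand-ins).
FILTERS = [[_vector(200_000 + f * WIDTH + j, CHAR_DIM) for j in range(WIDTH)]
           for f in range(DIM)]


def compose_cnn(word):
    """CNN composition: convolve over character vectors, apply tanh, max-pool over positions."""
    pad = [0.0] * CHAR_DIM
    # Zero-pad both ends so even the empty word ("<>") yields at least one window.
    seq = [pad] + [char_vec(c) for c in f"<{word}>"] + [pad]
    out = []
    for filt in FILTERS:
        best = -math.inf
        for i in range(len(seq) - WIDTH + 1):
            s = sum(w * x
                    for j in range(WIDTH)
                    for w, x in zip(filt[j], seq[i + j]))
            best = max(best, math.tanh(s))
        out.append(best)
    return out


if __name__ == "__main__":
    # Morphologically related words share n-grams such as '<pla' and 'play'.
    print(sorted(set(char_ngrams("playing")) & set(char_ngrams("played"))))
    # Any string, including an out-of-vocabulary one, gets a vector.
    print(len(compose_additive("unfriendliest")), len(compose_cnn("unfriendliest")))
```

Every string decomposes into n-grams or characters, so both functions return a vector for OOV words. That is the property the abstract highlights. In a real model these tables would be learned, for example jointly with a word-level lookup table as in the hybrid scheme described in the abstract.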

Author(s) Name:  Bofang Li, Aleksandr Drozd, Tao Liu, Xiaoyong Du

Journal name:  Proceedings of the Second Workshop on Subword/Character LEvel Models

Conference name:

Publisher name:  Association for Computational Linguistics

DOI:  10.18653/v1/W18-1205

Volume Information:  pages: 38–48