Research Topic Ideas in Learning Word Embeddings

   A word embedding is a representation of a word as a numeric vector learned by a language model. In deep learning, word embedding methods compute distributed representations of words in the form of continuous vectors. Some of the recent word embedding models are Global Vectors (GloVe), Embeddings from Language Models (ELMo), Generative Pre-trained Transformer (OpenAI-GPT), Contextual Word Vectors (CoVe), and Bidirectional Encoder Representations from Transformers (BERT). ELMo gains its language understanding from language modeling: it is trained to predict the next word in a sequence of words, using a bi-directional LSTM so that the sequence is modeled both left to right and right to left. This task is convenient because the model learns from vast amounts of text data without needing labels.
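   To illustrate why language modeling needs no labels, the following minimal sketch turns a raw sentence into (context, next word) training pairs in both reading directions. The helper name `lm_pairs` and the toy sentence are assumptions made for illustration; this is not ELMo's actual training pipeline, only the data-preparation idea behind it.

```python
def lm_pairs(tokens, reverse=False):
    """Build (context, target) pairs for next-word prediction.

    The targets come from the text itself, so no manual labels are needed.
    With reverse=True the sentence is read right to left instead.
    """
    seq = list(reversed(tokens)) if reverse else list(tokens)
    return [(tuple(seq[:i]), seq[i]) for i in range(1, len(seq))]


# Toy sentence (hypothetical example data).
tokens = "the cat sat on the mat".split()

forward_pairs = lm_pairs(tokens)                 # left-to-right direction
backward_pairs = lm_pairs(tokens, reverse=True)  # right-to-left direction

for context, target in forward_pairs[:3]:
    print(" ".join(context), "->", target)
```

   A bi-directional model would train one direction on `forward_pairs` and the other on `backward_pairs`, then combine the two sets of hidden states into one representation per word.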
   Bidirectional Encoder Representations from Transformers (BERT) uses a Transformer encoder to build representations of language and learns bi-directionally: it understands a word in context by drawing on the words to its left and to its right at the same time. Global Vectors (GloVe) is a model for distributed word representation that maps words into a meaningful space where the distance between words is related to semantic similarity. It is an unsupervised learning algorithm for obtaining vector representations of words, and training is performed on aggregated global word-word co-occurrence statistics from a corpus.
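   The co-occurrence statistics that GloVe trains on can be sketched in a few lines. The example below only collects the counts and compares raw count rows with cosine similarity; it does not implement the GloVe training objective itself. The function names, the window size, the 1/distance weighting, and the toy corpus are all assumptions chosen for illustration.

```python
from collections import defaultdict
import math


def cooccurrence(sentences, window=2):
    """Aggregate symmetric word-word co-occurrence counts over a corpus.

    Assumption: a pair that is d tokens apart contributes 1/d to the count,
    so nearer neighbours weigh more.
    """
    counts = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), i):
                weight = 1.0 / (i - j)
                counts[(word, tokens[j])] += weight
                counts[(tokens[j], word)] += weight
    return counts


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus (hypothetical example data).
corpus = [["ice", "is", "cold"], ["steam", "is", "hot"]]
counts = cooccurrence(corpus)
vocab = sorted({word for sentence in corpus for word in sentence})


def row(word):
    """Raw co-occurrence row for a word, standing in for a learned vector."""
    return [counts.get((word, other), 0.0) for other in vocab]


similarity = cosine(row("ice"), row("steam"))
print(f"cosine(ice, steam) = {similarity:.2f}")
```

   In a trained model the dense learned vectors would replace the raw rows, but the comparison step (distance or cosine similarity between vectors) stays the same.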
   Contextual Word Vectors (CoVe) are word embeddings learned by the encoder of an attentional sequence-to-sequence machine translation model. CoVe word representations are functions of the entire input sentence. Generative Pre-trained Transformer (GPT) is a model with absolute position embeddings, trained with a causal language modeling (CLM) objective, which makes it powerful at predicting the next token in a sequence. Versions of OpenAI-GPT include GPT-1, GPT-2, and GPT-3. Practices for better word embedding learning outcomes include a soft sliding window, sub-sampling of frequent words, and learning phrases first.
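   As an illustration of sub-sampling frequent words, the sketch below keeps each token with probability sqrt(t / f(w)), where f(w) is the word's relative frequency and t is a threshold. This formula is an assumption borrowed from the word2vec literature, since the text above does not specify one; the function names, threshold value, and toy token list are likewise hypothetical.

```python
import math
import random
from collections import Counter


def keep_probability(freq, threshold):
    """Probability of keeping a word whose relative frequency is `freq`.

    Assumed rule: keep with probability sqrt(threshold / freq), capped at 1,
    so words rarer than the threshold are always kept.
    """
    return min(1.0, math.sqrt(threshold / freq))


def subsample(tokens, threshold=0.01, seed=0):
    """Randomly drop frequent tokens before building training windows."""
    rng = random.Random(seed)
    counts = Counter(tokens)
    total = len(tokens)
    return [
        token
        for token in tokens
        if rng.random() < keep_probability(counts[token] / total, threshold)
    ]


# Toy token stream (hypothetical): one very frequent word, one rare word.
tokens = ["the"] * 900 + ["embedding"]
kept = subsample(tokens)
print(len(tokens), "tokens before,", len(kept), "after sub-sampling")
```

   Dropping most occurrences of very frequent words shortens training and lets rarer, more informative words fall inside the context window more often.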