A word embedding is a learned representation of words for text and document analysis, in which words that are close to each other in the vector space are expected to have similar meanings. Embeddings are obtained with language-modeling and feature-learning techniques that map words and phrases to vectors of real numbers, capturing both semantic and syntactic information. The main goals of word embedding are dimensionality reduction and predicting a word's surrounding context words. Word embeddings are used in many text-analysis tasks, mainly in natural language processing and information retrieval.
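The idea of predicting surrounding words can be made concrete with a small sketch (an illustrative assumption, not taken from the source): predictive models such as Skip-gram are trained on (target, context) pairs extracted from a sliding window over the text.

```python
# Minimal sketch (assumption): extracting the (target, context) training
# pairs that a Skip-gram-style predictive model would learn from,
# using a toy corpus and a symmetric window of size 2.
def context_pairs(tokens, window=2):
    pairs = []
    for i, target in enumerate(tokens):
        # Context words are the neighbours within `window` positions
        # on either side of the target word.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

tokens = "the cat sat on the mat".split()
pairs = context_pairs(tokens, window=2)
```

Each pair then serves as one training example: the model adjusts the target word's vector to better predict its observed context words.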
Word embedding models are commonly classified into two families: predictive models, which learn to predict a target word from its context in a pre-trained embedding model, and count- or frequency-based models, which derive representations from co-occurrence statistics of words across multiple contexts. Word2vec, doc2vec, TF-IDF (Term Frequency-Inverse Document Frequency), Skip-gram, GloVe (Global Vectors for Word Representation), CBOW (Continuous Bag of Words), BERT (Bidirectional Encoder Representations from Transformers), and learned embedding layers are some of the word embedding representations. Beyond general text analysis, word embeddings are applied to semantic analysis, syntax analysis, idiomaticity analysis, Part-of-Speech (POS) tagging, sentiment analysis, named entity recognition, textual entailment, and machine translation.
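A count- or frequency-based representation can be sketched with TF-IDF, one of the schemes named above. The toy corpus and pure-Python implementation below are illustrative assumptions, not from the source: term frequency (tf) measures how often a term occurs in one document, while inverse document frequency (idf) down-weights terms that appear in many documents.

```python
import math

# Minimal count-based sketch (assumption): TF-IDF weights for a toy
# corpus of pre-tokenized documents, computed from raw counts.
def tf_idf(docs):
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        w = {}
        for term in doc:
            tf = doc.count(term) / len(doc)        # term frequency
            idf = math.log(n / df[term])           # inverse document frequency
            w[term] = tf * idf
        weights.append(w)
    return weights

docs = [["cat", "sat", "mat"], ["cat", "ran"], ["dog", "ran", "fast"]]
weights = tf_idf(docs)
```

In this toy corpus, "sat" receives a higher weight than "cat" in the first document because "cat" also occurs in a second document, so its idf is lower.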
• A recent research trend is to integrate the learning of word embeddings into neural language models, yielding contextualized word embeddings such as ELMo, GPT, and BERT, alongside subword-based models such as fastText.
• Deep contextualized models improve the language-understanding ability of networks via large-scale unsupervised pre-training.
• fastText enriches word vectors with subword (character n-gram) information rather than full contextualization, which provides better embeddings for rare and out-of-vocabulary words.
• Even though word embedding methods scale to large text corpora, labeling large lexical databases remains a time-consuming and error-prone task.
• It is therefore necessary to develop effective models that learn refined word representations directly from large unlabeled corpora.
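The subword mechanism behind fastText's handling of rare words can be sketched as follows (an illustrative assumption, not the library's actual code): each word is decomposed into character n-grams, and the word's vector is the sum of its subword vectors, so even an unseen word gets a representation.

```python
# Minimal sketch (assumption): the character n-gram decomposition used
# by fastText-style models.  "<" and ">" mark word boundaries so that
# prefixes and suffixes are distinguishable from word-internal n-grams.
def char_ngrams(word, n_min=3, n_max=6):
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

grams = char_ngrams("where", n_min=3, n_max=4)
```

Because a rare word shares many of its n-grams with frequent words (e.g. "her" and "ere" above), its vector can be composed from well-trained subword vectors rather than learned from scarce occurrences alone.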