Research Area:  Machine Learning
Language modeling is the task of assigning a probability distribution over sequences of words that matches the distribution of a language. A language model is needed to represent text in a form that a machine can process, and it can predict the probability of a word occurring in a given context. Most language models in existing research are based on unidirectional training, conditioning on either the left or the right context only. In this paper, we investigate a bidirectional training model, BERT (Bidirectional Encoder Representations from Transformers). Compared with other word embedding models (such as ELMo), BERT builds on the bidirectional idea and uses the comparatively new transformer encoder-based architecture to compute word embeddings. The paper describes how this model achieves state-of-the-art results on various NLP tasks. BERT can be trained bidirectionally over a large corpus. This bidirectionality helps to obtain better results in context-related classification tasks in which words are used as input vectors. Additionally, BERT is designed for multi-task learning using context-related datasets, and it can perform different NLP tasks simultaneously. This survey focuses on a detailed presentation of the BERT-based technique for word embedding, its architecture, and the importance of this model for pre-training on a large corpus.
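The contrast between unidirectional and bidirectional context described in the abstract can be illustrated with a minimal count-based sketch. This is not BERT and not the paper's method: the toy corpus and the helper names (`left_to_right_counts`, `bidirectional_counts`, `normalize`) are hypothetical and chosen only to show how seeing both sides of a word can narrow the prediction.

```python
from collections import Counter

# Hypothetical toy corpus for illustration; not data from the paper.
CORPUS = [
    "the cat drank milk",
    "the cat chased mice",
    "the dog chased cats",
]

def left_to_right_counts(left):
    """Count words seen immediately after `left` (left context only)."""
    counts = Counter()
    for sentence in CORPUS:
        tokens = sentence.split()
        for i in range(1, len(tokens)):
            if tokens[i - 1] == left:
                counts[tokens[i]] += 1
    return counts

def bidirectional_counts(left, right):
    """Count words seen between `left` and `right` (context on both sides)."""
    counts = Counter()
    for sentence in CORPUS:
        tokens = sentence.split()
        for i in range(1, len(tokens) - 1):
            if tokens[i - 1] == left and tokens[i + 1] == right:
                counts[tokens[i]] += 1
    return counts

def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}

# Predict the word after "cat" using only the left context.
uni = normalize(left_to_right_counts("cat"))
# Predict the word between "cat" and "mice" using both sides.
bi = normalize(bidirectional_counts("cat", "mice"))

print(uni)
print(bi)
```

With only the left context, the sketch splits probability between two candidate words; adding the right context leaves a single candidate. A transformer-based model learns such dependencies from a large corpus rather than from exact neighbor counts, so this should be read only as an intuition aid.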
Author(s) Name:  Rohit Kumar Kaliyar
Conference name:  10th International Conference on Cloud Computing, Data Science & Engineering (Confluence)
Publisher name:  IEEE
Paper Link:   https://ieeexplore.ieee.org/abstract/document/9058044