Research Area:  Machine Learning
Large pre-trained language models (PLMs) such as BERT and GPT have drastically changed the field of Natural Language Processing (NLP). For numerous NLP tasks, approaches leveraging PLMs have achieved state-of-the-art performance. The key idea is to learn a generic, latent representation of language once, from a single broad task, and then share it across disparate NLP tasks. Language modeling serves as that broad task, since it can be trained in a self-supervised manner on abundant raw text. This article presents the fundamental concepts of PLM architectures and a comprehensive view of the shift to PLM-driven NLP techniques. It surveys work applying the pre-train-then-fine-tune, prompting, and text-generation approaches. In addition, it discusses the limitations of PLMs and suggested directions for future research.
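To ground the paradigm shift the abstract describes, the sketch below contrasts the pre-train-then-fine-tune and prompting approaches. It is a minimal illustration, assuming the Hugging Face Transformers library and publicly available BERT and GPT-2 checkpoints; the sentiment task, label scheme, and prompt wording are hypothetical and are not drawn from the survey.

```python
# Minimal sketch (not from the paper) of two paradigms the survey covers,
# assuming the Hugging Face Transformers library; the toy sentiment task,
# labels, and prompt are illustrative choices only.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    pipeline,
)

# --- Paradigm 1: pre-train then fine-tune ---------------------------------
# A pre-trained BERT encoder supplies the generic language representation;
# a small classification head plus brief fine-tuning adapt it to the task.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # hypothetical binary sentiment labels
)
inputs = tokenizer("A surprisingly thorough survey.", return_tensors="pt")
labels = torch.tensor([1])  # 1 = positive (assumed label scheme)
loss = model(**inputs, labels=labels).loss
loss.backward()  # one illustrative gradient step into the shared representation

# --- Paradigm 2: prompting a generative PLM --------------------------------
# The same task posed as text continuation, with no parameter updates.
generator = pipeline("text-generation", model="gpt2")
prompt = "Review: A surprisingly thorough survey. Sentiment (positive/negative):"
print(generator(prompt, max_new_tokens=3)[0]["generated_text"])
```

In both cases the pre-trained representation does the heavy lifting: fine-tuning updates the PLM's weights for one downstream task, while prompting reuses them unchanged.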
Keywords:  Pre-trained language models, BERT, GPT, NLP tasks, PLMs
Author(s) Name:  Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen
Journal name:  ACM Computing Surveys
Conference name:  
Publisher name:  Association for Computing Machinery (ACM)
DOI:  10.1145/3605943
Volume Information:  Volume 56
Paper Link:   https://dl.acm.org/doi/abs/10.1145/3605943