Research Area:  Machine Learning
Representing words as numerical vectors based on the contexts in which they appear has become the de facto method of analyzing text with machine learning. In this paper, we provide a guide for training these representations on clinical text data, using a survey of relevant research. Specifically, we discuss different types of word representations, clinical text corpora, available pre-trained clinical word vector embeddings, intrinsic and extrinsic evaluation, applications, and limitations of these approaches. This work can be used as a blueprint for clinicians and healthcare workers who may want to incorporate clinical text features in their own models and applications.
Keywords:  
Author(s) Name:  Faiza Khan Khattak, Serena Jeblee, Chloé Pou-Prom, Mohamed Abdalla, Christopher Meaney, Frank Rudzicz
Journal name:  Journal of Biomedical Informatics
Conference name:  
Publisher name:  Elsevier B.V.
DOI:  https://doi.org/10.1016/j.yjbinx.2019.100057
Volume Information:  Volume 100, Supplement, 2019, 100057
Paper Link:  https://www.sciencedirect.com/science/article/pii/S2590177X19300563