Research Area:  Machine Learning
Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data, to regularizing the objective, to limiting the amount of data used to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for textual classification and aims to provide a concise and comprehensive overview for researchers and practitioners (C3). Following the taxonomy, we divide more than 100 methods into 12 different groupings and provide state-of-the-art references expounding which methods are highly promising (C4). Finally, research perspectives that may constitute a building block for future work are given (C5).
Author(s) Name:  Markus Bayer, Marc-André Kaufhold, Christian Reuter
Journal name:  Computation and Language
Conference name:  
Publisher name:  arXiv:2107.03158
DOI:  https://doi.org/10.48550/arXiv.2107.03158
Volume Information:  
Paper Link:  https://arxiv.org/abs/2107.03158
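The abstract describes data augmentation as creating additional training data through transformations. The following is a minimal sketch, not taken from the paper, of two simple word-level transformations for text classification. They resemble the random swap and random deletion operations popularized by EDA ("Easy Data Augmentation"). The function names `random_swap`, `random_deletion` and `augment`, the whitespace tokenization, and the parameter values are illustrative assumptions. The sketch also assumes that such small perturbations leave the class label unchanged.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct positions
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    # Never return an empty sentence: fall back to one randomly chosen token.
    return kept if kept else [rng.choice(tokens)]


def augment(text, label, n_copies=2, seed=0):
    """Create n_copies variants of a (text, label) pair.

    Assumption: the label is copied unchanged, i.e. the transformations
    are treated as label-preserving.
    """
    rng = random.Random(seed)
    tokens = text.split()  # naive whitespace tokenization (assumption)
    variants = []
    for k in range(n_copies):
        # Alternate between the two transformations.
        if k % 2 == 0:
            new_tokens = random_swap(tokens, n_swaps=1, rng=rng)
        else:
            new_tokens = random_deletion(tokens, p=0.2, rng=rng)
        variants.append((" ".join(new_tokens), label))
    return variants


if __name__ == "__main__":
    sample = ("the movie was surprisingly good", "positive")
    for text, label in augment(*sample, n_copies=4):
        print(label, "|", text)
```

In a typical setup, the generated pairs would be added to the original training set rather than replace it. Seeding a dedicated `random.Random` instance keeps the augmented data reproducible across runs.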