Natural language processing (NLP) is a common application area for data augmentation, and text data augmentation is applied in most NLP tasks. Text data augmentation is the process of applying data augmentation to text data: additional, artificial text is generated from the existing textual data, for example by substituting synonyms or similar words.
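A minimal sketch of synonym replacement follows, assuming a small hand-written synonym table. The table `SYNONYMS` and the helpers `synonym_replace` and `augment` are hypothetical; a real pipeline might draw synonyms from a thesaurus such as WordNet instead. Each augmented copy keeps the label of the original sentence.

```python
import random
from typing import List, Tuple

# Hypothetical toy synonym table (assumption); a real system might use a thesaurus such as WordNet.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "movie": ["film"],
}


def synonym_replace(sentence: str, n: int, rng: random.Random) -> str:
    """Replace up to n words that have an entry in SYNONYMS with a randomly chosen synonym."""
    words = sentence.split()
    # Positions of words for which a synonym is known
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(SYNONYMS[words[i].lower()])
    return " ".join(words)


def augment(sentence: str, label: str, num_copies: int = 3, n: int = 1,
            seed: int = 0) -> List[Tuple[str, str]]:
    """Generate num_copies augmented sentences; every copy inherits the original label."""
    rng = random.Random(seed)
    return [(synonym_replace(sentence, n, rng), label) for _ in range(num_copies)]


if __name__ == "__main__":
    for text, label in augment("a quick and happy movie", "positive", n=2):
        print(label, "|", text)
```

Copying the label unchanged is the simplest choice; it is also where label quality can suffer if a substitution changes the meaning of the sentence.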
Text data augmentation is significant in NLP because it can improve the accuracy and robustness of models trained on both large and small datasets. It also poses a difficulty: when new text is generated automatically, the label of the original example must remain valid for the augmented example, and preserving this label quality is a challenge across the many text augmentation approaches that have been proposed. Text data augmentation also acts as a form of regularization, which helps prevent overfitting.
Text data augmentation mechanisms are divided into symbolic augmentations and neural augmentations. Symbolic augmentation uses simple, local transformations, such as replacing words or phrases or swapping the positions of words, to generate augmented textual data, and these transformations are interpretable by human designers. Symbolic augmentation techniques include rule-based augmentation, graph-structured augmentation, mixup augmentation, and feature space augmentation. Symbolic augmentation is limited in its ability to perform global transformations. Neural augmentation instead relies on deep neural networks to generate new training text data.
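Two of the symbolic techniques above can be sketched in a few lines of Python: a rule-based random word swap and mixup applied to fixed-length feature vectors. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the vectors stand in for sentence embeddings that some encoder would produce, since raw text cannot be interpolated directly.

```python
import random
from typing import List, Tuple


def random_swap(words: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Rule-based augmentation: swap the positions of two random words, n_swaps times."""
    words = list(words)  # copy so the caller's token list is not modified
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def mixup(x1: List[float], y1: List[float], x2: List[float], y2: List[float],
          lam: float) -> Tuple[List[float], List[float]]:
    """Mixup: linearly interpolate two feature vectors (assumed embeddings) and their one-hot labels."""
    if len(x1) != len(x2) or len(y1) != len(y2):
        raise ValueError("vectors to be mixed must have equal lengths")
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y


if __name__ == "__main__":
    rng = random.Random(42)
    print(random_swap("the cast was strong".split(), 1, rng))
    # Mixing weight drawn from Beta(0.2, 0.2); 0.2 is an illustrative choice
    lam = rng.betavariate(0.2, 0.2)
    x, y = mixup([0.9, 0.1, 0.4], [1.0, 0.0], [0.2, 0.7, 0.5], [0.0, 1.0], lam)
    print(round(lam, 3), x, y)
```

Mixup interpolates the labels as well as the inputs, so each mixed example carries a soft label. A Beta distribution with a small parameter is U-shaped, so most mixed examples stay close to one of the two originals.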
Using deep learning models for data augmentation can improve the generalization ability of the trained models. Neural augmentation techniques include back translation augmentation, style augmentation, and generative data augmentation. Application areas of text data augmentation include industry, communication, and finance. Upcoming advancements in text data augmentation include combining augmentation techniques for robust fine-tuning, using large-scale language models for text augmentation, and text augmentation for language models with high error recognition.
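Back translation can be sketched as a round trip through a pivot language: translate the sentence into another language and then back, and keep the result as a paraphrase. In this sketch the two translators are toy word-by-word dictionaries, hypothetical stand-ins for the neural machine translation models a real system would use; only the round-trip structure is the point.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def make_word_translator(table: Dict[str, str]) -> Translator:
    """Toy word-by-word translator; a hypothetical stand-in for a real MT model."""
    def translate(text: str) -> str:
        # Unknown words pass through unchanged
        return " ".join(table.get(w, w) for w in text.split())
    return translate


def back_translate(text: str, forward: Translator, backward: Translator) -> str:
    """Translate text into a pivot language and back again to obtain a paraphrase."""
    return backward(forward(text))


# Toy dictionaries (assumptions), chosen so the round trip yields different wording
EN_TO_FR = {"the": "le", "film": "film", "was": "était", "great": "génial"}
FR_TO_EN = {"le": "the", "film": "movie", "était": "was", "génial": "excellent"}

if __name__ == "__main__":
    to_fr = make_word_translator(EN_TO_FR)
    to_en = make_word_translator(FR_TO_EN)
    print(back_translate("the film was great", to_fr, to_en))
    # → the movie was excellent
```

Because the translators are passed in as functions, a real translation model could replace either dictionary without changing `back_translate`.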