Text Augmentation

Research Area: Machine Learning

Abstract:

This study considers the problem of training neural networks on small text datasets. We explore the method used for image and sound datasets that augments data in order to improve the performance of models trained on it. We propose an augmentation method based on synonymy. Natural language processing is an actively developing area today. Machine learning is developing in this direction, and developers need a lot of labeled data for their approaches, but labeling costs many hours of human work. There is therefore a need to increase the amount of data that has already been labeled. Such methods already exist in other parts of machine learning, such as image classification and speech and sound recognition, but the techniques that can be used for images and sounds are not suitable for text because of the danger of losing the sense of a sentence. These methods are called data augmentation, and they are a common way to increase the performance of a model, avoid overfitting, and improve the model's robustness. In this paper, we suggest a method for text augmentation that can improve performance, is not computationally expensive, and does not lose the sense of the sentence. The paper consists of Related Work, where we present some augmentation techniques; Methods, where the model is described; Dataset, which provides information about the data used for the experiments; and Experiments and Results, which presents the results of our work. In the Conclusion, some future goals are outlined.
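
To make the idea of synonymy-based augmentation concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not say how synonyms are selected, so the toy `SYNONYMS` table, the `augment` function, and the `rate` parameter are all hypothetical names introduced for illustration only.

```python
import random

# Hypothetical toy synonym table (an assumption for illustration);
# a real system would draw candidates from a thesaurus or similar resource.
SYNONYMS = {
    "good": ["fine", "great"],
    "movie": ["film"],
    "small": ["little", "tiny"],
}


def augment(sentence, synonyms, rate=0.5, rng=None):
    """Return a copy of `sentence` in which each word that has an entry
    in `synonyms` is replaced by a random synonym with probability `rate`."""
    rng = rng or random.Random()
    out = []
    for word in sentence.split():
        candidates = synonyms.get(word.lower())
        if candidates and rng.random() < rate:
            out.append(rng.choice(candidates))
        else:
            # Words without a known synonym are kept unchanged.
            out.append(word)
    return " ".join(out)


# Generate a few variants of one labeled sentence; the label is reused as-is.
rng = random.Random(0)
original = "a good movie with a small budget"
variants = [augment(original, SYNONYMS, rate=0.5, rng=rng) for _ in range(3)]
```

Each variant keeps the word count and word order of the original sentence, which is one simple way to limit the risk of changing its sense; the original label would be attached to every variant.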

Keywords:
Text Augmentation
Neural Networks
Machine Learning
Deep Learning

Author(s) Name: Anna V. Mosolova, Vadim V. Fomin, and Ivan Yu. Bondarenko

Journal name: National Research University Higher School of Economics

Conference name:

Publisher name: CEUR-WS

DOI:

Volume Information: Vol-2268

Paper Link: http://ceur-ws.org/Vol-2268/paper11.pdf
