Research Area:  Machine Learning
This study considers the problem of training neural networks on small text datasets. We explore data augmentation, a method already used for image and sound datasets, which expands the data in order to improve the performance of models trained on it. We propose an augmentation method based on synonymy. Natural language processing is an actively developing area, and machine learning approaches in this area require large amounts of labeled data, which costs many hours of human work. There is therefore a need to increase the amount of previously labeled data. Such methods already exist in other areas of machine learning, such as image classification and speech and sound recognition, but the techniques used for images and sounds are not suitable for text because of the danger of losing the meaning of a sentence. These methods are called data augmentation, and they are a common way to increase model performance, avoid overfitting, and improve model robustness. In this paper, we suggest a text augmentation method that can improve performance, is not computationally expensive, and preserves the meaning of the sentence. The paper is organized as follows: Related Work presents some augmentation techniques, Methods describes the model, Dataset provides information about the data used in the experiments, and Experiments and Results reports the results of our work. The Conclusion outlines some future goals.
Keywords:  
Text Augmentation
Neural Networks
Machine Learning
Deep Learning
Author(s) Name:  Anna V. Mosolova, Vadim V. Fomin, and Ivan Yu. Bondarenko
Journal name:  National Research University Higher School of Economics
Conference name:  
Publisher name:  CEUR-WS
DOI:  
Volume Information:  Vol-2268
Paper Link:   http://ceur-ws.org/Vol-2268/paper11.pdf