In machine learning, data collection largely consists of three activities: data acquisition, data labeling, and improvement of existing data or models.
Data Acquisition: Data acquisition aims to find datasets that can be used to train machine learning models. Its three methods are data discovery (finding existing datasets), data augmentation (enriching the data already at hand), and data generation (creating new data when none is available).
Data Labeling: After enough data has been acquired, the next step is to label individual examples. The main categories of data labeling in machine learning are using existing labels, crowd-based labeling, and weak labeling.
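As an illustration of weak labeling, the sketch below combines a few hand-written heuristics by majority vote. It is a minimal example under stated assumptions, not a method prescribed by the text: the toy spam task, the three rules, and the names `lf_contains_free`, `lf_has_link`, `lf_greeting`, and `weak_label` are all hypothetical.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion


# Hypothetical heuristics for a toy spam task; real rules depend on the data.
def lf_contains_free(text):
    return "spam" if "free" in text.lower() else ABSTAIN


def lf_has_link(text):
    return "spam" if "http" in text.lower() else ABSTAIN


def lf_greeting(text):
    return "ham" if text.lower().startswith(("hi", "hello")) else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_free, lf_has_link, lf_greeting]


def weak_label(text):
    """Combine labeling-function votes by simple majority.

    Returns ABSTAIN when no function votes or when the top two labels tie.
    """
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # tie between labels, so leave the example unlabeled
    return counts[0][0]


examples = [
    "Get a FREE prize at http://example.com",
    "Hello, are we still meeting tomorrow?",
    "Quarterly report attached",
]
weak_labels = [weak_label(t) for t in examples]
```

Labels produced this way are noisy by design; the usual trade-off is lower accuracy per label in exchange for labeling many examples cheaply.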
Improve Existing Data or Model: An alternative to acquiring and labeling new data is to improve the labels of existing datasets or to improve the model training. A major problem in machine learning is that data can be noisy and labels can be incorrect. In addition to improving the data, there are also ways to improve the model training itself. One is to make the training more robust against noise or bias; another is to use transfer learning, in which previously trained models serve as a starting point for training the current model.
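One simple way to look for incorrect labels in an existing dataset is to flag examples whose label disagrees with most of their nearest neighbors. The sketch below is only a heuristic chosen for illustration, not a technique named in the text; the function `flag_suspect_labels`, the value of `k`, and the toy points are assumptions.

```python
import math
from collections import Counter


def flag_suspect_labels(points, labels, k=3):
    """Return indices whose label differs from the majority label
    of their k nearest neighbors (Euclidean distance)."""
    suspects = []
    for i, p in enumerate(points):
        # Sort all other points by distance to p and keep the k closest.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        majority, _ = Counter(labels[j] for j in neighbors).most_common(1)[0]
        if majority != labels[i]:
            suspects.append(i)
    return suspects


# Two well-separated toy clusters; index 3 is deliberately mislabeled.
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
    (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.15, 5.15),
]
labels = ["a", "a", "a", "b", "b", "b", "b", "b"]
suspects = flag_suspect_labels(points, labels, k=3)
```

Flagged examples are candidates for human review, not automatic relabeling: a disagreeing point may be a legitimate example near a class boundary.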