Research Area:  Machine Learning
Several web applications maintain data repositories containing references to thousands of real-world entities drawn from multiple sources, and these repositories continually receive new data. Identifying the distinct entities and associating each reference with the correct entity is a problem known as entity resolution. The challenge is to solve this problem incrementally, as the data arrive, especially when the data are described by a single textual attribute. In this paper, we propose a new approach to incremental entity resolution. Our method, called AssocIER, uses an ensemble of multiclass classifiers with self-training and novel class detection. We evaluated our method on various real-world datasets and scenarios, comparing it with a traditional entity resolution approach. The results show that AssocIER is effective and efficient at resolving unstructured data in collections with large numbers of entities and features, and that it can detect hundreds of novel classes.
Author(s) Name:  João Antonio Silva and Denilson Alves Pereira
Journal name:  International Journal of Business Intelligence and Data Mining
Publisher name:  Inderscience
Volume Information:  Vol. 18, No. 2, pp. 218-245
Paper Link:   https://www.inderscienceonline.com/doi/abs/10.1504/IJBIDM.2021.112988
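The abstract describes AssocIER only at a high level. The following Python sketch illustrates the general idea of incremental entity resolution over a single textual attribute: an ensemble of multiclass classifiers scores each arriving record, a low ensemble confidence is treated as a novel entity (a new class), and every resolved record is fed back into the models as a pseudo-labelled example (self-training). This is not the authors' implementation. The nearest-centroid members, the two feature views (`word_tokens`, `char_trigrams`), the `threshold` value, the `new-N` labels, and the sample product titles are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Featurizer = Callable[[str], List[str]]


def word_tokens(text: str) -> List[str]:
    """Lower-cased word tokens of the single textual attribute (hypothetical view)."""
    return re.findall(r"\w+", text.lower())


def char_trigrams(text: str) -> List[str]:
    """Character 3-grams, padded so word boundaries become features (hypothetical view)."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class CentroidClassifier:
    """Incremental multiclass nearest-centroid classifier over one feature view."""

    def __init__(self, featurize: Featurizer) -> None:
        self.featurize = featurize
        self.centroids: Dict[str, Counter] = defaultdict(Counter)

    def update(self, text: str, label: str) -> None:
        # Adding raw feature counts to the class centroid; no retraining pass needed.
        self.centroids[label].update(self.featurize(text))

    def scores(self, text: str) -> Dict[str, float]:
        # Cosine similarity between the record and every known entity centroid.
        x = Counter(self.featurize(text))
        norm_x = math.sqrt(sum(v * v for v in x.values()))
        out: Dict[str, float] = {}
        for label, c in self.centroids.items():
            dot = sum(v * c[f] for f, v in x.items())
            norm_c = math.sqrt(sum(v * v for v in c.values()))
            out[label] = dot / (norm_x * norm_c) if norm_x and norm_c else 0.0
        return out


class IncrementalResolver:
    """Hypothetical ensemble: average member scores, flag novel entities, self-train."""

    def __init__(self, views: List[Featurizer], threshold: float = 0.5) -> None:
        self.members = [CentroidClassifier(view) for view in views]
        self.threshold = threshold  # hypothetical cut-off for novel class detection
        self.next_id = 0

    def fit(self, labeled: List[Tuple[str, str]]) -> None:
        # Initial training on references whose entity is already known.
        for text, label in labeled:
            for member in self.members:
                member.update(text, label)

    def resolve(self, text: str) -> Tuple[str, bool]:
        totals: Counter = Counter()
        for member in self.members:
            for label, score in member.scores(text).items():
                totals[label] += score / len(self.members)
        label, confidence = totals.most_common(1)[0] if totals else ("", 0.0)
        novel = confidence < self.threshold
        if novel:
            # Novel class detection: open a new entity for this reference.
            label = f"new-{self.next_id}"
            self.next_id += 1
        # Self-training: learn from the record using its own predicted label.
        for member in self.members:
            member.update(text, label)
        return label, novel


resolver = IncrementalResolver([word_tokens, char_trigrams], threshold=0.5)
resolver.fit([
    ("Apple iPhone 12 64GB black", "iphone12"),
    ("Samsung Galaxy S21 128GB", "galaxys21"),
])

results = []
for record in [
    "iPhone 12 Apple 64 GB black",
    "Galaxy S21 Samsung 128GB",
    "Google Pixel 5 128GB",
    "Pixel 5 Google 128 GB",
]:
    label, novel = resolver.resolve(record)
    results.append((label, novel))
    print(f"{record!r} -> {label} (novel={novel})")
```

In this toy stream, the first two titles are attached to existing entities, the first Pixel title falls below the threshold and opens the class `new-0`, and the second Pixel title is then matched to that newly created class because self-training already added the first one to it. Feeding every prediction back unconditionally is the simplest possible choice; gating self-training on confidence would be a natural variation.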