COLI: Collaborative clustering missing data imputation - 2021

Research Area:  Machine Learning


Missing data imputation plays an important role in the data cleansing process. Clustering algorithms have been widely used for missing data imputation, yet little research has been done on using clustering ensembles, which aggregate multiple clustering results, for this task. This paper proposes a novel collaborative clustering-based imputation method, called COLI, which uses imputation quality as the key criterion for exchanging information between different clustering results. To the best of our knowledge, this is the first study on the impact of collaborative clustering on imputation performance. The main contributions of this paper are threefold. First, a novel missing value imputation method based on collaborative clustering is proposed. Second, three amputation strategies are used to induce missingness on various complete, publicly available datasets with different mechanisms, distributions, and ratios, which allows the imputation quality of the proposed method to be evaluated on numerical datasets under these different missingness conditions. Third, the proposed method is compared to several state-of-the-art imputation methods, and the results obtained demonstrate that it is an effective method for handling missing data.
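The abstract does not spell out how clustering is used to fill in missing values, so the following is a minimal sketch of generic clustering-based imputation, not COLI itself. It runs a single k-means clustering whose cluster centers supply the imputed values, induces missingness with a simple MCAR (missing completely at random) amputation, and scores imputation quality by RMSE on the hidden entries. COLI's collaborative exchange between multiple clustering results and the paper's three specific amputation strategies are not reproduced here. The function names (`ampute_mcar`, `kmeans_impute`, `rmse_on_missing`), the synthetic data, and the choice of RMSE are illustrative assumptions.

```python
# Hypothetical sketch: generic k-means imputation with MCAR amputation.
# This is NOT the COLI algorithm; all names here are illustrative.
import numpy as np


def ampute_mcar(X, ratio, rng):
    """Hide roughly a fraction `ratio` of entries completely at random (MCAR)."""
    mask = rng.random(X.shape) < ratio  # True marks a hidden (missing) entry
    X_missing = X.astype(float).copy()
    X_missing[mask] = np.nan
    return X_missing, mask


def kmeans_impute(X_missing, k, n_iter=20, seed=0):
    """Fill NaNs with the feature values of each row's k-means cluster center."""
    rng = np.random.default_rng(seed)
    missing = np.isnan(X_missing)
    # Start from column-mean imputation so every row is a complete vector.
    X = np.where(missing, np.nanmean(X_missing, axis=0), X_missing)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each row to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = labels == j
            if members.any():
                centers[j] = X[members].mean(axis=0)
        # Refill only the missing entries; observed values are never changed.
        X = np.where(missing, centers[labels], X_missing)
    return X


def rmse_on_missing(X_true, X_imputed, mask):
    """Imputation error measured only on the entries that were hidden."""
    diff = X_true[mask] - X_imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Illustrative synthetic data: two well-separated Gaussian blobs in 3-D.
rng = np.random.default_rng(42)
X_true = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(8, 1, (50, 3))])
X_missing, mask = ampute_mcar(X_true, ratio=0.1, rng=rng)

X_imp = kmeans_impute(X_missing, k=2)
# Baseline for comparison: plain column-mean imputation.
X_mean = np.where(mask, np.nanmean(X_missing, axis=0), X_missing)

print("k-means RMSE:", rmse_on_missing(X_true, X_imp, mask))
print("mean RMSE:   ", rmse_on_missing(X_true, X_mean, mask))
```

Scoring only the artificially hidden entries is what makes amputation useful for evaluation: the true values are known, so imputation quality can be measured directly. An ensemble or collaborative variant would run several such clusterings and combine their results, which this sketch does not attempt.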

Author(s) Name:  Daoming Wan, Roozbeh Razavi-Far, Mehrdad Saif, Niloofar Mozafari

Journal name:  Pattern Recognition Letters

Conference name:  

Publisher name:  Elsevier

DOI:  10.1016/j.patrec.2021.11.011

Volume Information:  Volume 152, December 2021, Pages 420-427