Research Area:  Machine Learning
Semi-supervised clustering is an important research topic in cluster analysis that uses prior knowledge as constraints to improve clustering performance. When clustering a data set, people often obtain prior constraints from different information sources, which may differ in representation and content, to guide the clustering process. However, most existing semi-supervised clustering algorithms are based on single-source constraints and rarely consider integrating multi-source constraints to enhance clustering quality. To solve this problem, we analyze the relations among different types of constraints and propose a uniform representation for them. Based on it, we propose a new semi-supervised clustering algorithm that finds a clustering with good cluster structure and high consensus across all sources of constraints. In the algorithm, we construct an optimization objective model and a solution method for it. The algorithm integrates multi-source constraints well, reducing the effect of incorrect constraints from any single source and finding a high-quality clustering. Experimental studies on several benchmark data sets illustrate the effectiveness of the proposed algorithm compared with other semi-supervised clustering algorithms.
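To make the idea of a uniform representation concrete, the sketch below shows one common way to put two kinds of prior knowledge (partial class labels and must-link/cannot-link pairs) into the same pairwise matrix form, and then average the matrices from several sources to see where they agree. This is an illustrative assumption, not the representation or objective defined in the paper: the function names, the +1/-1/0 encoding, and the simple averaging step are all hypothetical.

```python
import numpy as np

def labels_to_pairwise(partial_labels, n):
    """Encode partial labels {index: label} as an n x n matrix:
    +1 = must-link, -1 = cannot-link, 0 = no information (assumed encoding)."""
    m = np.zeros((n, n), dtype=int)
    items = list(partial_labels.items())
    for a, (i, label_i) in enumerate(items):
        for j, label_j in items[a + 1:]:
            m[i, j] = m[j, i] = 1 if label_i == label_j else -1
    return m

def pairs_to_pairwise(must_link, cannot_link, n):
    """Encode explicit pairwise constraints in the same +1/-1/0 matrix form."""
    m = np.zeros((n, n), dtype=int)
    for i, j in must_link:
        m[i, j] = m[j, i] = 1
    for i, j in cannot_link:
        m[i, j] = m[j, i] = -1
    return m

def consensus(matrices):
    """Average the sources; the sign shows the majority view and the
    magnitude shows how strongly the sources agree (a simple stand-in,
    not the paper's optimization model)."""
    return np.mean(np.stack(matrices), axis=0)

n = 4
source_a = labels_to_pairwise({0: "x", 1: "x", 2: "y"}, n)      # partial labels
source_b = pairs_to_pairwise([(0, 1)], [(1, 2), (0, 3)], n)     # pairwise constraints
source_c = pairs_to_pairwise([(1, 2)], [], n)                   # conflicts with a and b on (1, 2)
agreement = consensus([source_a, source_b, source_c])
```

In this toy example, the single conflicting must-link on points 1 and 2 from `source_c` is outvoted by the other two sources, which mirrors the abstract's point that combining sources can reduce the effect of an incorrect constraint from any one of them.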
Keywords:  
Clustering algorithms
Matrix converters
Machine learning algorithms
Optimization
Benchmark testing
Task analysis
Measurement
Author(s) Name:  Liang Bai; JiYe Liang; Fuyuan Cao
Journal name:  IEEE Transactions on Pattern Analysis and Machine Intelligence
Conference name:  
Publisher name:  IEEE
DOI:  10.1109/TPAMI.2020.2979699
Volume Information:  Volume: 43
Paper Link:   https://ieeexplore.ieee.org/abstract/document/9031553