Survey of Clustering in Machine Learning and Deep Learning

   Clustering, an unsupervised machine learning technique, has attracted extensive attention and motivated many applications. It plays a significant role in data analysis, and its performance depends heavily on the quality of the data representation. Cluster analysis aims to segment data and uncover patterns by grouping objects into clusters based on their shared characteristics.

   In the clustering process, objects within a single cluster are similar to one another, while objects in different clusters are dissimilar. Clustering is the subject of active research in many fields, such as computer science, data science, statistics, pattern recognition, artificial intelligence, and machine learning. Clustering algorithms fall into several families: partitioning-based, hierarchical, density-based, model-based, and grid-based.
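
   As an illustration of the partitioning-based family, the following is a minimal sketch of the classic k-means (Lloyd's) procedure in plain Python. It is not taken from the survey: the function name `kmeans`, the use of Euclidean distance, and the deterministic initialization from the first `k` points are assumptions made here to keep the example small and reproducible.

```python
import math

def kmeans(points, k, iters=100):
    """Sketch of Lloyd's algorithm on a list of equal-length numeric tuples.

    Assumption: centroids are initialized from the first k points, which is
    deterministic but not a recommended strategy for real data.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster becomes empty.
                new_centroids.append(centroids[j])
        # Stop once neither the assignments nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids

# Two visually separated groups of 2-D points.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(data, k=2)
```

   Objects that end up with the same label are closer to their own centroid than to any other, which is the within-cluster similarity described above. The other families replace the centroid-based criterion with a different one, such as density or a probabilistic model.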

   Owing to their inherent capacity for highly non-linear transformation, machine learning models, and deep neural networks (DNNs) in particular, can be used to transform data into more clustering-friendly representations. Deep clustering refers to the family of clustering methods that adopt deep neural networks to learn such representations. Its main variants are AE-based, CDNN-based, GAN-based, and VAE-based deep clustering.

   Numerous literature reviews and analyses have discussed the overall scope of deep clustering algorithms. Nevertheless, significant challenges remain. Clustering mixed data (datasets that combine numerical and categorical features) is challenging because it is difficult to directly apply mathematical operations, such as summation or averaging, to the feature values of such datasets. Other open issues include data availability, scalability of algorithms, big data challenges, and interpretability.
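
   To make the mixed-data difficulty concrete, the sketch below shows one common workaround, a Gower-style dissimilarity, which avoids averaging categorical values by scoring them as match or mismatch. It is an illustrative assumption rather than a method proposed in this survey: the function name `gower_distance`, the `numeric_ranges` argument, and the example records are all hypothetical.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower-style dissimilarity between two mixed-type records.

    numeric_ranges maps a feature index to that feature's (max - min) range
    over the dataset; any index not listed is treated as categorical.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            # Numerical feature: range-normalized absolute difference.
            feature_range = numeric_ranges[i]
            total += abs(x - y) / feature_range if feature_range else 0.0
        else:
            # Categorical feature: 0 if the values match, 1 otherwise.
            total += 0.0 if x == y else 1.0
    # Average the per-feature contributions so the result lies in [0, 1].
    return total / len(a)

# Hypothetical records: (age, colour, subscribed).
record_a = (30, "red", "yes")
record_b = (50, "blue", "yes")
distance = gower_distance(record_a, record_b, numeric_ranges={0: 40.0})
```

   Here the age difference contributes 20 / 40 = 0.5, the colour mismatch contributes 1, and the matching third feature contributes 0, giving an average of 0.5. A pairwise measure of this kind can feed clustering methods that need only dissimilarities, although it does not by itself define a cluster mean.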