Clustering is a prominent machine learning technique: the process of grouping data points based on the relationship or similarity among data samples. It is an unsupervised learning approach that discovers related data samples in the input feature space without labeled data. The need for clustering arises when the data is unlabeled and there is no clear prior knowledge of how it should be grouped. The main goal of a clustering model is to find similar patterns among the data samples and thereby identify groups of interest. Data points inside a cluster are more closely related to one another than to points grouped into other clusters. There are two types of clustering. Hard clustering assigns each data point to exactly one group; soft clustering allows a data point to belong to multiple groups.
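The difference between hard and soft clustering can be illustrated with a minimal sketch. It assumes two fixed, hypothetical centroids and Euclidean distance, and it uses a fuzzy c-means style membership formula for the soft case; the function names, the fuzzifier m, and the sample values are illustrative choices, not part of the original text.

```python
import math

def hard_assign(point, centroids):
    """Hard clustering: return the index of the single nearest centroid."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def soft_assign(point, centroids, m=2.0):
    """Soft clustering: return one membership degree per centroid, summing to 1.

    Uses the fuzzy c-means membership formula with fuzzifier m > 1
    (an assumed choice; other soft schemes exist).
    """
    distances = [math.dist(point, c) for c in centroids]
    # A point lying exactly on one or more centroids belongs fully to them.
    zeros = [d == 0.0 for d in distances]
    if any(zeros):
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    weights = [d ** -exponent for d in distances]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical centroids and sample point, chosen only for illustration.
centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

hard_label = hard_assign(point, centroids)        # one group only
soft_memberships = soft_assign(point, centroids)  # a degree for every group
```

Here the hard assignment commits the point to its nearest group, while the soft assignment spreads its membership across both groups in proportion to closeness.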
The main families of clustering methods are partitioning (centroid-based) clustering, density-based clustering, distribution model-based clustering, hierarchical clustering, and fuzzy clustering. Commonly used algorithms include K-means, Mini-Batch K-means, Mean shift, Agglomerative clustering, Gaussian mixture models, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Clustering results are evaluated through three approaches: internal evaluation, external evaluation, and cluster tendency assessment. Popular applications of clustering include statistical data analysis, medical imaging, image segmentation, customer segmentation, pattern recognition, anomaly detection, social network analysis, crime analysis, climatology, and natural language processing. Recent advances in computer science and statistical physics have led to the development of new clustering algorithms.
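As a rough illustration of a centroid-based algorithm and of internal versus external evaluation, the sketch below implements a plain K-means (Lloyd's algorithm) and two simple scores. It makes several assumptions not found in the text: centroids are initialized from the first k points for determinism, the within-cluster sum of squares stands in for internal evaluation, the Rand index stands in for external evaluation, and the toy data and reference labels are hypothetical.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain K-means; initial centroids are the first k points (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids

def within_cluster_ss(points, labels, centroids):
    """Internal evaluation: sum of squared distances to assigned centroids."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def rand_index(labels_a, labels_b):
    """External evaluation: fraction of point pairs on which two labelings agree."""
    n = len(labels_a)
    agree, pairs = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            same_a = labels_a[i] == labels_a[j]
            same_b = labels_b[i] == labels_b[j]
            agree += int(same_a == same_b)
            pairs += 1
    return agree / pairs

# Hypothetical toy data: two well-separated groups with known reference labels.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
reference = [0, 0, 0, 1, 1, 1]

labels, centroids = kmeans(points, k=2)
internal_score = within_cluster_ss(points, labels, centroids)  # lower is tighter
external_score = rand_index(labels, reference)                 # 1.0 means full agreement
```

The internal score needs only the data and the clustering itself, whereas the external score requires reference labels, which is why external evaluation is available only when some ground truth exists.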