Hierarchical clustering implementation in R Machine Learning

Process

Hierarchical clustering:

It is a type of unsupervised learning.
It is one of the clustering algorithms.
It is used to draw inferences from unlabeled data.

Types of Hierarchical clustering:

Agglomerative Nesting (AGNES) clustering
Divisive Analysis (DIANA) clustering

AGNES Hierarchical Clustering:

It works in a bottom-up manner.
That is, each object is initially considered as a single-element cluster (leaf).
At each step of the algorithm, the two clusters that are the most similar are combined into a new, bigger cluster (node).
This procedure is repeated until all points are members of one single big cluster (root).
The result is a tree which can be plotted as a dendrogram.
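
The bottom-up merging described above can be sketched on a tiny data set. This is only an illustration, not part of the original sample: the five values and the object name x are made up, and hclust() from the stats package with complete linkage is assumed as the agglomerative routine.

```r
# Five one-dimensional points; the values are hypothetical, chosen for illustration
x <- matrix(c(1, 2, 6, 8, 15), ncol = 1,
            dimnames = list(c("a", "b", "c", "d", "e"), NULL))

# Pairwise Euclidean distances, then bottom-up merging with complete linkage
hc <- hclust(dist(x), method = "complete")

# Each row of $merge is one step of the algorithm:
# negative numbers are single observations (leaves),
# positive numbers refer to clusters formed at an earlier step
hc$merge

# Dissimilarity at which each merge happened: 1, 2, 7, 14 for this data
hc$height

# n observations always need n - 1 merges to reach the root
nrow(hc$merge) == nrow(x) - 1

# The resulting tree plotted as a dendrogram
plot(hc, main = "Bottom-up merging of five points")
```

Reading $merge from top to bottom follows the tree from the leaves to the root.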

DIANA Hierarchical Clustering:

It works in a top-down manner.
The algorithm works in the inverse order of AGNES.

It begins with the root, in which all objects are included in a single cluster.

At each step of iteration, the most heterogeneous cluster is divided into two.

The process is iterated until all objects are in their own cluster.
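
The top-down procedure can be tried in the same way on a small example. The sketch below assumes the cluster package is installed; the five points and the choice of k = 2 are hypothetical and only serve to show the first split.

```r
library(cluster)  # provides diana()

# Five two-dimensional points; the values are hypothetical
x <- rbind(a = c(1, 1), b = c(2, 1), c = c(8, 8), d = c(9, 8), e = c(20, 20))

# Divisive clustering: start from one cluster holding all points and split repeatedly
dv <- diana(x)

# Divisive coefficient, a value between 0 and 1
dv$dc

# Convert to an hclust tree and stop the splitting at two clusters
groups <- cutree(as.hclust(dv), k = 2)
groups

# Dendrogram of the full sequence of splits
pltree(dv, main = "Top-down splitting of five points")
```

Cutting the tree at k = 2 shows the result of the first division of the root cluster.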

Package and Functions:

R Package : cluster--For clustering algorithms (agnes, diana)
R Package : factoextra--For extracting and visualizing clustering results
R Package : stats--For the dist and hclust functions
R Package : NbClust--For determining the number of clusters
R Package : purrr--For applying functions over vectors (part of the tidyverse collection)
R Function : sum(is.na(data))--to return the number of missing values
R Function : dist(x,method= )--to compute the dissimilarity matrix
R Function : hclust(d,method= )--from stats package for agglomerative HC (d is a dissimilarity matrix produced by dist)
R Function : agnes(x,method= )--from cluster package for agglomerative HC
R Function : diana(x)--from cluster package for divisive HC
R Function : map_dbl()--from purrr package; applies a function to each element of its input and returns a numeric vector
$ac - agglomerative coefficient returned by agnes (values closer to 1 suggest a strong clustering structure)
$dc - divisive coefficient returned by diana (values closer to 1 suggest a strong clustering structure)
R Function : fviz_nbclust(x,FUNcluster,method=c("silhouette","wss","gap_stat"))--from factoextra package; evaluates the number of clusters with one of three methods (silhouette, elbow, gap statistic) for partitioning methods (k-means, k-medoids) and hierarchical clustering (hcut)
R Function : rect.hclust(tree,k= )--from stats package; draws borders around k clusters on a plotted dendrogram
x - data set
k - number of clusters
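
To make the dissimilarity matrix more concrete, here is a small sketch of dist() with two of its method values. The three points and the names pts, d_euc and d_man are hypothetical and chosen so the distances are easy to check by hand.

```r
# Three points on a line through the origin; the values are hypothetical
pts <- rbind(p1 = c(0, 0), p2 = c(3, 4), p3 = c(6, 8))

# Euclidean distance: straight-line distance between each pair of rows
d_euc <- dist(pts, method = "euclidean")

# Manhattan distance: sum of absolute coordinate differences
d_man <- dist(pts, method = "manhattan")

# dist() stores only the lower triangle; as.matrix() shows the full symmetric table
as.matrix(d_euc)   # p1-p2 = 5, p1-p3 = 10, p2-p3 = 5
as.matrix(d_man)   # p1-p2 = 7, p1-p3 = 14, p2-p3 = 7
```

Either object can be passed directly to hclust() as its first argument.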

Sample Code

#Hierarchical clustering Sample

#Loading required packages

library("cluster") #Clustering algorithms

library("factoextra") #Clustering Visualization

library("stats") #for hclust function

library("NbClust") #Clustering and Visualization

#Input

input<-iris[,1:4] #keep only the four numeric measurements; Species is a label, not a feature

View(input)

#Data Preparation

#Missing values

sum(is.na(input))

#Dissimilarity matrix

hier_dist<-dist(input,method = "euclidean")

#Agglomerative(AGNES) clustering

#hclust function

hier<-hclust(hier_dist,method = "complete")

plot(hier,cex=0.6)

#agnes function using method complete

hier_agnes<-agnes(input,method = "complete")

hier_agnes$ac

#Finding the linkage method that gives the strongest clustering structure

#install.packages("purrr")

library("purrr")

m<-c("complete","single","ward","average")

names(m)<-c("complete","single","ward","average")

agg_coef<-function(x){

agnes(input,method = x)$ac

}

map_dbl(m,agg_coef)

#agnes function using ward method

h_agnes<-agnes(input,method = "ward")

pltree(h_agnes,main="Dendrogram of Agnes")

#diana method

h_diana<-diana(input)

h_diana$dc

pltree(h_diana,main = "Dendrogram of Diana")

#Optimal number of cluster

fviz_nbclust(input,FUNcluster = hcut, method = "silhouette")

#Dendrogram with border around two clusters

hier<-hclust(hier_dist,method = "complete")

plot(hier,cex=0.6)

rect.hclust(hier,k=2,border = 2:6)
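
Once the dendrogram is drawn, the cluster label of each observation can also be extracted. The sketch below is an addition to the sample, not part of it: it assumes cutree() from the stats package, and k = 3 is an assumption made only because iris contains three species. The names features, hc and clusters are hypothetical.

```r
# Numeric measurements only; Species is kept aside for comparison
features <- iris[, 1:4]

# Same distance and linkage choices as the sample code
hc <- hclust(dist(features, method = "euclidean"), method = "complete")

# Cut the tree into k groups; k = 3 is an assumption, not a recommendation
clusters <- cutree(hc, k = 3)

# Cross-tabulate cluster labels against the known species
table(cluster = clusters, species = iris$Species)
```

The cross-table gives a quick view of how closely the clusters follow the species labels, which the clustering itself never sees.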

S-Logix (OPC) Private Limited