
### How to implement hierarchical clustering in R?

###### Description

This tutorial shows how to implement hierarchical clustering in R.

###### Process
Hierarchical clustering:
• It is a type of unsupervised learning.
• It is one of several clustering algorithms.
• It is used to draw inferences from unlabeled data.
Types of hierarchical clustering:
• Agglomerative Nesting (AGNES) clustering
• Divisive Analysis (DIANA) clustering
AGNES Hierarchical Clustering:
• It works in a bottom-up manner.
• That is, each object is initially considered as a single-element cluster (leaf).
• At each step of the algorithm, the two most similar clusters are combined into a new, bigger cluster (node).
• This procedure is iterated until all points are members of a single big cluster (root).
• The result is a tree which can be plotted as a dendrogram.
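The bottom-up merging described above can be traced on a small example. The following is a minimal sketch using `hclust()` from the stats package; the five toy values and the choice of complete linkage are assumptions made for illustration only.

```r
# Toy data: five points on a line (hypothetical values chosen for illustration)
toy <- matrix(c(1, 2, 6, 7.5, 15), ncol = 1,
              dimnames = list(c("a", "b", "c", "d", "e"), "value"))

# Pairwise Euclidean distances between the five points
toy_dist <- dist(toy, method = "euclidean")

# Agglomerative clustering with complete linkage
hc <- hclust(toy_dist, method = "complete")

# Each row of `merge` is one agglomeration step: negative entries are
# single observations, positive entries refer to clusters formed in earlier rows
print(hc$merge)

# Dissimilarity at which each merge happened (non-decreasing for complete linkage)
print(hc$height)

# Draw the resulting tree as a dendrogram
plot(hc, main = "Toy dendrogram")
```

With n observations there are always n - 1 merge steps, so the five points here produce four rows in `hc$merge`, the last of which corresponds to the root.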
DIANA Hierarchical Clustering:
• It works in a top-down manner.
• The algorithm proceeds in the inverse order of AGNES.
• It begins with the root, in which all objects are included in a single cluster.
• At each step of iteration, the most heterogeneous cluster is divided into two.
• The process is iterated until all objects are in their own cluster.
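The top-down splitting can be illustrated in the same way. This is a minimal sketch, assuming the cluster package is installed; the toy values are hypothetical and `cutree(as.hclust(...), k = 2)` is used here only to show the result of the first split.

```r
library(cluster)  # provides diana()

# Toy data: five points on a line (hypothetical values chosen for illustration)
toy <- matrix(c(1, 2, 6, 7.5, 15), ncol = 1,
              dimnames = list(c("a", "b", "c", "d", "e"), "value"))
toy_dist <- dist(toy, method = "euclidean")

# Divisive clustering on the dissimilarity object
dv <- diana(toy_dist)

# Divisive coefficient: values closer to 1 suggest a stronger structure
print(dv$dc)

# Convert to an hclust tree and keep only the first split (two clusters)
two_groups <- cutree(as.hclust(dv), k = 2)
print(two_groups)
```

In this sketch the point farthest from the rest should be separated first, which is the mirror image of AGNES, where that point would be merged last.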
Package and Functions:
• R Package: cluster -- for clustering algorithms
• R Package: factoextra -- for extracting and visualizing clustering results
• R Package: stats -- for the hclust function
• R Package: NbClust -- for determining the number of clusters
• R Package: purrr -- for applying a function over vectors and lists (part of the tidyverse collection)
• R Function: sum(is.na(data)) -- returns the number of missing values
• R Function: dist(x, method = ) -- computes the dissimilarity matrix
• R Function: hclust(d, method = ) -- from the stats package, for agglomerative HC
• R Function: agnes(x, method = ) -- from the cluster package, for agglomerative HC
• R Function: diana(x) -- from the cluster package, for divisive (DIANA) HC
• R Function: map_dbl() -- from the purrr package; applies a function to each element of its input and returns a numeric vector
• $ac -- agglomerative coefficient returned by agnes (values closer to 1 suggest a strong clustering structure)
• $dc -- divisive coefficient returned by diana (values closer to 1 suggest a strong clustering structure)
• R Function: fviz_nbclust(x, FUNcluster, method = c("silhouette", "wss", "gap_stat")) -- from the factoextra package; estimates the optimal number of clusters with one of three methods (silhouette, elbow, gap statistic) for clustering functions such as k-means, k-medoids and hierarchical clustering (hcut)
• R Function: rect.hclust(tree, k = ) -- from the stats package; draws borders around the k clusters on a plotted dendrogram
• x - data set
• k - number of clusters
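Since every function listed above starts from a dissimilarity matrix, it may help to see what `dist()` returns. A minimal sketch with three hypothetical points, comparing the "euclidean" and "manhattan" methods:

```r
# Three hypothetical 2-D points, chosen so the distances are easy to check by hand
pts <- matrix(c(0, 0,
                3, 4,
                6, 8),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("p1", "p2", "p3"), c("x", "y")))

# Straight-line distance between each pair of rows
d_euclid <- dist(pts, method = "euclidean")

# Sum of absolute coordinate differences between each pair of rows
d_manhat <- dist(pts, method = "manhattan")

print(d_euclid)
print(d_manhat)

# as.matrix() expands the compact "dist" object into a full symmetric matrix
print(as.matrix(d_euclid))
```

The object returned by `dist()` stores only the lower triangle, which is the form `hclust()` expects as its first argument.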
###### Sample Code

#Hierarchical clustering Sample

library("cluster") #Clustering algorithms

library("factoextra") #Clustering visualization

library("stats") #for hclust function

library("NbClust") #Determining the number of clusters

#Input

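input<-iris[,1:4] #keep the four numeric columns; Species is a factor and must not enter the distance computation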

View(input)

#Data Preparation

#Missing values

sum(is.na(input))

#Dissimilarity matrix

hier_dist<-dist(input,method = "euclidean")

#Agglomerative(AGNES) clustering

#hclust function

hier<-hclust(hier_dist,method = "complete")

plot(hier,cex=0.6)

#agnes function using method complete

hier_agnes<-agnes(input,method = "complete")

hier_agnes$ac

#Finding the linkage method that gives the strongest clustering structure

#install.packages("purrr")

library("purrr")

m<-c("complete","single","ward","average")

names(m)<-c("complete","single","ward","average")

#Returns the agglomerative coefficient for linkage method x
agg_coef<-function(x){

agnes(input,method = x)$ac

}

map_dbl(m,agg_coef)

#agnes function using ward method

h_agnes<-agnes(input,method = "ward")

pltree(h_agnes,main="Dendrogram of Agnes")

#diana method

h_diana<-diana(input)

h_diana$dc

pltree(h_diana,main = "Dendrogram of Diana")

#Optimal number of cluster

fviz_nbclust(input,FUNcluster = hcut, method = "silhouette")

#Dendrogram with border around two clusters

hier<-hclust(hier_dist,method = "complete")

plot(hier,cex=0.6)

rect.hclust(hier,k=2,border = 2:6)
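The dendrogram shows the cluster structure visually; to use the clusters in later analysis, the tree can also be cut into a fixed number of groups. The following is a minimal sketch, assuming `cutree()` from the stats package and k = 3 (chosen here only because iris contains three species; the rectangles above use k = 2). The name `num_iris` is introduced for illustration.

```r
# Use only the four numeric measurement columns of iris
num_iris <- iris[, 1:4]

# Same distance and linkage choices as in the sample above
hier <- hclust(dist(num_iris, method = "euclidean"), method = "complete")

# Cut the tree into k = 3 groups; returns one cluster label per observation
clusters <- cutree(hier, k = 3)

# Size of each cluster
print(table(clusters))

# Cross-tabulate cluster labels against the known species
print(table(cluster = clusters, species = iris$Species))
```

The cross-tabulation is only a sanity check: clustering is unsupervised, so the Species column is never used to build the tree.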