How to implement hierarchical clustering in R?


This recipe shows how to implement hierarchical clustering in R.

Hierarchical clustering:
  • It is a type of Unsupervised learning.
  • It is one of the clustering algorithms.
  • It is used to draw inferences from unlabeled data.
Types of hierarchical clustering:
  • Agglomerative Nesting (AGNES) clustering
  • Divisive Analysis (DIANA) clustering
AGNES Hierarchical Clustering:
  • It works in a bottom-up manner.
  • That is, each object is initially considered as a single-element cluster (leaf).
  • At each step of the algorithm, the two clusters that are the most similar are combined into a new, bigger cluster (node).
  • This procedure is iterated until all points are members of one single big cluster (root).
  • The result is a tree which can be plotted as a dendrogram.
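
The bottom-up procedure described above can be traced on a tiny data set. The following is a minimal sketch that uses only hclust and dist from the stats package; the five points and the names pts and toy_hc are made up for illustration and are not part of the recipe's data.

```r
# Toy data: five points on a line (values chosen only for illustration)
pts <- matrix(c(1, 2, 6, 7, 15), ncol = 1,
              dimnames = list(c("a", "b", "c", "d", "e"), "x"))

# Bottom-up clustering with complete linkage on Euclidean distances
toy_hc <- hclust(dist(pts, method = "euclidean"), method = "complete")

# Each row of $merge is one step of the algorithm:
# negative numbers are single objects (leaves),
# positive numbers refer to clusters formed at an earlier step
toy_hc$merge

# Dissimilarity at which each merge happened
toy_hc$height
```

With n objects there are always n - 1 merge steps, and the last row of $merge corresponds to the root of the dendrogram.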
DIANA Hierarchical Clustering:
  • It works in a top-down manner.
  • The algorithm works in the inverse order of AGNES.
  • It begins with the root, in which all objects are included in a single cluster.
  • At each step of iteration, the most heterogeneous cluster is divided into two.
  • The process is iterated until all objects are in their own cluster.
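
The top-down procedure can be sketched in the same way with diana from the cluster package. This is an illustrative example only: the toy points and the names pts, toy_diana and groups are assumptions introduced here, and the tree is converted with as.hclust so that cutree can cut it into two clusters.

```r
library(cluster)  # provides diana()

# Toy data: five points on a line (values chosen only for illustration)
pts <- matrix(c(1, 2, 6, 7, 15), ncol = 1,
              dimnames = list(c("a", "b", "c", "d", "e"), "x"))

# Top-down clustering: start with one cluster and split it repeatedly
toy_diana <- diana(pts)

# Divisive coefficient of the result (a value between 0 and 1)
toy_diana$dc

# Cut the tree after the first split to obtain two clusters
groups <- cutree(as.hclust(toy_diana), k = 2)
groups
```

On this data the first split separates the far-away point 15 from the other four, because it has the largest average dissimilarity to the rest.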
Package and Functions:
  • R Package: cluster--for clustering algorithms
  • R Package: factoextra--for clustering visualization
  • R Package: stats--for the hclust function
  • R Package: NbClust--for determining the number of clusters
  • R Package: purrr--for applying a function over a vector (part of the tidyverse collection)
  • R Function: sum(is.na(x))--returns the number of missing values
  • R Function: dist(x, method = )--computes the dissimilarity matrix
  • R Function: hclust(d, method = )--from the stats package, for agglomerative HC on a dissimilarity matrix d
  • R Function: agnes(x, method = )--from the cluster package, for agglomerative HC
  • R Function: diana(x)--from the cluster package, for divisive (DIANA) HC
  • R Function: map_dbl()--from the purrr package; applies a function to each element of its input and returns a numeric vector
  • $ac - agglomerative coefficient of an agnes result (values closer to 1 suggest a strong clustering structure)
  • $dc - divisive coefficient of a diana result (values closer to 1 suggest a strong clustering structure)
  • R Function: fviz_nbclust(x, FUNcluster, method = c("silhouette", "wss", "gap_stat"))--from the factoextra package; applies one of three methods (silhouette, elbow, gap statistic) to choose the number of clusters for a clustering function (k-means, k-medoids, hcut)
  • R Function: rect.hclust(tree, k = )--draws borders around the k clusters of an already plotted dendrogram
  • x - data set
  • k - number of clusters
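
As a quick illustration of two of the functions listed above, the sketch below computes dissimilarity matrices with dist and reads the agglomerative coefficient from an agnes result. The three points and the names x, d_euc, d_man and toy_agnes are hypothetical and chosen so that the distances are easy to check by hand.

```r
library(cluster)  # provides agnes()

# Three points on a straight line in the plane (illustrative values)
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

# Dissimilarity matrices under two different distance measures
d_euc <- dist(x, method = "euclidean")  # straight-line distance
d_man <- dist(x, method = "manhattan")  # sum of absolute differences
d_euc
d_man

# agnes accepts either the raw data or a dissimilarity matrix
toy_agnes <- agnes(d_euc, method = "average")
toy_agnes$ac  # agglomerative coefficient, between 0 and 1
```

The choice of distance measure changes the dissimilarities and can therefore change which clusters are merged first.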
Sample Code

#Hierarchical clustering Sample


#Loading required packages


library("cluster") #Clustering algorithms

library("factoextra") #Clustering visualization

library("stats") #for hclust function

library("NbClust") #Determining the number of clusters

library("purrr") #for map_dbl







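#Data preparation

#Assumption: the original does not show how input is created.
#Any numeric data set works; the built-in USArrests data is used here as a stand-in.

input<-USArrests


#Missing values

sum(is.na(input)) #number of missing values

input<-na.omit(input) #drop rows with missing values

input<-scale(input) #standardize the variables (assumed preparation step)


#Dissimilarity matrix


hier_dist<-dist(input,method = "euclidean")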


#Agglomerative(AGNES) clustering


#hclust function


hier<-hclust(hier_dist,method = "complete")



#agnes function using method complete


hier_agnes<-agnes(input,method = "complete")



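#Finding the linkage method that gives the strongest clustering structure

#Assumption: the candidate methods below are illustrative; the original list is not shown

methods<-c("average","single","complete","ward")

names(methods)<-methods

#Agglomerative coefficient for each method (closer to 1 is stronger)

purrr::map_dbl(methods,function(x) agnes(input,method = x)$ac)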





#agnes function using ward method


h_agnes<-agnes(input,method = "ward")

pltree(h_agnes,main="Dendrogram of Agnes")


#diana method


h_diana<-diana(input)

h_diana$dc #divisive coefficient

pltree(h_diana,main = "Dendrogram of Diana")


#Optimal number of clusters


fviz_nbclust(input,FUNcluster = hcut, method = "silhouette")


#Dendrogram with borders around two clusters


hier<-hclust(hier_dist,method = "complete")

plot(hier,cex = 0.6) #rect.hclust draws on an existing dendrogram plot

rect.hclust(hier,k=2,border = 2:6)
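
Once the dendrogram has been inspected, the cluster membership of each observation is usually wanted as well. The following is a minimal sketch using cutree from the stats package; it assumes the built-in USArrests data as a stand-in for input and scales it first, neither of which is confirmed by the recipe itself.

```r
# Assumption: USArrests stands in for the recipe's `input` data set
raw <- na.omit(USArrests)
input <- scale(raw)  # standardize so that no variable dominates the distance

# Same steps as in the sample code: dissimilarity matrix, then complete linkage
hier_dist <- dist(input, method = "euclidean")
hier <- hclust(hier_dist, method = "complete")

# Cut the tree into k = 2 clusters; returns one cluster label per observation
groups <- cutree(hier, k = 2)

# Number of observations in each cluster
table(groups)

# Mean of every original variable within each cluster
aggregate(raw, by = list(cluster = groups), FUN = mean)
```

The value of k passed to cutree should match the k used in rect.hclust so that the labels agree with the borders drawn on the dendrogram.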

