Amazing technological breakthrough possible @S-Logix pro@slogix.in

How to implement hierarchical clustering in R?

Description

To implement hierarchical clustering in R.

Process
Hierarchical clustering:
  • It is a type of unsupervised learning.
  • It is one of the clustering algorithms.
  • It is used to draw inferences from unlabeled data.
  Types of hierarchical clustering:
  • Agglomerative Nesting (AGNES) clustering
  • Divisive Analysis (DIANA) clustering
  AGNES Hierarchical Clustering:
  • It works in a bottom-up manner.
  • That is, each object is initially considered as a single-element cluster (leaf).
  • At each step of the algorithm, the two clusters that are the most similar are combined into a new bigger cluster (node).
  • This procedure is iterated until all points are members of one single big cluster (root).
  • The result is a tree which can be plotted as a dendrogram.
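The bottom-up steps above can be sketched directly in base R. This is a naive illustration that assumes Euclidean distance and complete linkage (the settings the sample code passes to hclust()); naive_agnes is a hypothetical helper, not how hclust() or agnes() are implemented internally, and it is far slower than either.

```r
# Naive AGNES-style agglomerative clustering with complete linkage (hypothetical helper).
# Illustration only: roughly O(n^3), so use hclust() or agnes() on real data.
naive_agnes <- function(x) {
  d <- as.matrix(dist(x))                 # pairwise Euclidean distances
  clusters <- as.list(seq_len(nrow(d)))   # every object starts as its own cluster (leaf)
  heights <- numeric(0)
  while (length(clusters) > 1) {
    k <- length(clusters)
    best_h <- Inf
    best <- c(NA_integer_, NA_integer_)
    for (i in seq_len(k - 1)) {
      for (j in (i + 1):k) {
        # complete linkage: largest distance between a member of i and a member of j
        h <- max(d[clusters[[i]], clusters[[j]]])
        if (h < best_h) {
          best_h <- h
          best <- c(i, j)
        }
      }
    }
    # combine the two most similar clusters into a bigger cluster (node)
    merged <- c(clusters[[best[1]]], clusters[[best[2]]])
    clusters <- c(clusters[-best], list(merged))
    heights <- c(heights, best_h)
  }
  heights  # merge heights in order, from the first merge up to the root
}

# Usage: compare with the merge heights reported by hclust()
set.seed(1)
pts <- matrix(rnorm(20), ncol = 2)   # 10 random 2-D points
naive_agnes(pts)
hclust(dist(pts), method = "complete")$height
```

Because complete linkage always merges the globally closest pair, the sequence of heights should agree with hclust() whenever there are no tied distances.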
DIANA Hierarchical Clustering:
  • It works in a top-down manner.
  • The algorithm works in the inverse order of AGNES.
  • It begins with the root, in which all objects are included in a single cluster.
  • At each step of the iteration, the most heterogeneous cluster is divided into two.
  • The process is iterated until each object is in its own cluster.
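A minimal base-R sketch of the top-down procedure above follows. It assumes Euclidean distance, measures heterogeneity by cluster diameter, and divides a cluster with the "splinter group" procedure commonly described for DIANA (Kaufman and Rousseeuw); diana_split and diana_k are hypothetical helpers, not the source of cluster::diana().

```r
# Hypothetical sketch of DIANA's top-down procedure (not cluster::diana()'s source).
# diana_split() performs one division of a cluster using a "splinter group".
diana_split <- function(d, members) {
  # start the splinter group with the object that has the largest average
  # dissimilarity to the other members of the cluster
  avg <- sapply(members, function(i) mean(d[i, setdiff(members, i)]))
  splinter <- members[which.max(avg)]
  rest <- setdiff(members, splinter)
  repeat {
    if (length(rest) <= 1) break
    # how much closer each remaining object is to the splinter group than to the rest
    gain <- sapply(rest, function(i) mean(d[i, setdiff(rest, i)]) - mean(d[i, splinter]))
    if (max(gain) <= 0) break          # no object prefers the splinter group: stop
    mover <- rest[which.max(gain)]
    splinter <- c(splinter, mover)
    rest <- setdiff(rest, mover)
  }
  list(splinter = splinter, rest = rest)
}

# diana_k() starts from the root and keeps splitting until there are k clusters.
diana_k <- function(x, k) {
  d <- as.matrix(dist(x))
  stopifnot(k >= 1, k <= nrow(d))
  clusters <- list(seq_len(nrow(d)))   # root: all objects in a single cluster
  while (length(clusters) < k) {
    # the most heterogeneous cluster is taken to be the one with the largest diameter;
    # singletons get -1 so they are never chosen
    diam <- sapply(clusters, function(m) if (length(m) < 2) -1 else max(d[m, m]))
    target <- which.max(diam)
    parts <- diana_split(d, clusters[[target]])
    clusters <- c(clusters[-target], list(parts$splinter, parts$rest))
  }
  clusters
}

# Usage: sizes of three clusters found on the numeric iris columns
sapply(diana_k(iris[, 1:4], 3), length)
```

Running the loop until k equals the number of objects would reproduce the full top-down hierarchy; stopping at k keeps the sketch short.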
Packages and Functions:
  • R Package : cluster -- for clustering algorithms (agnes, diana)
  • R Package : factoextra -- for clustering visualization
  • R Package : stats -- for the hclust function
  • R Package : NbClust -- for determining the number of clusters
  • R Package : purrr -- for functional programming helpers such as map_dbl (part of the tidyverse collection)
  • R Function : sum(is.na(data)) -- returns the number of missing values
  • R Function : dist(x, method = ) -- computes the dissimilarity (distance) matrix
  • R Function : hclust(dist, method = ) -- from the stats package, for agglomerative HC
  • R Function : agnes(x, method = ) -- from the cluster package, for agglomerative HC
  • R Function : diana(x) -- from the cluster package, for divisive (DIANA) HC
  • R Function : map_dbl() -- from purrr; applies a function to each element of its input and returns a double vector
  • $ac -- agglomerative coefficient (values closer to 1 suggest a strong clustering structure)
  • $dc -- divisive coefficient (values closer to 1 suggest a strong clustering structure)
  • R Function : fviz_nbclust(x, FUNcluster, method = c("silhouette", "wss", "gap_stat")) -- from the factoextra package; computes three methods (average silhouette, elbow, gap statistic) for choosing the number of clusters with partitioning methods (k-means, k-medoids, hcut)
  • R Function : rect.hclust(tree, k = ) -- from the stats package; draws borders around the k clusters on a plotted dendrogram
  • x - data set
  • k - number of clusters
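The $ac value listed above can be reproduced from an hclust tree. The cluster package documents the agglomerative coefficient as the average of 1 - m(i), where m(i) is observation i's dissimilarity to the first cluster it is merged with, divided by the dissimilarity of the merger in the final step. agg_coefficient is a hypothetical helper written under that definition; it is intended to mirror agnes()$ac for the same linkage but is not a function from either package.

```r
# Agglomerative coefficient computed from an hclust tree (hypothetical helper).
# For each observation i, m(i) = height at which i is first merged into a cluster,
# divided by the height of the final merge; the coefficient is mean(1 - m(i)).
agg_coefficient <- function(hc) {
  n <- length(hc$order)
  first_height <- numeric(n)
  for (step in seq_len(nrow(hc$merge))) {
    for (entry in hc$merge[step, ]) {
      # negative entries in hc$merge are single observations joining at this step
      if (entry < 0) first_height[-entry] <- hc$height[step]
    }
  }
  final_height <- hc$height[length(hc$height)]
  mean(1 - first_height / final_height)
}

# Usage: compare linkage methods on the numeric iris columns
hier_dist <- dist(iris[, 1:4], method = "euclidean")
sapply(c(complete = "complete", single = "single", average = "average"),
       function(m) agg_coefficient(hclust(hier_dist, method = m)))
```

For the points 0, 1 and 10 with complete linkage, the first two join at height 1 and the last at height 10, so the coefficient is mean(0.9, 0.9, 0) = 0.6.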
Sample Code

#Hierarchical clustering Sample

#Loading required packages
library("cluster")     #Clustering algorithms (agnes, diana)
library("factoextra")  #Clustering visualization (fviz_nbclust, hcut)
library("stats")       #for hclust function (attached by default)
library("NbClust")     #Determining the number of clusters

#Input
#Keep only the four numeric measurements; the Species column is a factor label,
#not a measurement to cluster on
input<-iris[,1:4]
View(input)  #opens the data viewer in interactive sessions

#Data Preparation

#Missing values
sum(is.na(input))

#Dissimilarity matrix
hier_dist<-dist(input,method = "euclidean")

#Agglomerative(AGNES) clustering

#hclust function
hier<-hclust(hier_dist,method = "complete")
plot(hier,cex=0.6)

#agnes function using method complete
hier_agnes<-agnes(input,method = "complete")
hier_agnes$ac

#Finding the linkage method that gives the strongest clustering structure
#install.packages("purrr")
library("purrr")
m<-c("complete","single","ward","average")
names(m)<-c("complete","single","ward","average")
agg_coef<-function(x){
  agnes(input,method = x)$ac
}
map_dbl(m,agg_coef)

#agnes function using ward method
h_agnes<-agnes(input,method = "ward")
pltree(h_agnes,main="Dendrogram of Agnes")

#diana method
h_diana<-diana(input)
h_diana$dc
pltree(h_diana,main = "Dendrogram of Diana")

#Optimal number of clusters
fviz_nbclust(input,FUNcluster = hcut, method = "silhouette")

#Dendrogram with border around two clusters
hier<-hclust(hier_dist,method = "complete")
plot(hier,cex=0.6)
rect.hclust(hier,k=2,border = 2:6)
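The silhouette criterion requested from fviz_nbclust() in the optimal-number-of-clusters step can also be computed by hand for a tree cut with cutree(). This is a minimal base-R sketch assuming Euclidean distance and complete linkage; avg_silhouette is a hypothetical helper, and its numbers need not match fviz_nbclust(), since hcut() applies its own defaults.

```r
# Average silhouette width for a hierarchical tree cut into k clusters (hypothetical helper).
# s(i) = (b - a) / max(a, b), where a is i's mean distance to its own cluster and
# b is its smallest mean distance to any other cluster; singletons get s(i) = 0.
avg_silhouette <- function(x, k, method = "complete") {
  stopifnot(k >= 2)
  d <- as.matrix(dist(x))
  cl <- cutree(hclust(dist(x), method = method), k = k)   # cluster label per object
  s <- sapply(seq_len(nrow(d)), function(i) {
    same <- setdiff(which(cl == cl[i]), i)
    if (length(same) == 0) return(0)
    a <- mean(d[i, same])
    others <- setdiff(unique(cl), cl[i])
    b <- min(sapply(others, function(g) mean(d[i, cl == g])))
    (b - a) / max(a, b)
  })
  mean(s)
}

# Usage: the k with the largest average silhouette width is the preferred number of clusters
input <- iris[, 1:4]
widths <- sapply(2:6, function(k) avg_silhouette(input, k))
names(widths) <- 2:6
widths
```

Values near 1 mean objects sit well inside their own cluster; values near 0 mean they lie between clusters.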
