How to find the optimal number of clusters using the NbClust package in R?

Description

This example shows how to find the optimal number of clusters for a dataset using the NbClust and factoextra packages in R.

    • R Package : factoextra – For extracting and
      visualizing clustering results.
    • R Package : NbClust – For determining
      the optimal number of clusters.
    • R Function : fviz_nbclust(x, FUNcluster,
      method = c("silhouette", "wss", "gap_stat")) –
      from the factoextra package; computes and plots
      one of three methods (silhouette, elbow, gap
      statistic) for any partitioning clustering
      method (k-means, k-medoids, hcut).

    • R Function : NbClust(data, min.nc = ,
      max.nc = , method = , distance = ) –
      from the NbClust package; computes all the
      indices and determines the number of clusters
      in a single function call.

  • min.nc – minimum number of clusters
  • max.nc – maximum number of clusters
  • method – "kmeans" for k-means clustering;
    "ward.D", "ward.D2", "single", "complete" or
    "average" for hierarchical clustering
  • distance – the distance measure, for example
    "euclidean" or "manhattan", or "NULL" when a
    dissimilarity matrix is supplied instead.
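
To make the elbow ("wss") and silhouette criteria more concrete, here is a minimal sketch of the quantities those two methods are based on, computed directly with kmeans() and the cluster package. The built-in iris data, the seed, and the variable names (x, wss, avg_sil, best_k_sil) are assumptions for illustration only; this is not how fviz_nbclust() is implemented internally.

```r
# Stand-in data (assumption): the four numeric columns of the built-in iris set
x <- scale(iris[, 1:4])

set.seed(123)  # k-means starts from random centers, so fix the seed
ks <- 1:10

# Elbow method: total within-cluster sum of squares for each k
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# Silhouette method: average silhouette width for each k (not defined for k = 1)
library(cluster)  # recommended package shipped with standard R installations
d <- dist(x)
avg_sil <- sapply(ks[-1], function(k) {
  cl <- kmeans(x, centers = k, nstart = 25)$cluster
  mean(silhouette(cl, d)[, "sil_width"])
})
names(avg_sil) <- ks[-1]

# The elbow is read off the plot by eye
plot(ks, wss, type = "b",
     xlab = "Number of clusters k", ylab = "Total within-cluster SS")

# The silhouette choice is the k with the largest average width
best_k_sil <- as.integer(names(which.max(avg_sil)))
print(best_k_sil)
```

The elbow curve always decreases as k grows, so it only suggests a k where the improvement levels off, whereas the silhouette curve has a maximum that can be picked automatically.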

#Optimal Number of Clusters

#Loading required packages
#install.packages("factoextra")
library("factoextra")
#install.packages("NbClust")
library("NbClust")

#Input dataset
#input must be a data frame whose columns 3 and 4 are numeric
View(input)

#Elbow method
fviz_nbclust(input[,3:4], kmeans, method = "wss") +
  geom_vline(xintercept = 3, linetype = 2) +
  labs(subtitle = "Elbow Method")

#Silhouette Method
fviz_nbclust(input[,3:4], kmeans, method = "silhouette") +
  labs(subtitle = "Silhouette Method")

#Gap Statistic method
fviz_nbclust(input[,3:4], kmeans, method = "gap_stat") +
  labs(subtitle = "Gap Statistic Method")

#NbClust() function to find the optimal number of clusters using 30 indices at a time
NbClust(input[,3:4], min.nc = 2, max.nc = 10, method = "kmeans", distance = "euclidean")
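
The object returned by NbClust() can also be inspected programmatically. The following is a minimal sketch, assuming the built-in iris data as a stand-in for the input dataset; the variable names (res, nc, votes, best_k) are illustrative. It tallies how many indices propose each number of clusters and picks the most frequent one.

```r
library(NbClust)

# Stand-in data (assumption): scaled numeric columns of the built-in iris set
x <- scale(iris[, 1:4])

set.seed(123)
res <- NbClust(x, distance = "euclidean", min.nc = 2, max.nc = 10,
               method = "kmeans", index = "all")

# Best.nc has one column per index; the "Number_clusters" row holds each index's choice
nc <- res$Best.nc["Number_clusters", ]

# Keep only proposals inside the searched range (graphical indices report 0)
nc <- nc[!is.na(nc) & nc >= 2]

# Majority rule: the k proposed by the largest number of indices
votes <- table(nc)
print(votes)
best_k <- as.integer(names(votes)[which.max(votes)])
print(best_k)

# Best.partition holds the cluster label of every row for the selected k
print(table(res$Best.partition))
```

Because the indices often disagree, it is worth looking at the full vote table rather than only the winning value.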
