R sample code for kmeans clustering implementation in the given data

How to implement kmeans clustering for the given data in R?

Description

To implement k-means clustering on a given data set using R.

Process

What is Unsupervised ML?

In unsupervised learning, all the data are unlabelled.

There is input data (x) but no corresponding output variable (y).

Its goal is to model the underlying structure or distribution of the data in order to learn more about the data.

Unsupervised learning problems are grouped into clustering and association.

Clustering:

Clustering discovers inherent groupings in the data set.

Example: grouping customers by purchasing behavior.

Common algorithms: K-means (centroid based) and hierarchical clustering (connectivity based).

K-means Clustering algorithm:

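It is a type of unsupervised learning.

It is one of the most widely used clustering algorithms.

It is a centroid based algorithm: each cluster is represented by the mean of its points.

It is used to find patterns (groups of similar observations) in the data.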
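To make the centroid idea concrete, the iteration behind k-means can be sketched in base R as follows. This is only an illustrative sketch: simple_kmeans is a hypothetical helper written for this example, the built-in iris data set is assumed as input, and the built-in kmeans() function should be preferred in practice.

```r
# Illustrative sketch of the basic k-means iteration (Lloyd's algorithm).
# simple_kmeans is a hypothetical helper, not part of any package.
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Use k randomly chosen rows as the initial centroids
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- rep(0L, nrow(x))
  for (i in seq_len(max_iter)) {
    # Squared Euclidean distance from every point to every centroid
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    # Assign each point to its nearest centroid
    new_cluster <- max.col(-d, ties.method = "first")
    if (all(new_cluster == cluster)) break  # assignments unchanged: converged
    cluster <- new_cluster
    # Move each centroid to the mean of the points assigned to it
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers)
}

set.seed(42)  # the result depends on the random initial centroids
res <- simple_kmeans(iris[, 3:4], k = 3)
table(res$cluster)  # cluster sizes
res$centers         # final centroids
```

Because the starting centroids are random, different seeds can give different groupings, which is why kmeans() offers the nstart argument.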

Steps in K-means Clustering:

Step 1: Import data.

Step 2: Data Preparation

Step 3: Compute kmeans

Step 4: Plotting the result.

Step 5: Finding the optimal number of clusters.
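The data preparation step listed above might look like the following in base R. This is a sketch under assumptions: the built-in iris data set stands in for the input data, and standardizing the columns with scale() is an optional extra step that the original sample code does not perform.

```r
# Assumption: the built-in iris data set stands in for the input data
input <- iris

# Keep only the numeric columns used for clustering
features <- input[, 3:4]

# Count the missing values, then drop any incomplete rows
n_missing <- sum(is.na(features))
features <- na.omit(features)

# Optional: standardize each column to mean 0 and standard deviation 1,
# so that no variable dominates the distance calculation
scaled <- scale(features)

n_missing
colMeans(scaled)
apply(scaled, 2, sd)
```

Scaling matters most when the variables are measured in different units; for the two petal measurements, both in centimetres, it is a judgment call.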

Package and Functions:

R Package : ggplot2 -- for visualization.

R Package : animation -- for animated visualization.

R Package : factoextra -- for extracting and visualizing the results of multivariate analyses, including clustering.

R Package : NbClust -- for determining the best number of clusters.

R Function : sum(is.na(data)) -- returns the number of missing values.

R Function : kmeans(data, centers = , nstart = ) -- computes k-means.

data -- data set to be clustered (numeric columns)

centers -- number of clusters to be formed

nstart -- number of random initial sets of centers to try; the best result is kept.

R Function : fviz_nbclust(x, FUNcluster, method = c("silhouette", "wss", "gap_stat")) -- from the factoextra package; helps choose the number of clusters using one of three methods (silhouette, elbow, gap statistic) for partitioning clustering methods (k-means, k-medoids, hcut).

R Function : par(mfrow = c(nrows, ncols)) -- splits the graph screen into a grid of nrows by ncols panels.
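As a small illustration of the kmeans() arguments described above, the sketch below compares a single random start with 25 random starts and inspects the returned object. It assumes the built-in iris data set; the variable names km_single and km_multi are chosen only for this example.

```r
# Assumption: petal measurements from the built-in iris data set
x <- iris[, 3:4]

set.seed(123)  # kmeans starts from random centers, so fix the seed
km_single <- kmeans(x, centers = 3, nstart = 1)
km_multi  <- kmeans(x, centers = 3, nstart = 25)

# With several starts, kmeans keeps the run with the lowest
# total within-cluster sum of squares
c(single = km_single$tot.withinss, multi = km_multi$tot.withinss)

# Useful components of the result
km_multi$centers  # one row of coordinates per cluster
km_multi$size     # number of points in each cluster

# Cross-tabulate the clusters against the known species
table(iris$Species, km_multi$cluster)
```

The cluster numbers themselves are arbitrary labels, so only the grouping matters when comparing them with the species.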

Sample Code

#Clustering

#Input Data set

#Assumed: the built-in iris data set, whose columns match those used below

input <- iris

View(input)

#Data Preparation

#Missing Values

sum(is.na(input))

#ggplot

#install.packages("ggplot2")

library("ggplot2")

ggplot(input, aes(x = Petal.Length, y = Petal.Width, color = Species)) + geom_point()

#kmeans

kmeans(input[,3:4],centers = 3,nstart = 25)

#install.packages("animation")

library("animation")

par(mfrow=c(3,3))

kmeans.ani(input[,3:4],3)

#Finding optimal k

km <- kmeans(input[,3:4],centers = 3,nstart = 25)

print(km)

km$tot.withinss

#Elbow method

#install.packages("factoextra")

library("factoextra")

#install.packages("NbClust")

library("NbClust")

fviz_nbclust(input[,3:4], kmeans, method = "wss") + geom_vline(xintercept = 3, linetype = 5) + labs(subtitle = "Elbow method")
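The elbow plot above can also be reproduced by hand with base R, which shows what the "wss" method computes. This is a sketch that assumes the built-in iris petal measurements as input; the range of 1 to 10 clusters and the names k_values and wss are choices made for this example.

```r
# Assumption: petal measurements from the built-in iris data set
x <- iris[, 3:4]

set.seed(123)
k_values <- 1:10

# Total within-cluster sum of squares for each candidate number of clusters
wss <- sapply(k_values, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is the point after which adding clusters
# reduces the within-cluster sum of squares only slightly
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
abline(v = 3, lty = 5)  # reference line at k = 3, as in the plot above
```

Reading the elbow is a visual judgment, so it is often worth checking the result against the silhouette or gap statistic methods as well.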

Screenshots


S-Logix (OPC) Private Limited