How to implement the kNN algorithm in R?

Description

To implement the k-nearest neighbors (kNN) algorithm on the wine data set using R.

Process

What is knn algorithm?

    • It is a simple algorithm that stores all available cases and classifies each new case by a majority vote of its k nearest neighbors, where "nearest" is measured by a distance such as the Euclidean distance
    • It is a supervised learning algorithm used for both classification and regression
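
The majority-vote rule can be sketched in a few lines of base R. This is only an illustration of the idea, not how caret implements kNN. The helper name knn_predict_one is hypothetical, and the built-in iris data is assumed as a stand-in for the wine data.

```r
# Minimal kNN classifier sketch in base R (illustrative helper, not part of caret).
# train_x: numeric matrix of training predictors
# train_y: factor of training labels
# new_x:   numeric vector holding one new case
knn_predict_one <- function(train_x, train_y, new_x, k = 5) {
  # Euclidean distance from the new case to every stored training case
  diffs <- sweep(train_x, 2, new_x)   # subtract new_x from each row
  dists <- sqrt(rowSums(diffs^2))
  # Labels of the k closest training cases
  nearest <- order(dists)[seq_len(k)]
  votes <- table(train_y[nearest])
  # Majority vote; ties go to the first level with the highest count
  names(votes)[which.max(votes)]
}

# Assumption: the built-in iris data stands in for the wine data
x <- scale(as.matrix(iris[, 1:4]))    # scale so no single predictor dominates
y <- iris$Species

# Classify the first row using all other rows as the stored cases
knn_pred <- knn_predict_one(x[-1, ], y[-1], x[1, ], k = 5)
print(knn_pred)
```

The first row is held out so that it cannot vote for itself.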

Data set :

    • wine – downloaded data set

Steps:

    • Import data
    • Data preparation – Missing values, scaling
    • Splitting data set into train and test data set
    • Training the model on the train data set
    • Evaluate model performance
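
The data preparation step can be sketched in base R as follows. kNN relies on distances, so predictors with large ranges would dominate unless the data is scaled. The sketch assumes the built-in iris data in place of the wine data, and idx is a hypothetical training index. The scaling values are learned from the training rows only and then reused for the test rows.

```r
# Sketch of the data preparation step in base R.
# Assumption: iris stands in for the wine data; idx is a hypothetical training index.
dat <- iris
missing_per_column <- colSums(is.na(dat))   # number of missing values in each column
print(missing_per_column)

set.seed(1)
idx <- sample(nrow(dat), size = round(0.7 * nrow(dat)))
train_x <- as.matrix(dat[idx, 1:4])
test_x  <- as.matrix(dat[-idx, 1:4])

# Learn the centering and scaling values from the training rows only
train_scaled <- scale(train_x)
ctr <- attr(train_scaled, "scaled:center")
scl <- attr(train_scaled, "scaled:scale")

# Apply the same training means and standard deviations to the test rows
test_scaled <- scale(test_x, center = ctr, scale = scl)
```

In the sample code this job is delegated to caret through preProcess=c("center","scale").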

R Package:

    • caret – Short for Classification And REgression Training; provides functions for data splitting, pre-processing, model tuning and model evaluation

R Function :

    • str(data frame) – To display the internal structure of a data frame
    • summary(data frame) – Displays the minimum, 1st quartile, median, mean, 3rd quartile and maximum of each numeric or integer column in the data frame, and the count of each level for factor columns.
    • createDataPartition(y, p= ) – To partition the data set into train and test sets; returns the row indices of the training set

y – dependent (outcome) variable

p – Proportion of the data to use for training (for example, 0.7)
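
A stratified split of this kind can be sketched in base R. This is only roughly what createDataPartition does for a factor outcome, not caret's actual code. The helper stratified_index is hypothetical, and iris is assumed in place of the wine data.

```r
# Base-R sketch of a stratified 70/30 split.
# stratified_index is a hypothetical helper, not a caret function.
stratified_index <- function(y, p = 0.7) {
  # Sample a proportion p of the row numbers within each class
  idx_by_class <- lapply(split(seq_along(y), y), function(rows) {
    rows[sample.int(length(rows), size = ceiling(p * length(rows)))]
  })
  sort(unlist(idx_by_class, use.names = FALSE))
}

set.seed(123)
y <- iris$Species                     # assumption: iris stands in for the wine data
train_idx <- stratified_index(y, p = 0.7)

table(y[train_idx])                   # class counts in the training set
table(y[-train_idx])                  # class counts in the test set
```

Sampling within each class keeps the class proportions of the training and test sets close to those of the full data set.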

    • trainControl(method=, number=, repeats= ) – To define the resampling scheme that train() uses to select the optimal tuning parameter

method – resampling method (for example, "cv" or "repeatedcv")

number – Either the number of folds or the number of resampling iterations

repeats – For repeated k-fold cross-validation only: the number of complete sets of folds to compute.
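
What 10 folds repeated 3 times means can be sketched in base R. The helper make_folds and the row count n are assumptions for illustration. caret also balances the folds on the outcome classes, which this sketch ignores.

```r
# Sketch of the resampling scheme behind repeated k-fold cross-validation
# with number = 10 and repeats = 3. make_folds is a hypothetical helper.
make_folds <- function(n, number = 10) {
  # Shuffle the fold labels 1..number so every row gets exactly one fold
  sample(rep(seq_len(number), length.out = n))
}

set.seed(42)
n <- 120                                              # assumed number of training rows
fold_ids <- replicate(3, make_folds(n, number = 10))  # one column per repeat

dim(fold_ids)          # n rows, 3 repeats
table(fold_ids[, 1])   # fold sizes in the first repeat

# Each resample holds out one fold and fits the model on the remaining folds
n_resamples <- 10 * ncol(fold_ids)
```

Every candidate value of k is therefore evaluated on 30 held-out folds, and the results are averaged.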

    • train(formula, data=, method=, trControl= ) – To fit predictive models over different values of the tuning parameter

data – data frame

method – model to fit ("knn" here)

    • predict(object, newdata=) – To predict an outcome value on the basis of one or more predictor variables

object – fitted model

newdata – New data

    • confusionMatrix(data=, reference=) – To compute the confusion matrix and related accuracy statistics

data – predicted classes

reference – true classes of the test data

Sample Code

#Loading Required packages

library("caret")

#Import Data

my_input<-read.csv("wine.csv")
View(my_input)

str(my_input)
summary(my_input)

#Data preparation

sum(is.na(my_input))

#Converting the class variable to a factor before splitting,
#so that the split is stratified by class

my_input$class<-as.factor(my_input$class)
str(my_input)

#Splitting into train and test data set

set.seed(123) #for a reproducible split
training<-createDataPartition(my_input$class,p=0.7,list=FALSE)

train<-my_input[training,]
test<-my_input[-training,]

dim(train)
dim(test)

#Training the knn model

tr<-trainControl(method = "repeatedcv",number = 10,repeats = 3)
knn_model<-train(class~., data=train,method="knn", trControl=tr,
                 preProcess=c("center","scale"),
                 tuneLength=10)
knn_model

#Plotting knn Model

plot(knn_model)

#Prediction

predict_knn<-predict(knn_model, test)
predict_knn

#Confusion Matrix

levels(train$class)

levels(test$class)

confusion_knn<-confusionMatrix(predict_knn,test$class)
confusion_knn
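
A confusion matrix and the overall accuracy can also be computed by hand in base R. The two label vectors below are made-up toy values that stand in for predict_knn and test$class. They are not results from the wine data.

```r
# Base-R sketch of a confusion matrix and overall accuracy.
# Assumption: toy label vectors stand in for predict_knn and test$class.
class_levels <- c("1", "2", "3")
predicted <- factor(c("1", "1", "2", "2", "3", "3", "2", "1"), levels = class_levels)
actual    <- factor(c("1", "1", "2", "3", "3", "3", "2", "2"), levels = class_levels)

# Rows are the predictions, columns are the reference (true) classes
cm <- table(Prediction = predicted, Reference = actual)
print(cm)

# Accuracy is the share of cases on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

Off-diagonal cells show which classes are confused with each other. confusionMatrix() reports the same table together with further statistics.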
