
How to reduce the dimension of a given data set using Principal Component Analysis and build a machine learning model
Description

To reduce the dimension of a given data set using principal component analysis (PCA), build a machine learning model on the reduced data, and compare it with the same model built on the original data.

Libraries required :

require(MASS)

library(caret)

library(naivebayes)
library(AUC)

Functions used :

qda(formula, data) – To fit a quadratic discriminant analysis (QDA) model (from the MASS package)
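As a quick illustration of the qda(formula, data) call listed above, here is a minimal sketch on R's built-in iris data. The variable names (fit, pred, conf_mat) are only illustrative, and the confusion matrix is built with base R's table() rather than caret, to keep the example small.

```r
library(MASS)  # provides qda()

# Fit QDA on the built-in iris data: Species is the class label,
# the four measurements are the predictors.
fit <- qda(Species ~ ., data = iris)

# predict() on a qda fit returns a list; $class holds the predicted labels
# and $posterior the class probabilities.
pred <- predict(fit, iris)

# Cross-tabulate predicted against actual labels (evaluated on the training
# data here, so this is an optimistic estimate).
conf_mat <- table(Predicted = pred$class, Actual = iris$Species)
print(conf_mat)
```

Because the model is evaluated on the same rows it was fitted on, the counts only show how the call works; a held-out test set is needed for an honest comparison.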

Data set :

Iris data set

Process

Compute the Principal Component Analysis

Create a data frame with the desired number of principal components (choose the number by analyzing the summary of the PCA: Proportion of Variance or Cumulative Proportion)

Split this data frame for train and test

Build the model using train and predict using the test data

Compute the confusion matrix

Create another data frame for the original data

Split this data frame for train and test

Build the model using train and predict using the test data

Compute the confusion matrix

Compare the confusion matrices obtained from the two models and interpret the result
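The steps above can be sketched end to end on the iris data. This is a minimal sketch under stated assumptions, not the page's own code: it uses qda() from MASS as the classifier and base R's table() for the confusion matrix, keeps two principal components, and uses an arbitrary seed. The helper names fit_and_score and acc are hypothetical.

```r
library(MASS)  # provides qda()

set.seed(42)  # arbitrary seed, only to make the split reproducible

# 80/20 train/test split, shared by both models so the comparison is fair
n <- nrow(iris)
train_ind <- sample(seq_len(n), size = floor(0.8 * n))

# PCA on the four numeric columns (centred and scaled)
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Data frame with the first two principal components plus the label
# (two components is an assumption made for this sketch)
pca_data <- data.frame(pca$x[, 1:2], Species = iris$Species)

# Hypothetical helper: fit QDA on the training rows, predict the test rows,
# and return the confusion matrix as a base R table
fit_and_score <- function(df) {
  fit <- qda(Species ~ ., data = df[train_ind, ])
  pred <- predict(fit, df[-train_ind, ])$class
  table(Predicted = pred, Actual = df$Species[-train_ind])
}

cm_pca <- fit_and_score(pca_data)  # model on principal components
cm_orig <- fit_and_score(iris)     # model on the original features

# Overall accuracy = correctly classified / total
acc <- function(cm) sum(diag(cm)) / sum(cm)
print(cm_pca)
print(cm_orig)
cat("Accuracy with 2 principal components:", acc(cm_pca), "\n")
cat("Accuracy with original features:", acc(cm_orig), "\n")
```

Using the same train_ind for both data frames means any difference between the two confusion matrices comes from the features, not from a different split.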

Sample Code

require(MASS)
library(caret)
library(naivebayes)
library(AUC)
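#To split 80% of the data as training data
#(the right-hand sides of several assignments were lost in the source; the ones below are reconstructed from the comments and are assumptions)
smp_size <- floor(0.80 * nrow(data))
train_ind <- sample(seq_len(nrow(data)), size = smp_size)
#Compute the Principal Component Analysis
#(prcomp() is assumed because the code reads pca_HAR$x; its original arguments are not known)
pca_HAR <- prcomp(data)
summary(pca_HAR)
##Create the data frame with the PCA components and y
final_data=data.frame(x=pca_HAR$x[,1:70],y=as.factor(y$V1))
##Split the PCA data frame for train and test
train <- final_data[train_ind, ]
test <- final_data[-train_ind, ]
#Build the naive Bayes model using the PCA components
#(naive_bayes() from the naivebayes package is assumed)
nb <- naive_bayes(y ~ ., data = train)
#Predict using the test data of the PCA components
pred=predict(nb,test)
#Compute the confusion matrix
cat("The confusion matrix for data with PCA components is \n")
print(confusionMatrix(pred,test$y))
#Create a data frame with the original x and y data
##Here X and y are in different files, so we create a data frame to combine them
##If x and y are in a single file, this step can be skipped
final_data1=data.frame(x=data,y=as.factor(y$V1))
#Split the data for train and test (same rows as for the PCA data)
train1 <- final_data1[train_ind, ]
test1 <- final_data1[-train_ind, ]
#Build the naive Bayes model using the original train data
nb1 <- naive_bayes(y ~ ., data = train1)
#Predict using the original test data
pred1=predict(nb1,test1)
#Compute the confusion matrix
cat("The confusion matrix for the original data is \n")
print(confusionMatrix(pred1,test1$y))
#To interpret the result
#(accuracy() and roc() from the AUC package expect numeric scores and a 0/1 factor of labels, so these calls apply to a two-class problem)
plot(accuracy(pred,test$y), type = "l",main="Accuracy when PCA is used")
plot(accuracy(pred1,test1$y), type = "l",main="Accuracy when original data is used")
plot(roc(pred,test$y), type = "l",main="ROC when PCA is used")
plot(roc(pred1,test1$y), type = "l",main="ROC when original data is used")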
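The sample code keeps the first 70 components (pca_HAR$x[,1:70]). One way to arrive at such a number, instead of reading it off the printed summary, is to compute the cumulative proportion of variance directly. The following is a minimal sketch on the iris data; the 0.95 threshold and the variable names are assumptions chosen for illustration.

```r
# PCA on the four numeric iris columns (centred and scaled)
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component:
# the squared standard deviations are the component variances
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(prop_var)

# Smallest number of components whose cumulative proportion
# reaches the threshold (0.95 is an assumed cut-off)
threshold <- 0.95
n_comp <- which(cum_var >= threshold)[1]

print(round(cum_var, 4))
cat("Components needed:", n_comp, "\n")

# Keep only those components for modelling
reduced <- pca$x[, seq_len(n_comp), drop = FALSE]
```

The values in cum_var match the "Cumulative Proportion" row printed by summary(pca), so the same rule can be applied to any prcomp result.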