Implement Naive Bayes for classification using Spark with R

How to implement Naive Bayes for classification using Spark with R?

Description

This example shows how to train and evaluate a Naive Bayes classifier on a car data set (car.data) using SparkR, the R interface to Apache Spark.

Process

Set up the Spark context and Spark session
Load the data set
Split the data into training and test sets
Fit the Naive Bayes model for classification
Summarize the model
Predict on the test set
Evaluate the metrics
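To make the fit and predict steps concrete, here is a minimal base-R sketch of a categorical Naive Bayes classifier with Laplace smoothing. It only illustrates the technique and is not how Spark implements it; SparkR's spark.naiveBayes fits its own model on the feature vectors built from the R formula, so its probabilities need not match this sketch. The function names nb_fit and nb_predict, the toy columns, and the smoothing value alpha = 1 are hypothetical choices for illustration.

```r
# Illustrative categorical Naive Bayes in base R (NOT Spark's implementation).
# nb_fit, nb_predict, the toy columns and alpha = 1 are hypothetical choices.

nb_fit <- function(x, y, alpha = 1) {
  y <- as.factor(y)
  classes <- levels(y)
  # Prior P(class): relative class frequencies in the training labels
  prior <- as.numeric(table(y)) / length(y)
  names(prior) <- classes
  # Per feature: Laplace-smoothed P(value | class); rows = classes, columns = values
  cond <- lapply(x, function(col) {
    counts <- table(y, as.factor(col))
    (counts + alpha) / (rowSums(counts) + alpha * ncol(counts))
  })
  list(classes = classes, prior = prior, cond = cond)
}

nb_predict <- function(model, newx) {
  # Log-score per class: log P(class) + sum of log P(value | class)
  # (assumes every feature value in newx was seen during training)
  scores <- sapply(model$classes, function(cl) {
    s <- rep(log(model$prior[[cl]]), nrow(newx))
    for (f in names(model$cond)) {
      tab <- model$cond[[f]]
      s <- s + log(as.numeric(tab[cl, as.character(newx[[f]])]))
    }
    s
  })
  scores <- matrix(scores, nrow = nrow(newx))
  # Pick the class with the highest log-score for each row
  model$classes[max.col(scores, ties.method = "first")]
}

# Hypothetical toy training data with categorical columns
train <- data.frame(
  safety  = c("low", "low", "high", "high", "high", "low"),
  persons = c("2", "4", "4", "more", "4", "2"),
  stringsAsFactors = FALSE
)
label <- c("unacc", "unacc", "acc", "acc", "acc", "unacc")

model <- nb_fit(train, label)
pred <- nb_predict(model, data.frame(safety = c("high", "low"),
                                     persons = c("4", "2"),
                                     stringsAsFactors = FALSE))
print(pred)  # predicted classes: acc, unacc
```

Working in log space avoids numerical underflow when many per-feature probabilities are multiplied, and the smoothing term keeps unseen value/class combinations from zeroing out a whole class.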

Sample Code

#Set up the Spark home directory
Sys.setenv(SPARK_HOME="/…./spark-2.4.0-bin-hadoop2.7")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
#Load the libraries (caret provides confusionMatrix)
library(SparkR)
library(caret)
#Initialize the Spark session, which also starts the Spark context
#To run Spark in local mode, set master = "local"
sparkR.session(master = "local")
#Load the data set
data = read.df("file:///…../car.data", "csv", header = "true", inferSchema = "true", na.strings = "NA")
#Split the data into train and test set
splt_data=randomSplit(data,c(7,3),42)
trainingData=splt_data[[1]]
testData=splt_data[[2]]
coln=columns(data)
xtest=select(testData,coln[1:6])
ytest=select(testData,"Class")
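#Build the Naive Bayes model with Class as the label and all other columns as features
nb <- spark.naiveBayes(trainingData, Class ~ .)
#Take the summary of the model
summary(nb)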
#Predict using the test data
pred=predict(nb,xtest)
showDF(pred,10)
#Convert the spark data frame to R data frame
y_pred=collect(select(pred,"prediction"),stringsAsFactors=FALSE)
y_true=collect(select(ytest,"Class"),stringsAsFactors=FALSE)
#Calculate the confusion matrix
conf_mat=confusionMatrix(as.factor(y_pred$prediction),as.factor(y_true$Class))
print(conf_mat)
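The confusionMatrix function used above comes from the caret package. If caret is not available, the headline metrics can also be computed with base R alone. The sketch below is one way to do it; the helper name evaluate_predictions is hypothetical, and the inputs are assumed to be plain character vectors of predicted and true labels.

```r
# Hypothetical helper: confusion matrix and basic metrics with base R only.
evaluate_predictions <- function(y_pred, y_true) {
  # Shared levels keep the table square even if a class is never predicted
  lv <- sort(union(unique(y_true), unique(y_pred)))
  cm <- table(Predicted = factor(y_pred, levels = lv),
              Actual    = factor(y_true, levels = lv))
  list(
    confusion = cm,
    # Fraction of all rows predicted correctly
    accuracy  = sum(diag(cm)) / sum(cm),
    # Per class: correct predictions / rows that truly belong to the class
    # (NaN if the class never occurs in y_true)
    recall    = diag(cm) / colSums(cm),
    # Per class: correct predictions / rows predicted as the class
    # (NaN if the class is never predicted)
    precision = diag(cm) / rowSums(cm)
  )
}

res <- evaluate_predictions(c("acc", "unacc", "acc", "unacc"),
                            c("acc", "unacc", "unacc", "unacc"))
print(res$confusion)
print(res$accuracy)  # 0.75
```

With the objects from the sample code, the equivalent call would be evaluate_predictions(y_pred$prediction, y_true$Class).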

S-Logix (OPC) Private Limited