How to implement logistic regression using Spark with R

Description

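This example shows how to implement logistic regression in R using Spark through the SparkR package. A binary classifier is trained on a height and weight data set to predict gender, and its predictions on a held-out test set are evaluated with a confusion matrix.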

Process

Set up the Spark context and Spark session
Load the data set
Split the data into training and test sets
Fit the logistic regression model
Take the summary of the model
Predict using the test set
Evaluate the metrics
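
For intuition about the fit and predict steps above, here is a minimal base R sketch (no Spark required) of how a binary logistic regression turns feature values into a class label. The coefficient values, the helper names predict_prob and predict_label, and the choice of "Male" as the positive class are all hypothetical and chosen only for illustration; they do not come from a fitted model.

```r
# Logistic (sigmoid) function: maps any real number into the interval (0, 1)
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical coefficients (intercept, Height, Weight); illustrative only
coefs <- c(intercept = 0.5, Height = -0.5, Weight = 0.2)

# Probability of the positive class for given height and weight values
predict_prob <- function(height, weight) {
  z <- coefs[["intercept"]] + coefs[["Height"]] * height + coefs[["Weight"]] * weight
  sigmoid(z)
}

# Turn probabilities into labels with the usual 0.5 threshold
# ("Male" as the positive class is an assumption of this sketch)
predict_label <- function(height, weight, threshold = 0.5) {
  ifelse(predict_prob(height, weight) > threshold, "Male", "Female")
}

p <- predict_prob(c(70, 62), c(190, 120))
labels <- predict_label(c(70, 62), c(190, 120))
print(p)
print(labels)
```

Fitting the model amounts to choosing the coefficients so that these probabilities match the training labels as closely as possible; in the sample code that step is delegated to Spark.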

Sample Code

#Set up Spark home
Sys.setenv(SPARK_HOME = "…../spark-2.4.0-bin-hadoop2.7")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
#Load the libraries
library(SparkR)
library("caret")
#Initialize the Spark context and session
#To run Spark on a local node, give master="local"
sc <- sparkR.session(master = "local")
#In Spark 2.x the session is also the SparkSQL entry point, so no separate sqlContext is created
#Load the data set
data = read.df("file:///…./weight-height.csv", "csv", header = "true", inferSchema = "true", na.strings = "NA")
#Split the data into train and test sets (weights 8:2, seed 42)
splt_data = randomSplit(data, c(8, 2), 42)
trainingData = splt_data[[1]]
testData = splt_data[[2]]
xtest = select(testData, "Height", "Weight")
ytest = select(testData, "Gender")
#Build the model (formula assumed from the columns selected for xtest and ytest)
gender_model <- spark.logit(trainingData, Gender ~ Height + Weight)
#Take the summary of the model
summary(gender_model)
#Predict using the test data
pred = predict(gender_model, xtest)
showDF(pred)
#Convert the Spark data frames to R data frames
y_pred = collect(select(pred, "prediction"), stringsAsFactors = FALSE)
y_true = collect(select(ytest, "Gender"), stringsAsFactors = FALSE)
#Calculate the confusion matrix
confusionMatrix(as.factor(y_pred$prediction), as.factor(y_true$Gender))
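
The final step summarises predictions against true labels in a confusion matrix. As a rough sketch of what that evaluation involves, the base R code below builds the table by hand and derives accuracy, precision and recall from it. The vectors toy_true and toy_pred are made-up labels for illustration, not output from the model, and treating "Male" as the positive class is an arbitrary assumption.

```r
# Hypothetical true and predicted labels (made up for illustration)
toy_true <- c("Male", "Male", "Female", "Female", "Male", "Female", "Male", "Female")
toy_pred <- c("Male", "Female", "Female", "Female", "Male", "Male", "Male", "Female")

# Cross-tabulate predictions (rows) against true labels (columns)
cm <- table(Predicted = toy_pred, Actual = toy_true)

# Treat "Male" as the positive class (an arbitrary choice for this sketch)
tp <- cm["Male", "Male"]      # predicted Male, actually Male
fp <- cm["Male", "Female"]    # predicted Male, actually Female
fn <- cm["Female", "Male"]    # predicted Female, actually Male
tn <- cm["Female", "Female"]  # predicted Female, actually Female

accuracy  <- (tp + tn) / sum(cm)  # share of all predictions that are correct
precision <- tp / (tp + fp)       # share of predicted positives that are correct
recall    <- tp / (tp + fn)       # share of actual positives that are found

print(cm)
cat("Accuracy:", accuracy, "Precision:", precision, "Recall:", recall, "\n")
```

Computing the values by hand like this can be a useful cross-check on the figures reported for the collected y_pred and y_true data frames.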

S-Logix (OPC) Private Limited