Multilayer Perceptron classification source code in R Spark

How to implement a Multilayer Perceptron for classification in Spark with R using sparklyr?

Description

To implement a multilayer perceptron for classification in Spark with R using sparklyr.

Functions used:

spark_connect(master = "local") – To create a Spark connection
sdf_copy_to(spark_connection, R_object, name) – To copy data to the Spark environment
ft_string_indexer(spark_dataframe, input_col, output_col) – To convert a string column to numeric category indices
sdf_partition(spark_dataframe, partitions with weights, seed) – To partition a Spark dataframe into multiple groups
ml_multilayer_perceptron_classifier(train_data, formula, layers = c(number of input features, hidden layer sizes, number of output classes)) – To build the multilayer perceptron model
ml_predict(ml_model, test_data) – To predict the response for the test data
ml_multiclass_classification_evaluator(predictions, label_col = "label", prediction_col = "prediction") – To evaluate the metrics (f1 (default), accuracy, weighted precision, weighted recall)
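
The layers argument is the one most often set incorrectly: its first element has to match the number of input features and its last element the number of target classes, with any hidden layer sizes in between. A minimal sketch in plain R follows; mlp_layers() is a hypothetical helper (not part of sparklyr), and the hidden layer size of 10 and the class count of 4 are illustrative assumptions, not values taken from the original code.

```r
# Hypothetical helper (not part of sparklyr): builds the `layers` vector
# from the feature column names, the hidden layer sizes and the class count.
mlp_layers <- function(feature_cols, hidden_sizes, n_classes) {
  stopifnot(length(feature_cols) > 0, n_classes >= 2, all(hidden_sizes > 0))
  c(length(feature_cols), hidden_sizes, n_classes)
}

# The six indexed columns created in the sample code of this page
feature_cols <- c("buying_ind", "maint_ind", "lug_boot_ind",
                  "safety_ind", "doors_ind", "persons_ind")

# Assumed values: one hidden layer of 10 units, 4 target classes
layers <- mlp_layers(feature_cols, hidden_sizes = c(10), n_classes = 4)
print(layers)
```

The resulting vector could then be passed as layers = layers when building the model.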

Process

Load the sparklyr library
Create a Spark connection
Copy data to the Spark environment
Convert the categorical input columns to integer indices
Split the data for train and test
Build the multilayer perceptron model
Predict using the test data
Evaluate the metrics
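
The conversion step in the process above maps every distinct string in a column to a numeric index. The sketch below imitates that idea in plain R for illustration only, assuming the default ordering of Spark's StringIndexer, where the most frequent value gets index 0. string_index() is a hypothetical helper, and the alphabetical tie-break is an assumption.

```r
# Hypothetical helper: imitates frequency-descending string indexing.
# Ties between equally frequent values are broken alphabetically (assumption).
string_index <- function(x) {
  freq <- table(x)
  ordered_labels <- names(freq)[order(-as.integer(freq), names(freq))]
  match(x, ordered_labels) - 1   # zero-based indices
}

# Made-up values for illustration
safety <- c("low", "high", "low", "med", "low", "high")
print(string_index(safety))
```

Here "low" occurs most often and is mapped to 0, "high" to 1 and "med" to 2. In the actual workflow this is done inside Spark by ft_string_indexer, once per categorical column.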

Sample Code

#Load the sparklyr library
library(sparklyr)
#Create a spark connection
sc <- spark_connect(master = "local")
#Copy data to spark environment
data_s=sdf_copy_to(sc,read.csv("/home/soft23/soft23/cardata.csv"),"car",overwrite= TRUE)
#Convert the categorical and character input columns to integer indices
data_s=ft_string_indexer(data_s,input_col = "buying",output_col="buying_ind")
data_s=ft_string_indexer(data_s,input_col = "maint",output_col="maint_ind")
data_s=ft_string_indexer(data_s,input_col = "lug_boot",output_col="lug_boot_ind")
data_s=ft_string_indexer(data_s,input_col = "safety",output_col="safety_ind")
data_s=ft_string_indexer(data_s,input_col = "doors",output_col="doors_ind")
data_s=ft_string_indexer(data_s,input_col = "persons",output_col="persons_ind")
#Split the data for training and testing
partitions=sdf_partition(data_s,training=0.7,test=0.3,seed=111)
train_data=partitions$training
test_data=partitions$test
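#Build the mlp model
#NOTE: the original call is missing from this listing; the response column name (class), the hidden layer size (10) and the number of output classes (4) are assumptions
mlp_model <- ml_multilayer_perceptron_classifier(train_data, class ~ buying_ind + maint_ind + lug_boot_ind + safety_ind + doors_ind + persons_ind, layers = c(6, 10, 4))
summary(mlp_model)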
#Predict using the test data
predictions = ml_predict(mlp_model, test_data)
predictions
#Evaluate the metrics (F1, accuracy, weighted precision, weighted recall)
cat("\nF1 : ",ml_multiclass_classification_evaluator(predictions, label_col = "label",prediction_col = "prediction"))
cat("\nAccuracy : ",ml_multiclass_classification_evaluator(predictions, label_col = "label",prediction_col = "prediction",metric_name="accuracy"))
cat("\nPrecision : ",ml_multiclass_classification_evaluator(predictions, label_col = "label",prediction_col = "prediction",metric_name="weightedPrecision"))
cat("\nRecall : ",ml_multiclass_classification_evaluator(predictions, label_col = "label",prediction_col = "prediction",metric_name="weightedRecall"))
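
To make the four reported metrics easier to interpret, the sketch below computes them in plain R from a small made-up set of labels. weighted_metrics() is a hypothetical helper, not a sparklyr function. It assumes that each per-class score is weighted by that class's share of the true labels, and that a class which is never predicted gets a precision of 0.

```r
# Hypothetical helper: support-weighted multiclass metrics from label vectors.
weighted_metrics <- function(actual, predicted) {
  classes <- sort(unique(c(actual, predicted)))
  # Confusion matrix: rows = actual class, columns = predicted class
  cm <- table(factor(actual, levels = classes),
              factor(predicted, levels = classes))
  tp <- diag(cm)                 # correct predictions per class
  support <- rowSums(cm)         # true instances per class
  predicted_n <- colSums(cm)     # predictions made per class
  precision <- ifelse(predicted_n > 0, tp / predicted_n, 0)
  recall <- ifelse(support > 0, tp / support, 0)
  f1 <- ifelse(precision + recall > 0,
               2 * precision * recall / (precision + recall), 0)
  w <- support / sum(support)    # class weights
  c(f1 = sum(w * f1),
    accuracy = sum(tp) / sum(cm),
    weightedPrecision = sum(w * precision),
    weightedRecall = sum(w * recall))
}

# Made-up labels for illustration only
actual    <- c("a", "a", "a", "b", "b", "c")
predicted <- c("a", "a", "b", "b", "b", "a")
m <- weighted_metrics(actual, predicted)
print(round(m, 4))
```

With this weighting, weighted recall always equals accuracy, which is a useful sanity check on the evaluator output.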
