How to implement a Multilayer Perceptron for classification in Spark with R using sparklyr?

Description

To implement a multilayer perceptron classifier in Spark with R using the sparklyr package.

Functions used :

spark_connect(master = "local") – To create a Spark connection
sdf_copy_to(spark_connection, R object, name) – To copy data to the Spark environment
ft_string_indexer(spark_dataframe, input_col, output_col) – To encode a string column as numeric label indices
sdf_partition(spark_dataframe, partitions with weights, seed) – To partition a Spark dataframe into multiple groups
ml_multilayer_perceptron_classifier(train_data, formula, layers = c(no. of input features, sizes of the hidden layers, no. of output classes)) – To build the multilayer perceptron model
ml_predict(ml_model, test_data) – To predict the response for the test data
ml_multiclass_classification_evaluator(predictions, label_col = "label", prediction_col = "prediction", metric_name) – To evaluate the metrics (f1 (default), accuracy, weightedPrecision, weightedRecall)
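
The layers argument is easiest to get right when it is derived from the data: the first entry should match the number of input features and the last entry the number of classes. A minimal sketch in base R, assuming a hypothetical data frame toy with a response column class; the hidden layer size of 8 is an arbitrary illustrative choice:

```r
# Hypothetical toy data: two predictors and a three-level response
toy <- data.frame(
  x1 = c(0, 1, 2, 0, 1, 2),
  x2 = c(1, 1, 0, 0, 2, 2),
  class = c("a", "b", "c", "a", "b", "c")
)

n_inputs  <- ncol(toy) - 1               # one input unit per predictor column
n_classes <- length(unique(toy$class))   # one output unit per class
hidden    <- 8                           # assumed hidden layer size; tune as needed

# Vector in the order expected by the layers argument
layers <- c(n_inputs, hidden, n_classes)
print(layers)
```

More hidden layers can be added by placing further sizes between the first and last entries.
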
  • Load the sparklyr library
  • Create a spark connection
  • Copy data to spark environment
  • Convert the categorical input columns to numeric indices
  • Split the data into training and test sets
  • Build the multilayer perceptron model
  • Predict using the test data
  • Evaluate the metrics
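
The conversion step relies on string indexing. By default, Spark's string indexer orders labels by descending frequency, so the most frequent label receives index 0. The base R sketch below imitates that mapping on a hypothetical vector (the values are not taken from cardata.csv) to show what the resulting index column contains:

```r
# Hypothetical column values, for illustration only
buying <- c("low", "high", "low", "med", "low", "high")

# Count each label and order the counts by descending frequency
freq <- sort(table(buying), decreasing = TRUE)

# Map each label to a zero-based index (most frequent label gets 0)
index_map <- setNames(seq_along(freq) - 1, names(freq))

# Apply the mapping to the original column
buying_ind <- unname(index_map[buying])
print(index_map)
print(buying_ind)
```

The indices are arbitrary codes rather than an ordering of the categories, so a column such as low/med/high does not keep its natural order.
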

#Load the sparklyr library
library(sparklyr)
#Create a spark connection
sc <- spark_connect(master = "local")
#Copy data to spark environment
data_s=sdf_copy_to(sc,read.csv("/home/soft23/soft23/cardata.csv"),"car",overwrite= TRUE)
#Convert the categorical input and character input to integer
data_s=ft_string_indexer(data_s,input_col = "buying",output_col="buying_ind")
data_s=ft_string_indexer(data_s,input_col = "maint",output_col="maint_ind")
data_s=ft_string_indexer(data_s,input_col = "lug_boot",output_col="lug_boot_ind")
data_s=ft_string_indexer(data_s,input_col = "safety",output_col="safety_ind")
data_s=ft_string_indexer(data_s,input_col = "doors",output_col="doors_ind")
data_s=ft_string_indexer(data_s,input_col = "persons",output_col="persons_ind")
#Split the data for training and testing
partitions=sdf_partition(data_s,training=0.7,test=0.3,seed=111)
train_data=partitions$training
test_data=partitions$test
#Build the mlp model
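#Assumption: the response column is named class and has 4 levels; layers = 6 inputs, one hidden layer of 8 units, 4 outputs
mlp_model <- ml_multilayer_perceptron_classifier(train_data,class~buying_ind+maint_ind+doors_ind+persons_ind+lug_boot_ind+safety_ind,layers=c(6,8,4))
summary(mlp_model)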
#Predict using the test data
predictions = ml_predict(mlp_model, test_data)
predictions
#Evaluate the metrics (F1, accuracy, weighted precision, weighted recall)
cat("\nF1 : ",ml_multiclass_classification_evaluator(predictions, label_col = "label",prediction_col = "prediction"))
cat("\nAccuracy : ",ml_multiclass_classification_evaluator(predictions, label_col = "label",prediction_col = "prediction",metric_name="accuracy"))
cat("\nPrecision : ",ml_multiclass_classification_evaluator(predictions, label_col = "label",prediction_col = "prediction",metric_name="weightedPrecision"))
cat("\nRecall : ",ml_multiclass_classification_evaluator(predictions, label_col = "label",prediction_col = "prediction",metric_name="weightedRecall"))
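
To make the weighted metrics less opaque, the base R sketch below computes them by hand from a confusion matrix. The label and prediction vectors are hypothetical, not output from the model above, and the sketch assumes every class is predicted at least once (otherwise a per-class precision would be undefined). Each per-class score is weighted by that class's share of the true labels:

```r
# Hypothetical true labels and predictions, for illustration only
label      <- c(0, 0, 0, 1, 1, 2, 2, 2, 2, 1)
prediction <- c(0, 0, 1, 1, 1, 2, 2, 0, 2, 2)

classes <- sort(unique(label))
# Confusion matrix: rows = true label, columns = predicted label
cm <- table(factor(label, levels = classes),
            factor(prediction, levels = classes))

tp        <- diag(cm)                 # correctly classified rows per class
precision <- tp / colSums(cm)         # per-class precision
recall    <- tp / rowSums(cm)         # per-class recall
f1        <- 2 * precision * recall / (precision + recall)
weights   <- rowSums(cm) / sum(cm)    # share of each class among true labels

accuracy           <- sum(tp) / sum(cm)
weighted_precision <- sum(weights * precision)
weighted_recall    <- sum(weights * recall)
weighted_f1        <- sum(weights * f1)

cat("Accuracy :", accuracy, "\n")
cat("Weighted precision :", weighted_precision, "\n")
cat("Weighted recall :", weighted_recall, "\n")
cat("Weighted F1 :", weighted_f1, "\n")
```

With this weighting, weighted recall always equals accuracy, which is why the two values reported by the evaluator coincide.
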
