How to implement a decision tree for classification in Spark with R using the sparklyr package?


This example builds a decision tree classifier in Spark from R using the sparklyr package, then evaluates its predictions on held-out test data.

Functions used :

spark_connect(master = "local") – To create a Spark connection
sdf_copy_to(spark_connection, R object, name) – To copy data to the Spark environment
sdf_partition(spark_dataframe, partitions with weights, seed) – To partition a Spark DataFrame into multiple groups
ml_decision_tree_classifier(train_data, formula) – To build a decision tree model
ml_predict(ml_model, test_data) – To predict the response for the test data
ml_multiclass_classification_evaluator(predict, label_col = "label", prediction_col = "prediction") – To evaluate the metrics (f1 (default), accuracy, weightedPrecision, weightedRecall)

  • Load the sparklyr package
  • Create a Spark connection
  • Copy data to the Spark environment
  • Split the data for training and testing
  • Build the decision tree model
  • Predict using the test data
  • Evaluate the metrics

#Load the sparklyr library
library(sparklyr)
#Create a Spark connection
sc = spark_connect(master = "local")
#Copy data to the Spark environment
data_s = sdf_copy_to(sc, read.csv("…../"), "car", overwrite = TRUE)
#Split the data for training and testing
#(the 80/20 weights and the seed are assumed values; the original split is not shown)
partition = sdf_partition(data_s, train = 0.8, test = 0.2, seed = 123)
train_data = partition$train
test_data = partition$test
#Build the decision tree model (Class is the response column, all other columns are predictors)
dec_tree = ml_decision_tree_classifier(train_data, Class ~ .)
#Predict using the test data
prediction = ml_predict(dec_tree, test_data)
#Evaluate the metrics (default = F1)
cat("F1 : ", ml_multiclass_classification_evaluator(prediction, label_col = "label", prediction_col = "prediction"))
cat("\nAccuracy : ", ml_multiclass_classification_evaluator(prediction, label_col = "label", prediction_col = "prediction", metric_name = "accuracy"))
cat("\nPrecision : ", ml_multiclass_classification_evaluator(prediction, label_col = "label", prediction_col = "prediction", metric_name = "weightedPrecision"))
cat("\nRecall : ", ml_multiclass_classification_evaluator(prediction, label_col = "label", prediction_col = "prediction", metric_name = "weightedRecall"))
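
To make the four reported metrics more concrete, here is a minimal base R sketch that computes them by hand from a small confusion matrix. It does not use Spark. The label vectors `actual` and `predicted` are made-up values for illustration only, and the weighting by each class's share of the true labels is my understanding of how the weighted metrics are defined, not code taken from sparklyr.

```r
# Hypothetical true and predicted labels (made up for illustration only)
actual    <- c("acc", "acc", "unacc", "unacc", "unacc", "good")
predicted <- c("acc", "good", "unacc", "unacc", "acc", "good")

classes <- sort(unique(c(actual, predicted)))

# Confusion matrix: rows = actual class, columns = predicted class
cm <- table(factor(actual, levels = classes),
            factor(predicted, levels = classes))

tp <- diag(cm)  # correctly classified rows per class

# Per-class precision, recall and F1 (0 when the denominator is 0)
precision <- ifelse(colSums(cm) > 0, tp / colSums(cm), 0)
recall    <- ifelse(rowSums(cm) > 0, tp / rowSums(cm), 0)
f1        <- ifelse(precision + recall > 0,
                    2 * precision * recall / (precision + recall), 0)

# Weight each class by its share of the true labels
w <- rowSums(cm) / sum(cm)

metrics <- c(
  f1                = sum(w * f1),
  accuracy          = sum(tp) / sum(cm),
  weightedPrecision = sum(w * precision),
  weightedRecall    = sum(w * recall)
)
print(round(metrics, 4))
```

With this weighting, weighted recall always equals accuracy, because each class's recall is multiplied by exactly the number of true rows in that class. Weighted precision and F1 can differ from accuracy when the classes are predicted unevenly.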
