How to implement a decision tree for regression in Spark with R using the sparklyr package?

Description

To implement a decision tree regression model in Spark from R using the sparklyr package.

Functions used :

spark_connect(master = "local") – To create a Spark connection
sdf_copy_to(spark_connection, R object, name) – To copy data to the Spark environment
sdf_partition(spark_dataframe, partitions with weights, seed) – To partition a Spark dataframe into multiple groups
ml_decision_tree_regressor(train_data, formula) – To build a decision tree regression model
ml_predict(ml_model, test_data) – To predict the response for the test data
ml_regression_evaluator(predict, label_col, prediction_col, metric_name) – To evaluate the predictions; metric_name can be "rmse" (default), "mse", "r2" or "mae"
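
To make the four metrics concrete, here is a minimal base R sketch that computes them by hand. The vectors actual and predicted are made-up values for illustration only, not output from the servo data, and the R-squared shown is the usual 1 - SSres/SStot form, which is assumed here to match what the evaluator reports.

```r
# Hypothetical observed and predicted values (illustration only)
actual    <- c(3.0, 5.0, 2.5, 7.0)
predicted <- c(2.5, 5.0, 4.0, 8.0)

# Residuals: observed minus predicted
res <- actual - predicted

# Mean Squared Error: average of the squared residuals
mse <- mean(res^2)

# Root Mean Squared Error: square root of the MSE
rmse <- sqrt(mse)

# Mean Absolute Error: average of the absolute residuals
mae <- mean(abs(res))

# R-squared: 1 - (residual sum of squares / total sum of squares)
r2 <- 1 - sum(res^2) / sum((actual - mean(actual))^2)

cat("RMSE :", rmse, "\n")
cat("MSE  :", mse, "\n")
cat("R2   :", r2, "\n")
cat("MAE  :", mae, "\n")
```

Lower RMSE, MSE and MAE indicate a better fit, while R-squared closer to 1 indicates that the model explains more of the variance in the response.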

  • Load the sparklyr package
  • Create a spark connection
  • Copy data to spark environment
  • Split the data for training and testing
  • Build the decision tree model
  • Predict using the test data
  • Evaluate the metrics
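
The splitting step can be pictured with a small base R sketch. This is only a local illustration of a seeded, weighted random split, not how Spark implements it: it uses the built-in mtcars data as a stand-in for the servo data, and the names df, group and parts are hypothetical. Each row is assigned to a group at random, so the 80/20 proportions are approximate rather than exact.

```r
# Stand-in data: the built-in mtcars dataset (not the servo data)
df <- mtcars

# Fix the random seed so the split is reproducible
set.seed(111)

# Assign each row to "training" or "test" with weights 0.8 and 0.2
group <- factor(
  sample(c("training", "test"), size = nrow(df), replace = TRUE,
         prob = c(0.8, 0.2)),
  levels = c("training", "test")
)

# Split the data frame into a named list of two data frames
parts <- split(df, group)
train_data <- parts$training
test_data  <- parts$test

cat("Training rows :", nrow(train_data), "\n")
cat("Test rows     :", nrow(test_data), "\n")
```

Every row lands in exactly one group, so the model is never evaluated on rows it was trained on.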

#Load the sparklyr library
library(sparklyr)
#Create a spark connection
sc=spark_connect(master = "local")
#Copy data to spark environment
data_s=sdf_copy_to(sc,read.csv("…../servo.csv"),"servo",overwrite = TRUE)
#Split the data for training and testing
partitions=sdf_partition(data_s,training=0.8,test=0.2,seed=111)
train_data=partitions$training
test_data=partitions$test
#Build the decision tree regression model (Class is the response, all other columns are predictors)
dec_tree=ml_decision_tree_regressor(x = train_data,formula = Class~.)
summary(dec_tree)
#Predict using the test data
prediction=ml_predict(dec_tree,test_data)
prediction
#Evaluate the metrics
#Default RMSE(Root Mean Square Error)
cat("Root Mean Squared Error : ",ml_regression_evaluator(prediction,label_col = "Class",prediction_col = "prediction"))
cat("\nMean Squared Error : ",ml_regression_evaluator(prediction,label_col = "Class",prediction_col = "prediction",metric_name = "mse"))
cat("\nR-Squared : ",ml_regression_evaluator(prediction,label_col = "Class",prediction_col = "prediction",metric_name = "r2"))
cat("\nMean Absolute Error : ",ml_regression_evaluator(prediction,label_col = "Class",prediction_col = "prediction",metric_name = "mae"))

#Array computation and accessing elements of matrices
v1=c(1:3)
v2=c(1:9)
print(v2)
a=array(c(v1,v2),dim=c(3,3,3))

#dimnames=list(rownames,columnnames))
m1=a[,,3]
m2=a[,,2]
print(m1)
print(m2)
print(m1+m2)
print(a)
cat("output of apply function\n")
print(apply(a,c(1,2),sum))
