[vc_row][vc_column][vc_column_text]

How to implement Naive Bayes classification in Spark with R using the sparklyr package?

[/vc_column_text][vc_empty_space][vc_column_text]

Description

To implement a Naive Bayes classifier in Spark with R using the sparklyr package: train it on one split of a Spark DataFrame, then predict and evaluate on the held-out split.

[/vc_column_text][vc_empty_space][vc_column_text]

Functions used:

spark_connect(master = "local") – To create a Spark connection
sdf_copy_to(spark_connection, R object, name) – To copy an R data frame into the Spark environment
sdf_partition(spark_dataframe, partition weights, seed) – To randomly split a Spark DataFrame into named partitions with the given weights
ml_naive_bayes(train_data, formula) – To build a Naive Bayes model
ml_predict(ml_model, test_data) – To predict the response for the test data
ml_multiclass_classification_evaluator(predictions, label_col = "label", prediction_col = "prediction", metric_name = "f1") – To evaluate the metrics (f1 (default), accuracy, weightedPrecision, weightedRecall)
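ml_multiclass_classification_evaluator() computes these metrics inside Spark. As a cross-check, the base-R sketch below recomputes them from a vector of true labels and a vector of predicted labels. The helper weighted_metrics() is hypothetical (not part of sparklyr) and assumes Spark's weighted definitions: each class's precision, recall and F1 is weighted by that class's share of the true labels, which is why weighted recall always equals accuracy.

```r
# Hypothetical helper (not sparklyr API): recomputes, in base R, the weighted
# metrics reported by Spark's multiclass evaluator. `label` and `prediction`
# are vectors of the same type and length.
weighted_metrics <- function(label, prediction) {
  classes <- sort(unique(c(label, prediction)))
  n <- length(label)
  # One column per class with its weight and per-class precision/recall/F1
  per_class <- sapply(classes, function(k) {
    tp <- sum(prediction == k & label == k)
    predicted <- sum(prediction == k)
    actual <- sum(label == k)
    precision <- if (predicted > 0) tp / predicted else 0
    recall <- if (actual > 0) tp / actual else 0
    f1 <- if (precision + recall > 0) 2 * precision * recall / (precision + recall) else 0
    c(weight = actual / n, precision = precision, recall = recall, f1 = f1)
  })
  w <- per_class["weight", ]
  c(
    accuracy = mean(label == prediction),
    weightedPrecision = sum(w * per_class["precision", ]),
    weightedRecall = sum(w * per_class["recall", ]),
    f1 = sum(w * per_class["f1", ])
  )
}

# Usage on a small set of labels
weighted_metrics(c(0, 0, 0, 1, 1, 2), c(0, 0, 1, 1, 2, 2))
```

To compare against the Spark numbers, the label and prediction columns of the ml_predict() output can be brought into R with dplyr::collect() and passed to the helper.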

[/vc_column_text][vc_empty_space][vc_tta_tabs style=”modern” shape=”square” active_section=”1″][vc_tta_section title=”Process” tab_id=”1540981759723-27683d14-e602″]

  • Load the sparklyr package
  • Create a Spark connection
  • Copy the data to the Spark environment
  • Split the data for training and testing
  • Build the Naive Bayes model
  • Predict using the test data
  • Evaluate the metrics
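By default ml_naive_bayes() fits Spark's multinomial Naive Bayes model, which needs non-negative feature values; with the formula Class ~ . the categorical predictors are one-hot encoded into 0/1 features, so that requirement holds. The base-R sketch below reproduces the multinomial estimates on a small count matrix so the arithmetic is visible. With smoothing λ, n training rows, K classes and d features, the log prior of class c is log((n_c + λ) / (n + Kλ)), the log weight of feature j is log((s_cj + λ) / (Σ_j s_cj + dλ)) where s_cj is the sum of feature j over class c, and a row x is assigned the class that maximizes log prior + x · log weights. The functions nb_fit() and nb_predict() are hypothetical helpers, not sparklyr API; the formulas follow the multinomial model as described in the Spark MLlib documentation.

```r
# Hypothetical helpers (not sparklyr API): a base-R sketch of multinomial
# Naive Bayes with additive (Laplace) smoothing. X is a numeric matrix of
# non-negative feature values (one row per example); y holds the class labels.
nb_fit <- function(X, y, smoothing = 1) {
  classes <- sort(unique(y))
  d <- ncol(X)
  # Class counts n_c, in the same order as `classes`
  counts <- vapply(classes, function(k) sum(y == k), numeric(1), USE.NAMES = FALSE)
  # Log prior: log((n_c + lambda) / (n + K * lambda))
  log_prior <- log(counts + smoothing) - log(length(y) + length(classes) * smoothing)
  # Log feature weights, one row per class:
  # log((s_cj + lambda) / (sum_j s_cj + d * lambda))
  log_theta <- do.call(rbind, lapply(classes, function(k) {
    s <- colSums(X[y == k, , drop = FALSE])
    log(s + smoothing) - log(sum(s) + d * smoothing)
  }))
  list(classes = classes, log_prior = log_prior, log_theta = log_theta)
}

nb_predict <- function(model, X) {
  # Score of each class = log prior + x . log weights (an n x K matrix)
  scores <- sweep(X %*% t(model$log_theta), 2, model$log_prior, "+")
  # Pick the highest-scoring class for every row
  model$classes[max.col(scores, ties.method = "first")]
}

# Usage on a toy count matrix
X <- rbind(c(3, 0), c(2, 1), c(0, 3), c(1, 2))
y <- c("a", "a", "b", "b")
model <- nb_fit(X, y)
nb_predict(model, rbind(c(4, 0), c(0, 4)))  # returns c("a", "b")
```

In sparklyr, the corresponding settings are the model_type and smoothing arguments of ml_naive_bayes() (defaults "multinomial" and 1).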
[/vc_tta_section][vc_tta_section title=”Sample Code” tab_id=”Source-code”][vc_column_text]#Load the sparklyr library
library(sparklyr)
#Create a spark connection
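sc <- spark_connect(master = "local")
#Copy data to the Spark environment
#Assumes car.data has a header row that includes a "Class" column; if your copy
#has no header (as in the UCI Car Evaluation download), pass header = FALSE and col.names to read.csv()
data_s=sdf_copy_to(sc,read.csv("/home/soft23/Downloads/car.data"),"car",overwrite=TRUE)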
#Split the data for training and testing
partitions=sdf_partition(data_s,training=0.8,test=0.2,seed=111)
train_data=partitions$training
test_data=partitions$test
#Build the Naive Bayes model
nb_model=ml_naive_bayes(x = train_data, Class ~ .)
summary(nb_model)
#Predict using the test data
prediction = ml_predict(nb_model, test_data)
prediction
#Evaluate the metrics(default = F1)
cat("F1 : ",ml_multiclass_classification_evaluator(prediction, label_col = "label",
prediction_col = "prediction"))
cat("\nAccuracy : ",ml_multiclass_classification_evaluator(prediction, label_col = "label",
prediction_col = "prediction",metric_name = "accuracy"))
cat("\nPrecision : ",ml_multiclass_classification_evaluator(prediction, label_col = "label",
prediction_col = "prediction",metric_name = "weightedPrecision"))
cat("\nRecall : ",ml_multiclass_classification_evaluator(prediction, label_col = "label",
prediction_col = "prediction",metric_name = "weightedRecall"))
#Close the Spark connection
spark_disconnect(sc)[/vc_column_text][vc_empty_space][/vc_tta_section][vc_tta_section title=”Screenshots” tab_id=”Screenshots”][vc_single_image image=”90557″ img_size=”full”][vc_empty_space height=”15px”][vc_single_image image=”90558″ img_size=”full”][vc_empty_space height=”15px”][vc_single_image image=”90559″ img_size=”full”][vc_empty_space height=”15px”][/vc_tta_section][/vc_tta_tabs][/vc_column][/vc_row][vc_row][vc_column][vc_empty_space][vc_row_inner][vc_column_inner width=”1/3″][vc_column_text]


[/vc_column_text][/vc_column_inner][vc_column_inner width=”1/3″][vc_column_text]


[/vc_column_text][/vc_column_inner][vc_column_inner width=”1/3″][vc_column_text]


[/vc_column_text][/vc_column_inner][/vc_row_inner][vc_empty_space][/vc_column][/vc_row]
