How to add, remove and update columns of a data frame in Spark using R?

Description

This recipe shows how to add, remove and update columns of a Spark data frame (SparkDataFrame) from R, using the SparkR package.

Functions used:

withColumn(df, "New_col_name", col_value) – adds a column to a data frame; if a column with that name already exists, it is replaced, which is how a column is updated
withColumnRenamed(df, "Existing column name", "New column name") – renames a column of a data frame
drop(df, c(columns_to_be_dropped)) – drops the listed columns from a data frame
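None of the three functions is named as an "update" function. The sketch below shows the usual approach: calling withColumn with the name of an existing column replaces that column in place. It assumes SparkR is installed and loadable (for example with SPARK_HOME set up as in the steps below); the EMP_ID/SALARY table and the 10% raise are made-up illustrations, not data from this post.

```r
# Minimal sketch: "updating" a column by reusing its name in withColumn.
# Assumption: SparkR is installed and on the library path.
# The employee table below is hypothetical.
library(SparkR)

# Start (or reuse) a local Spark session without Hive support
sparkR.session(master = "local[1]", enableHiveSupport = FALSE)

# Hypothetical employee data, loaded as a SparkDataFrame
emp <- createDataFrame(data.frame(
  EMP_ID = c(1L, 2L, 3L),
  SALARY = c(1000, 2000, 3000)
))

# SALARY already exists, so withColumn replaces it instead of
# appending a new column: every salary is raised by 10%
emp_updated <- withColumn(emp, "SALARY", emp$SALARY * 1.1)
showDF(emp_updated)

# Bring the result back into R as a plain data.frame, ordered by EMP_ID
result <- collect(arrange(emp_updated, "EMP_ID"))
print(result)

# Release the local Spark resources
sparkR.session.stop()
```

The column keeps its original position, so the result still has exactly the columns EMP_ID and SALARY.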

  • Set up SPARK_HOME
  • Load the SparkR library
  • Initialize the Spark session
  • Load the data set
  • Use the predefined functions to add, remove and update columns of the data frame

#Set up Spark home
Sys.setenv(SPARK_HOME = "…../spark-2.4.0-bin-hadoop2.7")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
#Load the library
library(SparkR)
#Initialize the Spark session (in Spark 2.x, sparkR.session() replaces the
#separate Spark context and SparkSQL context of Spark 1.x)
#To run Spark in local mode, give master = "local"
sparkR.session(master = "local")
#Load the data set
data <- read.df("file:///…./Emp.csv", "csv", header = "true", inferSchema = "true", na.strings = "NA")
#To add a new column: Final_salary = SALARY + TA
data1 <- withColumn(data, "Final_salary", data$SALARY + data$TA)
showDF(data, 5)
showDF(data1, 5)
#To rename a column
data2 <- withColumnRenamed(data1, "SALARY", "BASIC_PAY")
showDF(data2, 5)
#To drop columns of a data frame
data3 <- drop(data2, c("Bal_Amnt", "DEPT"))
showDF(data3, 5)
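Emp.csv is not included with the post, so here is a self-contained sketch of the same add, rename and drop steps on a tiny in-memory SparkDataFrame. It assumes SparkR is installed and loadable; only the column names SALARY, TA, Bal_Amnt and DEPT come from the example above, while the NAME column and all values are made up.

```r
# Minimal sketch of the add / rename / drop chain on made-up data.
# Assumption: SparkR is installed and on the library path.
library(SparkR)

sparkR.session(master = "local[1]", enableHiveSupport = FALSE)

# Hypothetical stand-in for Emp.csv
emp <- createDataFrame(data.frame(
  NAME     = c("A", "B"),
  DEPT     = c("HR", "IT"),
  SALARY   = c(1000, 2000),
  TA       = c(100, 150),
  Bal_Amnt = c(50, 75),
  stringsAsFactors = FALSE
))

# Add: Final_salary = SALARY + TA
emp1 <- withColumn(emp, "Final_salary", emp$SALARY + emp$TA)

# Rename: SALARY -> BASIC_PAY (the column keeps its position)
emp2 <- withColumnRenamed(emp1, "SALARY", "BASIC_PAY")

# Drop: remove Bal_Amnt and DEPT
emp3 <- drop(emp2, c("Bal_Amnt", "DEPT"))
showDF(emp3)

# Collect into a local data.frame before stopping the session
result <- collect(arrange(emp3, "NAME"))
print(result)

sparkR.session.stop()
```

Each call returns a new SparkDataFrame and leaves its input unchanged, which is why the post keeps data, data1, data2 and data3 as separate variables.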
