How to save files from Spark using Python

Objective

To save a DataFrame to files from Spark using Python.

Functions used:

df.write.save("filepath", format="fileformat") – Saves the Spark DataFrame to the given path in the given format (parquet by default; other formats include json and csv). Spark writes the output as a directory of part files at that path, not as a single file with that name.
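As a minimal sketch of how the format argument is used, the helper below wraps the call described above. The names `save_df` and `SUPPORTED_FORMATS` are hypothetical and not part of PySpark; the restriction to three formats only mirrors the formats listed in this post, and `df` is assumed to be an existing Spark DataFrame.

```python
# Formats mentioned in this post; Spark itself supports more.
SUPPORTED_FORMATS = ("parquet", "json", "csv")


def save_df(df, path, fmt="parquet", mode="errorifexists"):
    """Write df to path using DataFrameWriter.save.

    fmt  -- output format; parquet is Spark's default
    mode -- what to do if path already exists:
            "errorifexists" (Spark's default), "overwrite", "append" or "ignore"
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {fmt!r}")
    # Same call as df.write.save(path, format=...) with an explicit save mode
    df.write.save(path, format=fmt, mode=mode)
```

For instance, `save_df(df, "/tmp/employees_json", fmt="json", mode="overwrite")` would write JSON output to the (hypothetical) directory `/tmp/employees_json`, replacing any earlier output there.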

Process:

  Import necessary libraries

  Initialize the Spark session

  Create the required DataFrame

  Use the predefined function to save the DataFrame from Spark

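from pyspark.sql import SparkSession

# Set up the SparkSession (this also creates the underlying SparkContext)
spark = SparkSession \
    .builder \
    .appName("Python spark example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

# Load the CSV file, using the first row as the header and inferring column types
df1 = spark.read.format("csv").options(header=True, inferSchema=True).load("/home/…../Emp.csv")

# Add a Total_salary column as the sum of SALARY and TA
df2 = df1.withColumn("Total_salary", df1.SALARY + df1.TA)

# Save the result. The format argument, not the path extension, decides the output
# format, so this writes CSV data into a directory named Employeedetails.json
df2.select("NAME", "Total_salary").write.save("/…./Employeedetails.json", format="csv")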
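Because the save call produces a directory and not a single file, it can help to look inside that directory afterwards. The sketch below uses only the Python standard library and assumes the output was written to a local filesystem; `list_part_files` and `write_succeeded` are hypothetical helper names, and the `_SUCCESS` marker is what Spark usually leaves behind with its default settings.

```python
import os


def list_part_files(output_dir):
    """Return the sorted names of the data files (part-*) in a Spark output directory."""
    return sorted(
        name for name in os.listdir(output_dir)
        if name.startswith("part-")  # skips _SUCCESS and hidden .crc checksum files
    )


def write_succeeded(output_dir):
    """Return True if the _SUCCESS marker file is present in the output directory."""
    return os.path.exists(os.path.join(output_dir, "_SUCCESS"))
```

Calling `list_part_files` on the path passed to `write.save` returns the files that actually hold the saved rows; there may be several, one per partition of the DataFrame.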
