How to save files from Spark using Python

Objective

To save a Spark DataFrame to a file using Python (PySpark).

Functions used:

df.write.save("filepath", format="fileformat") – Saves a Spark DataFrame to the given path in the given format. The formats covered here are parquet (the default), json, and csv.
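As a minimal sketch of how the format argument is used, the snippet below wraps df.write.save in a small helper. The names save_dataframe, SUPPORTED_FORMATS, and demo, the sample row, and the temporary output directory are assumptions made for illustration; they are not part of PySpark or of the original example.

```python
import os
import tempfile

# Formats named in this article; Spark itself supports more.
SUPPORTED_FORMATS = {"parquet", "json", "csv"}


def save_dataframe(df, path, file_format="parquet", mode="errorifexists", **options):
    """Hypothetical helper: write df to path through DataFrameWriter.save."""
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {file_format!r}")
    # Extra keyword arguments (for example header=True) are passed on as writer options.
    df.write.save(path, format=file_format, mode=mode, **options)
    return path


def demo():
    """Write one small DataFrame as JSON and as CSV (needs a local Spark install)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("save demo").getOrCreate()
    # Illustrative data only; column names mirror the sample code in this article.
    df = spark.createDataFrame([("Asha", 1000, 200)], ["NAME", "SALARY", "TA"])
    out_dir = tempfile.mkdtemp()  # assumed scratch location
    save_dataframe(df, os.path.join(out_dir, "emp_json"), "json", mode="overwrite")
    save_dataframe(df, os.path.join(out_dir, "emp_csv"), "csv", mode="overwrite", header=True)
    spark.stop()
    return out_dir
```

With the default mode, the write fails if the target path already exists; mode="overwrite" replaces the existing output instead.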

Process

Import necessary libraries

Initialize the Spark session

Create the required DataFrame

Use df.write.save() to save the DataFrame

Sample Code

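from pyspark.sql import SparkSession

# Set up the SparkSession (the entry point for DataFrame operations)
spark = SparkSession \
    .builder \
    .appName("Python spark example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

# Load the CSV file, using the first row as the header and inferring column types
df1 = spark.read.format("csv").options(header="True", inferSchema="True").load("/home/…../Emp.csv")

# Add a Total_salary column as the sum of SALARY and TA
df2 = df1.withColumn("Total_salary", df1.SALARY + df1.TA)

# Save the result. The format argument decides the file type (CSV here),
# regardless of the .json extension in the output path.
df2.select("NAME", "Total_salary").write.save("/…./Employeedetails.json", format="csv")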
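
Note that, with default settings, Spark treats the save path as a directory and writes one or more part-* data files plus a _SUCCESS marker into it, rather than producing a single file. The sketch below shows one way to inspect that output from plain Python on a local filesystem; list_part_files and write_succeeded are hypothetical helper names, not PySpark functions.

```python
import glob
import os


def list_part_files(output_dir):
    """Return the sorted data files found directly under a Spark output directory."""
    pattern = os.path.join(output_dir, "part-*")
    # glob skips dot-prefixed names, so hidden checksum files are not included.
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


def write_succeeded(output_dir):
    """Return True if the _SUCCESS marker is present in the output directory."""
    return os.path.isfile(os.path.join(output_dir, "_SUCCESS"))
```

Calling list_part_files on the path passed to write.save returns the actual data files, which is useful when another tool expects a single file name.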
