To save files from Spark using Python
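df.write.save("Filepath&name", format="fileformat"): saves a Spark DataFrame to the given file path and name in the given format (format = parquet, json, csv; parquet is the default unless spark.sql.sources.default says otherwise). The path becomes a directory that holds the output as part files, not a single file.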
Import the necessary libraries
Initialize the Spark session
Create the required DataFrame
Call df.write.save() to save the DataFrame
from pyspark.sql import SparkSession
# Set up the SparkSession (it creates the underlying SparkContext)
# "spark.some.config.option" is a placeholder; replace or drop it
spark = SparkSession \
    .builder \
    .appName("Python spark example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()
# Load the CSV file with the built-in "csv" source (Spark 2.0+)
df1 = spark.read.format("csv").options(header="true", inferSchema="true").load("/home/…../Emp.csv")
# Add a column holding the total salary (SALARY + TA)
df2 = df1.withColumn("Total_salary", df1.SALARY + df1.TA)
# Save the result; the format matches the .json name of the output directory
df2.select("NAME", "Total_salary").write.save("/…./Employeedetails.json", format="json")