#5, First Floor, 4th Street , Dr. Subbarayan Nagar Kodambakkam, Chennai-600 024 pro@slogix.in

How to save files from Spark using Python
Objective

To save a Spark DataFrame to a file using Python.

Functions used :

df.write.save("Filepath&name", format='fileformat') – Saves the Spark DataFrame at the given path in the given format (parquet is the default; json and csv are also supported). Spark creates a directory at that path and writes the data into it as one or more part files.

Process

  Import necessary libraries

  Initialize the Spark session

  Create the required data frame

  Use the predefined function to save the Spark DataFrame

Sample Code

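from pyspark.sql import SparkSession

# Set up the SparkSession (it also provides the SparkContext)
spark = SparkSession \
    .builder \
    .appName("Python spark example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

# Load the CSV file, using the first row as the header and inferring column types
# (on Spark 2.0+ the built-in "csv" source replaces com.databricks.spark.csv)
df1 = spark.read.format("csv").options(header=True, inferSchema=True).load("/home/…../Emp.csv")

# Add a column holding the sum of SALARY and TA
df2 = df1.withColumn("Total_salary", df1.SALARY + df1.TA)

# Save the result. The format argument, not the ".json" in the name, decides the
# output, so this writes CSV part files into a directory at the given path
df2.select("NAME", "Total_salary").write.save("/…./Employeedetails.json", format="csv")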
