How to save files from Spark using Python?


This recipe shows how to save a Spark DataFrame to a file using Python (PySpark).

Functions used: dataframe.write.save("Filepath&name", format='fileformat') - Saves the Spark DataFrame to the given path in the given format (format = parquet (default), json, csv). Spark creates a directory at that path and writes the data inside it as one or more part files.


  Import necessary libraries

  Initialize the Spark session

  Create the required data frame

  Use the DataFrame's write.save function to save the DataFrame from Spark

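from pyspark.sql import SparkSession

# Set up the SparkSession
spark = SparkSession \
    .builder \
    .appName("Python spark example") \
    .getOrCreate()

# Load the file (header row used for column names, column types inferred)
df1 = spark.read.format('com.databricks.spark.csv').options(header='True', inferschema='True').load("/home/…../Emp.csv")
# Add a Total_salary column as SALARY + TA
df2 = df1.withColumn("Total_salary", (df1.SALARY + df1.TA))
# To save the result; the format argument, not the ".json" in the path, decides the output type (CSV here)
df2.select("NAME", "Total_salary").write.save("/…./Employeedetails.json", format="csv")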
