How to initialize a Spark session and create a DataFrame in Spark using Python?


To initialize a Spark session and create a DataFrame in Spark using Python:

Functions used:

    spark = SparkSession \
        .builder \
        .appName("Python spark example") \
        .getOrCreate()

    spark.sparkContext.parallelize([(Row 1 values), (Row 2 values), ..., (Row n values)]).toDF(['col1', 'col2', ..., 'coln'])

    spark.createDataFrame([(Row 1 values), (Row 2 values), ..., (Row n values)], ['col1', 'col2', ..., 'coln'])


  Import necessary libraries

  Initialize the Spark session

  Create the required data frame

from pyspark.sql import SparkSession

# Set up the SparkContext and SparkSession
spark = SparkSession \
    .builder \
    .appName("Python spark example") \
    .getOrCreate()

# Creating a DataFrame from an RDD using the parallelize function
df = spark.sparkContext.parallelize([(2, 4, 6, 'Row1'), (3, 6, 9, 'Row2'),
    (4, 8, 12, 'Row3')]).toDF(['col1', 'col2', 'col3', 'col4'])

# Creating a DataFrame using the createDataFrame function
# (the column names here are illustrative)
Emp_det = spark.createDataFrame([
    (3, 'S.Priyanka', 16000, 700, 28, 0, 'DATA ENTRY')],
    ['emp_id', 'name', 'salary', 'bonus', 'age', 'leaves', 'designation'])

# Using the read and load functions
df_csv = spark.read.format('com.databricks.spark.csv') \
    .options(header='True', inferschema='True') \
    .load("/home/…../weight-height.csv")
