How to select, filter and sort a data frame in Spark using Python


To select, filter and sort a data frame in Spark using Python.

Functions used:

df.groupBy("Col_name") – To group the data frame by a given column
df.filter(Condition) – To filter the data frame based on the given condition
df.sort(Col_name, ascending=True or False) – To sort the data frame by one column (ascending=True for ascending order, False for descending order)
df.orderBy([Col_names], ascending=[values]) – To sort the data frame by more than one column. Columns take priority in the order listed, and each value in ascending sets the sort direction of the corresponding column
df.select(Col_name) – To select a column


  Import necessary libraries

  Initialize the Spark session

  Create the required data frame

  Use the predefined functions to select, filter and sort the data frame

Sample Code

from pyspark.sql import SparkSession

# Set up SparkContext and SparkSession
spark = SparkSession \
    .builder \
    .appName("Python spark example") \
    .getOrCreate()

# Load the file
df = spark.read.format('com.databricks.spark.csv').options(header='True', inferschema='True').load("/home/…../weight-height.csv")
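# Col_name, Col_1, Col_2 and Condition are placeholders for the columns and condition of interest
# To group the df based on some criteria
df.groupBy("Col_name").count().show()
# To filter the df based on some criteria
df_filtered = df.filter(Condition)
# To count the number of samples in the filtered dataset
df_filtered.count()
# To sort the data set in descending order based on a particular column's values
df.sort("Col_name", ascending=False).show()
# To sort the data set in ascending order based on a particular column's values
df.sort("Col_name", ascending=True).show()
# To sort the data set in ascending order based on more than one column's values
df.orderBy(["Col_1", "Col_2"], ascending=[True, True]).show()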

Screenshots (images not reproduced): import necessary libraries; initialize the Spark session; create the required data frame; group the data frame by a given column; filter the df based on some criteria; result plot.