Friday, January 3, 2025

PySpark – Sort dataframe by multiple columns

This article shows how to sort a PySpark DataFrame by multiple columns.

There are two ways to do it:

  • Using sort()
  • Using orderBy()

Creating a DataFrame for demonstration:

Python3




# import SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# create a SparkSession and give the app a name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of student records
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]
  
# specify column names
columns = ['student ID', 'student NAME', 'college']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
print("Actual data in dataframe")
  
# show dataframe
dataframe.show()


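Output:

Actual data in dataframe
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+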

Method 1: Using sort() function

The sort() function returns a new DataFrame whose rows are ordered by one or more columns.

Syntax: dataframe.sort(['column1', 'column2', 'column n'], ascending=True)

Where,

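  • dataframe is the PySpark DataFrame created from the nested lists
  • ['column1', 'column2', 'column n'] is the list of columns to sort by; rows are ordered by the first column, and ties are broken by the following columns in turn
  • ascending=True sorts in increasing order and ascending=False sorts in decreasing order; a list of booleans, one per column, sets the direction of each column separately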
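
The ascending parameter also accepts one boolean per column, which helps when the columns should sort in different directions. Here is a minimal sketch, assuming a DataFrame with the 'college' and 'student ID' columns used in this article; the helper name sort_college_then_id_desc is hypothetical and not part of PySpark:

```python
def sort_college_then_id_desc(df):
    """Order rows by 'college' (A to Z), then by 'student ID' (high to low).

    `df` is assumed to be a PySpark DataFrame that has both columns.
    """
    # The booleans pair up with the column names by position:
    # 'college' -> ascending, 'student ID' -> descending.
    return df.sort(['college', 'student ID'], ascending=[True, False])


# Usage with the DataFrame created above:
# sort_college_then_id_desc(dataframe).show()
```

Rows that share a college stay grouped together, and within each group the IDs run from highest to lowest.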

Example 1: Python program to sort the dataframe by passing a list of two columns, in ascending order.

Python3




# sort the dataframe by two columns
# in ascending order and show the result
dataframe.sort(['college', 'student ID'],
               ascending=True).show()


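Output:

+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         5|     gnanesh|    iit|
|         1|      sravan| vignan|
|         1|      sravan| vignan|
|         4|     sridevi| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
+----------+------------+-------+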

Example 2: Python program to sort the dataframe by passing a list of two columns, in descending order.

Python3




# sort the dataframe by two columns
# in descending order and show the result
dataframe.sort(['college', 'student NAME'],
               ascending=False).show()


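Output:

+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         3|      rohith|   vvit|
|         2|      ojaswi|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+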

Method 2: Using orderBy() function

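The orderBy() function sorts the DataFrame by one or more columns. By default, it sorts in ascending order. In PySpark, orderBy() is an alias of sort(), so both accept the same arguments.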

Syntax: dataframe.orderBy(*cols, ascending=True)

Parameters:

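  • cols: the columns to sort by, given as column names, Column objects, or a list of either.
  • ascending: a boolean, or a list of booleans with one entry per column. True (the default) sorts in ascending order and False sorts in descending order.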
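
Because cols may hold Column objects, the sort direction can also be attached to each column directly with asc() and desc() instead of passing ascending. Here is a minimal sketch, assuming a DataFrame with the 'college' and 'student NAME' columns from this article; the helper name order_by_college_then_name_desc is hypothetical:

```python
def order_by_college_then_name_desc(df):
    """Order rows by 'college' ascending, then 'student NAME' descending.

    `df` is assumed to be a PySpark DataFrame that has both columns.
    """
    # df['name'] returns a Column; asc() and desc() attach the sort
    # direction to that column, so no `ascending` argument is needed.
    return df.orderBy(df['college'].asc(), df['student NAME'].desc())


# Usage with the DataFrame created above:
# order_by_college_then_name_desc(dataframe).show()
```

This form keeps each column's direction next to its name, which can be easier to read than a separate list of booleans when several columns are involved.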

Example 1: Python program to sort the dataframe by two columns in descending order using the orderBy() function.

Python3




# sort the dataframe by two columns
# in descending order using the
# orderBy() function and show the result
dataframe.orderBy(['student ID', 'student NAME'],
                  ascending=False).show()


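Output:

+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         5|     gnanesh|    iit|
|         4|     sridevi| vignan|
|         3|      rohith|   vvit|
|         2|      ojaswi|   vvit|
|         1|      sravan| vignan|
|         1|      sravan| vignan|
+----------+------------+-------+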

Example 2: Python program to sort the dataframe by two columns in ascending order using the orderBy() function.

Python3




# sort the dataframe by two columns
# in ascending order using the
# orderBy() function and show the result
dataframe.orderBy(['student ID', 'student NAME'],
                  ascending=True).show()


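Output:

+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+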
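
One caveat with the examples in this article: the student ID values are stored as strings, and string columns are compared character by character, so an ID such as "10" would sort before "2". The plain-Python sketch below illustrates the effect with made-up IDs and does not call Spark. The helper order_by_numeric_id is a hypothetical name; it assumes every ID is a numeric string and casts the column to an integer for the comparison:

```python
# Hypothetical IDs, stored as strings like the article's 'student ID' column.
ids = ["1", "2", "10", "5"]

# Character-by-character comparison: "10" lands before "2".
as_strings = sorted(ids)

# Comparing the numeric values restores the expected order.
as_numbers = sorted(ids, key=int)

print(as_strings)  # ['1', '10', '2', '5']
print(as_numbers)  # ['1', '2', '5', '10']


def order_by_numeric_id(df):
    """Sort a PySpark DataFrame by 'student ID' compared as integers."""
    # cast('int') is applied only inside the sort expression; the
    # 'student ID' column in the result keeps its string values.
    return df.orderBy(df['student ID'].cast('int'))


# Usage with the DataFrame created above:
# order_by_numeric_id(dataframe).show()
```

With the single-digit IDs used here both orderings agree, so the difference only shows up once IDs reach two or more digits.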

Dominic Rubhabha-Wardslaus