In this article, we are going to see how to order a PySpark DataFrame by multiple columns in Python.
Create the dataframe for demonstration:
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

dataframe.show()
Output:
Ordering by multiple columns means sorting the DataFrame on those columns together, in ascending or descending order. We can do this using the following methods.
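Conceptually, a multi-column sort works like Python's tuple comparison: rows are compared on the first column, and later columns only break ties. A minimal plain-Python sketch of the same idea (hypothetical data, no Spark required):

```python
# Multi-column ordering behaves like sorting by a tuple key:
# the first field dominates, later fields only break ties.
rows = [("sravan", "1"), ("bobby", "5"), ("ojaswi", "2")]

# Sort by (name, id) ascending -- the same effect as
# dataframe.orderBy(['NAME', 'ID'], ascending=True)
ordered = sorted(rows, key=lambda r: (r[0], r[1]))
print(ordered)  # bobby first, then ojaswi, then sravan
```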
Method 1: Using orderBy()
This method returns the DataFrame ordered by the given columns. It sorts on the first column listed, then breaks ties using each subsequent column.
Syntax:
- Ascending order: dataframe.orderBy(['column1', 'column2', …, 'column n'], ascending=True).show()
- Descending order: dataframe.orderBy(['column1', 'column2', …, 'column n'], ascending=False).show()
where:
- dataframe is the input PySpark DataFrame
- ascending=True sorts the dataframe in ascending order
- ascending=False sorts the dataframe in descending order
Example 1: Sort the PySpark dataframe in ascending order with orderBy().
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# orderBy dataframe in ascending order
dataframe.orderBy(['NAME', 'ID', 'Company'], ascending=True).show()
Output:
Example 2: Sort the PySpark dataframe in descending order with orderBy().
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# orderBy dataframe in descending order
dataframe.orderBy(['NAME', 'ID', 'Company'], ascending=False).show()
Output:
Method 2: Using sort()
This method returns the DataFrame ordered by the given columns, in the same way as orderBy(): it sorts on the first column listed, then breaks ties using each subsequent column.
Syntax:
- Ascending order: dataframe.sort(['column1', 'column2', …, 'column n'], ascending=True).show()
- Descending order: dataframe.sort(['column1', 'column2', …, 'column n'], ascending=False).show()
where:
- dataframe is the input PySpark DataFrame
- ascending=True sorts the dataframe in ascending order
- ascending=False sorts the dataframe in descending order
Example 1: Sort PySpark dataframe in ascending order
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# sort dataframe in ascending order
dataframe.sort(['NAME', 'ID', 'Company'], ascending=True).show()
Output:
Example 2: Sort the PySpark dataframe in descending order
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# sort dataframe in descending order
dataframe.sort(['NAME', 'ID', 'Company'], ascending=False).show()
Output: