Friday, December 27, 2024

Sort the PySpark DataFrame columns by Ascending or Descending order

In this article, we sort the rows of a PySpark DataFrame by one or more columns, in ascending or descending order, using the sort() and orderBy() functions.

Let’s create a sample dataframe.

Python3




# importing module
import pyspark
  
# importing sparksession from 
# pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["1", "sravan", "company 1"],
        ["4", "sridevi", "company 1"]]
  
# specify column names
columns = ['Employee_ID', 'Employee NAME', 'Company']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
# display data in the dataframe
dataframe.show()


Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          1|       sravan|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+

Using sort() function

The sort() function sorts the rows of the dataframe by the values in one or more columns.

Syntax: dataframe.sort(['column name'], ascending=True).show()

Example 1: Using sort() with one column

Sort the data by Employee NAME in ascending order:

Python3




# sort the dataframe based on 
# employee name column in ascending order
dataframe.sort(['Employee NAME'],
               ascending = True).show()


Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+

Sort the data by Employee NAME in descending order:

Syntax: dataframe.sort(['column name'], ascending=False).show()

Code:

Python3




# sort the dataframe based on 
# employee name column in descending order
dataframe.sort(['Employee NAME'],
               ascending = False).show()


Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
+-----------+-------------+---------+

Example 2: Using sort() with multiple columns

Sort the dataframe by Employee_ID and then by Employee NAME, both in ascending order:

Python3




# sort the dataframe based on employee ID
# and employee Name columns in ascending order
dataframe.sort(['Employee_ID','Employee NAME'],
               ascending = True).show()


Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+

Sort the dataframe by Employee_ID, Employee NAME, and Company, all in descending order:

Python3




# sort the dataframe based on employee ID, employee Name
# and company columns in descending order
dataframe.sort(['Employee_ID','Employee NAME',
                'Company'], ascending = False).show()


Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+

Example 3: Sort using the asc() method

The asc() method of the Column class returns a sort expression that orders the rows by that column in ascending order.

Python3




dataframe.sort(dataframe.Employee_ID.asc()).show()


Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+

Example 4: Sort using the desc() method

The desc() method of the Column class returns a sort expression that orders the rows by that column in descending order.

Python3




dataframe.sort(dataframe.Employee_ID.desc()).show()


Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+

Using orderBy() function

The orderBy() function sorts the dataframe by one or more columns. By default, it sorts in ascending order.

Syntax: dataframe.orderBy(*cols, ascending=True)

Parameters:

  • cols → The columns (names or Column expressions) to sort by.
  • ascending → A boolean, or a list of booleans with one entry per column. True (the default) sorts in ascending order, False in descending order.

Example 1: Sorting by one column

Python program to sort the dataframe based on Employee ID in ascending order

Python3




# sort the dataframe based on
# Employee ID in ascending order
dataframe.orderBy(['Employee_ID'],
                  ascending = True).show()


Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+

Python program to sort the dataframe based on Employee ID in descending order

Python3




# sort the dataframe based on
# Employee ID in descending order
dataframe.orderBy(['Employee_ID'],
                  ascending = False).show()


Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+

Example 2: Sorting by multiple columns

Sort the dataframe based on employee ID and employee Name columns in descending order using orderBy.

Python3




# sort the dataframe based on employee ID
# and employee Name columns in descending order
dataframe.orderBy(['Employee_ID','Employee NAME'],
                  ascending = False).show()


Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+

Sort the dataframe based on employee ID and employee Name columns in ascending order

Python3




# sort the dataframe based on employee ID 
# and employee Name columns in ascending order
dataframe.orderBy(['Employee_ID','Employee NAME'],
                  ascending = True).show()


Output:

+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
