How to add a constant column in a PySpark DataFrame?

By Dominic Rubhabha-Wardslaus

26 July 2024


In this article, we are going to see how to add a constant column to a PySpark DataFrame.

It can be done in these ways:

Using lit()
Using an SQL query

Creating a DataFrame for demonstration:

Python3

# Create a spark session 
from pyspark.sql import SparkSession 
from pyspark.sql.functions import lit 
spark = SparkSession.builder.appName('SparkExamples').getOrCreate() 
  
# Create a spark dataframe 
columns = ["Name", "Course_Name", 
           "Months", 
           "Course_Fees", "Discount", 
           "Start_Date", "Payment_Done"] 
data = [ 
    ("Amit Pathak", "Python", 3, 
     10000, 1000, "02-07-2021", True), 
    ("Shikhar Mishra", "Soft skills", 
     2, 8000, 800, "07-10-2021", False), 
    ("Shivani Suvarna", "Accounting", 6, 
     15000, 1500, "20-08-2021", True), 
    ("Pooja Jain", "Data Science", 12, 
     60000, 900, "02-12-2021", False), 
] 
df = spark.createDataFrame(data).toDF(*columns) 
  
# View the dataframe 
df.show() 

Output:

Method 1: Using lit()

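In this method, we use the lit() function, which turns a Python literal into a Column expression so that the same constant value is inserted into every row. The new column can be added with withColumn(), as in the examples that follow, or with select(); for instance, df.select('*', lit(1).alias('literal_values_1')) keeps all existing columns and adds a column literal_values_1 holding the value 1.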

Syntax: df.withColumn("NEW_COL", lit(VALUE))

Example 1: Adding a column with a constant value.

Python3

df.withColumn('Status', lit(0)).show()

Output:

Example 2: Adding a value that depends on another column.

Python3

from pyspark.sql.functions import when, lit, col

# "Yes" when Discount is at least 1000, otherwise "NO"
df.withColumn(
    "Great_Discount",
    when(col("Discount") >= 1000, lit("Yes")).otherwise(lit("NO"))
).show()

Output:

Method 2: Using an SQL query

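Here we run an SQL query from PySpark. First, register the DataFrame as a temporary view with createTempView() or createOrReplaceTempView(); the view lives as long as the SparkSession that created it. createTempView() raises an error if a view with the same name already exists, while createOrReplaceTempView() creates the view or replaces an existing one. The older registerTempTable() behaves like createOrReplaceTempView() but has been deprecated since Spark 2.0.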

Then query the view with an SQL SELECT that adds the constant as an extra expression. The literal keeps its SQL type, so 1 AS newCol produces an integer column rather than a string.

Python3

# Register df as a temporary view (registerTempTable() is deprecated since Spark 2.0)
df.createOrReplaceTempView('courses')
newDF = spark.sql('SELECT *, 1 AS newCol FROM courses')
newDF.show()

Output:
