In this article, we are going to see how to add a column with a literal (constant) value to a PySpark DataFrame.

Creating a DataFrame for demonstration:
Python3
# import SparkSession from pyspark
from pyspark.sql import SparkSession

# build the SparkSession with the app name "lit_value"
spark = SparkSession.builder.appName("lit_value").getOrCreate()

# create the Spark dataframe with columns A, B
data = spark.createDataFrame([('x', 5), ('Y', 3), ('Z', 5)], ['A', 'B'])

# show the schema and the table
data.printSchema()
data.show()
Output:
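Running the code above prints output along these lines (schema first, then the table):

root
 |-- A: string (nullable = true)
 |-- B: long (nullable = true)

+---+---+
|  A|  B|
+---+---+
|  x|  5|
|  Y|  3|
|  Z|  5|
+---+---+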
Method 1: Using the lit() function

Here we add the constant column 'literal_values_1' by using the select() method. The lit() function inserts the same constant value into every row.

Call select() on the DataFrame, passing '*' (or a list of column names) as the first argument to keep the existing columns, and as the second argument pass lit() with the constant value, aliased to the new column name.
Python3
# import the lit() function from pyspark.sql.functions
from pyspark.sql.functions import lit

# select all columns from the data table and add a new column
# 'literal_values_1' holding the constant "1"
# (note: lit("1") creates a string column; use lit(1) for an integer)
df2 = data.select('*', lit("1").alias("literal_values_1"))

# show the schema and the updated table
df2.printSchema()
df2.show()
Output:
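Since lit("1") is passed a string, the new column has string type; the output should look roughly like this:

root
 |-- A: string (nullable = true)
 |-- B: long (nullable = true)
 |-- literal_values_1: string (nullable = false)

+---+---+----------------+
|  A|  B|literal_values_1|
+---+---+----------------+
|  x|  5|               1|
|  Y|  3|               1|
|  Z|  5|               1|
+---+---+----------------+

The same column can also be added with withColumn(), e.g. data.withColumn('literal_values_1', lit(1)), which yields an integer column.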
Method 2: Using a SQL query

In this method, we first create a temporary view of the table with createOrReplaceTempView(). The lifetime of this view is tied to the lifetime of the SparkSession; createOrReplaceTempView() creates the view if it does not exist and replaces it if it does.

After creating the view, select from it with a SQL query that adds the literal as a new column. The type of the new column follows the type of the SQL literal (here the literal 2 is an integer, so the column has integer type).
Python3
# create a temporary view of the table named "temp"
df2.createOrReplaceTempView("temp")

# select all columns and rows from the temp view and add a new
# column 'literal_values_2' with the constant value 2
df2 = spark.sql("select *, 2 as literal_values_2 from temp")

# show the schema and the updated table
df2.printSchema()
df2.show()
Output:
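The SQL literal 2 produces an integer column, so the output should look roughly like this:

root
 |-- A: string (nullable = true)
 |-- B: long (nullable = true)
 |-- literal_values_1: string (nullable = false)
 |-- literal_values_2: integer (nullable = false)

+---+---+----------------+----------------+
|  A|  B|literal_values_1|literal_values_2|
+---+---+----------------+----------------+
|  x|  5|               1|               2|
|  Y|  3|               1|               2|
|  Z|  5|               1|               2|
+---+---+----------------+----------------+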
Method 3: Using a UDF (User-Defined Function)

A UDF lets us define our own function as per our requirements, which is why it is called a user-defined function. Here we declare the return datatype of the UDF and define a function that returns the constant value, which is then used to populate a new column.
Python3
# import udf from pyspark
from pyspark.sql.functions import udf

# declare the return type of the UDF as integer
@udf("int")
def lit_col():
    # return the literal value 3 for every row
    return 3

# create a new column 'literal_values_3' with the value 3
df2 = df2.withColumn('literal_values_3', lit_col())

# show the schema and the updated table
df2.printSchema()
df2.show()
Output:
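Because the UDF is declared with an "int" return type (and UDF results are nullable by default, since a UDF may return None), the output should look roughly like this:

root
 |-- A: string (nullable = true)
 |-- B: long (nullable = true)
 |-- literal_values_1: string (nullable = false)
 |-- literal_values_2: integer (nullable = false)
 |-- literal_values_3: integer (nullable = true)

+---+---+----------------+----------------+----------------+
|  A|  B|literal_values_1|literal_values_2|literal_values_3|
+---+---+----------------+----------------+----------------+
|  x|  5|               1|               2|               3|
|  Y|  3|               1|               2|               3|
|  Z|  5|               1|               2|               3|
+---+---+----------------+----------------+----------------+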