
Add a column with the literal value in PySpark DataFrame

In this article, we are going to see how to add a column with a literal value in a PySpark DataFrame.

Creating a DataFrame for demonstration:

Python3
# import SparkSession from pyspark
from pyspark.sql import SparkSession

# build (or reuse) a SparkSession named "lit_value"
spark = SparkSession.builder.appName("lit_value").getOrCreate()

# create a Spark DataFrame with columns A and B
data = spark.createDataFrame([('x', 5), ('Y', 3), ('Z', 5)],
                             ['A', 'B'])

# show the schema and the table
data.printSchema()
data.show()


 
Output: 
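Running the snippet above prints roughly:

root
 |-- A: string (nullable = true)
 |-- B: long (nullable = true)

+---+---+
|  A|  B|
+---+---+
|  x|  5|
|  Y|  3|
|  Z|  5|
+---+---+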

Method 1: Using the lit() function

Here we add a constant column 'literal_values_1' with the value 1 using the select() method. The lit() function fills every row of the new column with the same constant value.

Call select() on the table, passing "*" as the first argument to keep all existing columns (or name specific columns instead), and pass lit() with the constant value, aliased to the new column name, as the second argument.

Python3
# import the lit() function from pyspark.sql.functions
from pyspark.sql.functions import lit

# select all existing columns from the data table and
# append a new column 'literal_values_1' filled with 1;
# lit("1") is passed a string, so the column type is string
df2 = data.select('*', lit("1").alias("literal_values_1"))

# show the schema and the updated table
df2.printSchema()
df2.show()


Output:
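Note that lit("1") is given a Python string, so literal_values_1 comes out as a string column; pass the integer 1 to get an integer column instead. As a minimal sketch of an equivalent route (the name df2_alt is just illustrative), withColumn() appends the constant column without having to list the existing ones:

Python3

# withColumn() adds the new column alongside all existing ones;
# the integer literal 1 yields an integer column rather than a string
df2_alt = data.withColumn("literal_values_1", lit(1))
df2_alt.printSchema()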

Method 2: Using a SQL query

In this method we first register the DataFrame as a temporary view with createOrReplaceTempView(). The view lives only as long as the SparkSession; createOrReplaceTempView() creates the view if it does not exist and replaces it if it does.

After creating the view, we select from it with a SQL query. Spark infers the column type from the literal in the query, so the 2 below becomes an integer column.

Python3
# register df2 as a temporary view named "temp"
df2.createOrReplaceTempView("temp")

# select all columns and rows from the view and
# append a new column 'literal_values_2' with value 2
df2 = spark.sql("select *, 2 as literal_values_2 from temp")

# show the schema and the updated table
df2.printSchema()
df2.show()


Output:
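If you only need the SQL literal syntax and not a reusable view, selectExpr() accepts the same expressions directly on the DataFrame; a minimal equivalent sketch (df2_alt is an illustrative name):

Python3

# equivalent without a temp view: selectExpr() takes SQL expression strings
df2_alt = df2.selectExpr("*", "2 as literal_values_2")
df2_alt.show()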

Method 3: Using a UDF (user-defined function)

A UDF lets us define our own function to suit our requirements, which is why it is called a user-defined function. Here we declare the data type that the UDF returns and define a function that returns the constant; calling it inside withColumn() fills a new column with that value.

Python3
# import udf from pyspark.sql.functions
from pyspark.sql.functions import udf

# declare the UDF's return type as integer;
# lit_col() returns the literal value 3 for every row
@udf("int")
def lit_col():
    return 3

# create a new column 'literal_values_3' with value 3
df2 = df2.withColumn('literal_values_3', lit_col())

# show the schema and the updated table
df2.printSchema()
df2.show()


Output:
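A UDF runs Python code once per row, so for a plain constant lit() is the cheaper choice; the UDF route mainly shows the mechanism. For constant columns of complex types, lit() can be combined with array() or create_map(), as in this sketch (the column names are illustrative):

Python3

from pyspark.sql.functions import array, create_map, lit

# constant array column: every row gets [1, 2, 3]
df3 = df2.withColumn("const_array", array(lit(1), lit(2), lit(3)))

# constant map column: every row gets {"key" -> "value"}
df3 = df3.withColumn("const_map", create_map(lit("key"), lit("value")))

df3.show(truncate=False)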
