
How to Add Multiple Columns in PySpark DataFrames?

In this article, we will see different ways of adding multiple columns to PySpark DataFrames.

Let’s create a sample dataframe for demonstration:

Dataset Used: Cricket_data_set_odi

Python3
# importing the PySpark module
import pyspark

# importing SparkSession from the
# pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# create a DataFrame from the CSV file,
# treating the first row as the header
df = spark.read.option(
    "header", True).csv("Cricket_data_set_odi.csv")

# display the schema
df.printSchema()

# show the DataFrame
df.show()


Output:
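Note that with only the header option, every column is read as a string. Since the methods below do arithmetic on columns such as Runs, Matches and Wickets, it helps to let Spark infer numeric types when reading the file; a minimal sketch, assuming the same CSV file:

Python3

# re-read the CSV, asking Spark to infer column types so that
# numeric columns (e.g. Runs, Matches, Wickets) are not strings
df = spark.read.option("header", True) \
    .option("inferSchema", True) \
    .csv("Cricket_data_set_odi.csv")

# the schema should now show numeric types for those columns
df.printSchema()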

Method 1: Using withColumn()

withColumn() is used to add a new column or update an existing column on a DataFrame.

Syntax: df.withColumn(colName, col)

Returns: A new DataFrame by adding a column or replacing the existing column that has the same name.

Code:

Python3
# add two new columns by chaining withColumn() calls:
# Avg_runs is derived from existing columns, wkt+10 adds a constant
df.withColumn('Avg_runs', df.Runs / df.Matches) \
  .withColumn('wkt+10', df.Wickets + 10) \
  .show()


Output:
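On PySpark 3.3 and later, DataFrame.withColumns() accepts a dictionary mapping column names to expressions, so several columns can be added in a single call instead of chaining; a minimal sketch (older versions must chain withColumn() as shown above):

Python3

# add both columns in one call (requires PySpark 3.3+)
df.withColumns({
    'Avg_runs': df.Runs / df.Matches,
    'wkt+10': df.Wickets + 10
}).show()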

Method 2: Using select()

You can also add multiple columns using select().

Syntax: df.select(*cols)

Code:

Python3
# using select() to add multiple columns:
# keep all existing columns ('*') and append the new ones
df.select('*', (df.Runs / df.Matches).alias('Avg_runs'),
          (df.Wickets + 10).alias('wkt+10')).show()


Output:
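The same result can also be written with SQL expression strings through selectExpr(); a minimal sketch, assuming the same column names (the backticks quote wkt+10 because the name contains a + character):

Python3

# equivalent of the select() above using SQL expression strings
df.selectExpr('*',
              'Runs / Matches AS Avg_runs',
              'Wickets + 10 AS `wkt+10`').show()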

Method 3: Adding Multiple Constant Columns to a DataFrame Using withColumn() and select()

Let’s create new columns with constant values using the lit() SQL function in the code below. The lit() function in PySpark is used to add a new column to a DataFrame by assigning it a constant or literal value.

Python3
# importing lit() to create literal (constant) columns
from pyspark.sql.functions import lit

# add a constant column with select() and
# another constant column with withColumn()
df.select('*', lit("Cricket").alias("Sport")) \
    .withColumn("Fitness", lit("Good")) \
    .show()


Output:
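These approaches can be combined: a single select() can append both computed columns and lit() constants at the same time; a minimal sketch using the same DataFrame:

Python3

from pyspark.sql.functions import lit

# add one computed column and two constant columns in one select()
df.select('*',
          (df.Runs / df.Matches).alias('Avg_runs'),
          lit("Cricket").alias("Sport"),
          lit("Good").alias("Fitness")).show()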
