In this article, we are going to convert multiple columns of a data frame into a single map column using PySpark in Python.
In Spark, a map column (MapType) stores key-value pairs, much like a Python dictionary. While working with a PySpark data frame, we may run into situations where several ordinary columns should be folded into one map column, with chosen labels serving as the map keys and the column values as the map values.
Syntax: create_map(lit("mapkey_1"), col("column_1"), lit("mapkey_2"), col("column_2"), ...)
Parameters:
- column_1, column_2, …: the columns whose values need to be converted into map values.
- mapkey_1, mapkey_2, …: the key names under which the corresponding column values are stored when the map is created (see the short sketch after this list).
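To make the signature concrete, here is a minimal, self-contained sketch; the toy column names, keys, and values below are invented purely for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit, create_map

spark = SparkSession.builder.getOrCreate()

# Toy data frame with two plain columns (names are illustrative)
df = spark.createDataFrame([("a1", "b1")], ["column_1", "column_2"])

# Fold both columns into a single MapType column
df.withColumn("as_map",
              create_map(lit("mapkey_1"), col("column_1"),
                         lit("mapkey_2"), col("column_2"))).show(truncate=False)

# Each row of as_map now renders (in Spark 3.x) roughly as:
# {mapkey_1 -> a1, mapkey_2 -> b1}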
Stepwise Implementation:
Step 1: First of all, import the required libraries, i.e., SparkSession, col, lit, and create_map. SparkSession is used to create the session, while col returns a column based on the given column name. lit creates a column of a literal (constant) value, which is how the map key names are supplied, while create_map converts the selected data frame columns into a single MapType column.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit, create_map
Step 2: Now, create a Spark session using the getOrCreate() function.
spark_session = SparkSession.builder.getOrCreate()
Step 3: Then, either read a CSV file into a data frame or create a new data frame using the createDataFrame() function.
data_frame = spark_session.read.csv('#Path of CSV file',
                                    sep=',', inferSchema=True, header=True)
or
data_frame = spark_session.createDataFrame(
    [(row_1_data), (row_2_data), (row_3_data)],
    ['column_name_1', 'column_name_2', 'column_name_3'])
Step 4: Further, call withColumn() with the new map column's name and the create_map() function as arguments. create_map() takes alternating pairs: a lit() key name followed by the col() whose values are stored under that key. Assign the result back to the data frame and drop the original columns that were folded into the map.
data_frame = data_frame.withColumn(
    "map_column_name",
    create_map(lit("mapkey_1"), col("column_name_1"),
               lit("mapkey_2"), col("column_name_2"),
               lit("mapkey_3"), col("column_name_3"))
).drop("column_name_1", "column_name_2", "column_name_3")
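Once the original columns are dropped, their values live only inside the map. An individual entry can still be read back by key with getItem(); the column and key names below are the placeholders from the step above:

data_frame.select(
    col("map_column_name").getItem("mapkey_1").alias("column_name_1")
).show()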
Step 5: Finally, display the updated data frame.
data_frame.show(truncate=False)
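To verify that the conversion actually produced a MapType column, the schema can be inspected as well. The commented output is only a sketch; the exact value type and nullability depend on the input columns:

data_frame.printSchema()

# root
#  |-- map_column_name: map (nullable = false)
#  |    |-- key: string
#  |    |-- value: string (valueContainsNull = true)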
Example 1:
In this example, we have used a data set (link), which is a 5×5 data frame as follows:
Then, we converted the columns 'name', 'class', and 'fees' to a map using the create_map() function, stored the result in the column 'student_details', and dropped the existing 'name', 'class', and 'fees' columns.
Python3
# PySpark convert multiple columns to map

# Import the libraries SparkSession, col, lit, create_map
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit, create_map

# Create a spark session using getOrCreate() function
spark_session = SparkSession.builder.getOrCreate()

# Read the CSV file
data_frame = spark_session.read.csv('/content/class_data.csv',
                                    sep=',', inferSchema=True,
                                    header=True)

# Convert name, class and fees columns to map
data_frame = data_frame.withColumn(
    "student_details",
    create_map(lit("student_name"), col("name"),
               lit("student_class"), col("class"),
               lit("student_fees"), col("fees"))
).drop("name", "class", "fees")

# Display the data frame
data_frame.show(truncate=False)
Output:
Example 2:
In this example, we have created a data frame with columns emp_id, name, superior_emp_id, year_joined, emp_dept_id, gender, and salary as follows:
Then, we converted the columns name, superior_emp_id, year_joined, emp_dept_id, gender, and salary to a map using the create_map() function, stored the result in the column 'employee_details', and dropped the existing name, superior_emp_id, year_joined, emp_dept_id, gender, and salary columns.
Python3
# PySpark convert multiple columns to map

# Import the libraries SparkSession, col, lit, create_map
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit, create_map

# Create a spark session using getOrCreate() function
spark_session = SparkSession.builder.getOrCreate()

# Define the data set
emp = [(1, "Smith", -1, "2018", "10", "M", 3000),
       (2, "Rose", 1, "2010", "20", "M", 4000),
       (3, "Williams", 1, "2010", "10", "M", 1000),
       (4, "Jones", 2, "2005", "10", "F", 2000),
       (5, "Brown", 2, "2010", "40", "F", 4000),
       (6, "Brown", 2, "2010", "50", "M", 2000)]

# Define the schema of the data set
empColumns = ["emp_id", "name", "superior_emp_id", "year_joined",
              "emp_dept_id", "gender", "salary"]

# Create the data frame through data set and schema
empDF = spark_session.createDataFrame(data=emp, schema=empColumns)

# Convert name, superior_emp_id, year_joined, emp_dept_id,
# gender, and salary columns to map
empDF = empDF.withColumn(
    "employee_details",
    create_map(lit("name"), col("name"),
               lit("superior_emp_id"), col("superior_emp_id"),
               lit("year_joined"), col("year_joined"),
               lit("emp_dept_id"), col("emp_dept_id"),
               lit("gender"), col("gender"),
               lit("salary"), col("salary"))
).drop("name", "superior_emp_id", "year_joined",
       "emp_dept_id", "gender", "salary")

# Display the data frame
empDF.show(truncate=False)
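As a quick follow-up check (a small addition, not part of the original listing), one entry can be read back out of the map by its key; the bracket syntax on a map column is equivalent to getItem():

# Look up the 'salary' entry inside the employee_details map
empDF.select("emp_id", col("employee_details")["salary"]).show()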
Output: