Rename Nested Field in Spark Dataframe in Python

22 July 2024

In this article, we will discuss different methods to rename the columns of a Spark DataFrame in Python. Top-level columns can be renamed with the withColumnRenamed method, which takes the existing column name and the new name and returns a new DataFrame with the renamed column. withColumnRenamed does not reach into struct columns, so nested fields are renamed by casting the struct column to a new schema inside select.

Required Package

PySpark is the Python library for Spark programming. It allows developers to interact with a Spark cluster using the Python programming language and to run distributed computations on large datasets with the Spark engine. You can install PySpark using the following command (the leading ! runs it from a notebook cell; omit it in a terminal):

!pip install pyspark

Rename Field in Spark DataFrame

You can use the withColumnRenamed method to rename a field in a Spark DataFrame. For example, if you have a DataFrame called df and you want to rename the field “oldFieldName” to “newFieldName”, you can use the following code structure:

df.withColumnRenamed("oldFieldName", "newFieldName")

Create the Spark DataFrame.

Python3

from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder.appName("CreateDF").getOrCreate()

# Sample rows: (id, fname, lname, age)
data = [(1, "John", "a", 25), (2, "Mike", "b", 30), (3, "Sara", "c", 35)]

# Create a DataFrame with named columns
df = spark.createDataFrame(data, ["id", "fname", "lname", "age"])
df.printSchema()

Output:

root
 |-- id: long (nullable = true)
 |-- fname: string (nullable = true)
 |-- lname: string (nullable = true)
 |-- age: long (nullable = true)

Rename a single column by passing the old field name and the new field name.

Python3

df1 = df.withColumnRenamed("fname","FirstName") 
df1.printSchema()

Output:

root
 |-- id: long (nullable = true)
 |-- FirstName: string (nullable = true)
 |-- lname: string (nullable = true)
 |-- age: long (nullable = true)

To rename multiple columns, chain calls to the withColumnRenamed function.

Python3

df2 = (df.withColumnRenamed("fname","FirstName") 
       .withColumnRenamed("lname","LastName")       
      ) 
df2.printSchema()

Output:

root
 |-- id: long (nullable = true)
 |-- FirstName: string (nullable = true)
 |-- LastName: string (nullable = true)
 |-- age: long (nullable = true)

Rename Nested Field in Spark DataFrame

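If a column is a struct with nested fields, withColumnRenamed cannot rename the inner fields, so we redefine the structure of the struct column instead. First, define a struct schema that contains the new field names, then cast the column to that schema using the following code structure: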

df.select(col("address").cast(struct_schema)).printSchema()

Create the DataFrame.

Python3

from pyspark.sql.types import StructType, StructField, StringType, IntegerType 
  
# Define the schema for the DataFrame 
schema = StructType([ 
    StructField("name", StringType()), 
    StructField("age", IntegerType()), 
    StructField("address", StructType([ 
        StructField("street", StringType()), 
        StructField("city", StringType()), 
        StructField("zip", IntegerType()) 
    ])) 
]) 
  
# Create the DataFrame 
data = [("Alice", 25, {"street": "Main St", "city": "Anytown", "zip": 12345}),   
        ("Bob", 30, {"street": "Park Ave", "city": "New York", "zip": 56789})] 
df = spark.createDataFrame(data, schema) 
  
# Show the DataFrame without truncating long values
df.show(truncate=False)
# Print the schema
df.printSchema()

Output:

+-----+---+---------------------------+
|name |age|address                    |
+-----+---+---------------------------+
|Alice|25 |{Main St, Anytown, 12345}  |
|Bob  |30 |{Park Ave, New York, 56789}|
+-----+---+---------------------------+

root
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- address: struct (nullable = true)
 |    |-- street: string (nullable = true)
 |    |-- city: string (nullable = true)
 |    |-- zip: integer (nullable = true)

To rename the nested fields, we redefine the structure of the address column. The new schema lists each field under its new name together with its data type, in the same order as the original fields, because the cast matches struct fields by position.

Python3

# Import the types and functions used below
from pyspark.sql.types import IntegerType, StringType, StructField, StructType
from pyspark.sql.functions import col

# Define the struct schema with the new field names (same order and types)
struct_schema = StructType([
    StructField("Street_name", StringType()),
    StructField("city_name", StringType()),
    StructField("Zip_code", IntegerType())
])
# Apply the schema by casting the address column
df.select(col("address").cast(struct_schema)).printSchema()

Output:

root
 |-- address: struct (nullable = true)
 |    |-- Street_name: string (nullable = true)
 |    |-- city_name: string (nullable = true)
 |    |-- Zip_code: integer (nullable = true)

Dominic http://wardslaus.com
