Friday, September 5, 2025
HomeLanguagesSpark Trim String Column on DataFrame

Spark Trim String Column on DataFrame

In this article, we will see that in PySpark, we can remove white spaces in the DataFrame string column. Here we will perform a similar operation to trim() (removes left and right white spaces) present in SQL in PySpark itself.

PySpark Trim String Column on DataFrame

Below are the ways by which we can trim String Column on DataFrame in PySpark:

  • Using withColumn with rtrim()
  • Using withColumn with trim()
  • Using select()
  • Using SQL Expression
  • Using PySpark trim(), rtrim(), ltrim()

PySpark Trim using withColumn() with rtrim()

In this example, we are using withColumn() with rtrim() to remove white spaces.

Python3




from pyspark.sql import SparkSession
from pyspark.sql.functions import trim, ltrim, rtrim, col
 
# Create a Spark session
spark = SparkSession.builder.appName("WhiteSpaceRemoval").getOrCreate()
 
# Define your data
data = [(1, "ABC    "), (2, "     DEF"), (3, "        GHI    ")]
 
# Create a DataFrame
df = spark.createDataFrame(data, ["col1", "col2"])
 
# Show the initial DataFrame
df.show()
 
# Using withColumn to remove white spaces
df = df.withColumn("col2", rtrim(col("col2")))
df.show()
 
# Stop the Spark session
spark.stop()


+----+----------+
|col1|      col2|
+----+----------+
|   1|     ABC    |
|   2|     DEF    |
|   3|        GHI|
+----+----------+
+----+---+
|col1|col2|
+----+---+
|   1|ABC|
|   2|DEF|
|   3|GHI|
+----+---+

PySpark Trim using withColumn with trim()

In this example, we are using withColumn() with trim() to remove white spaces.

Python3




from pyspark.sql import SparkSession
from pyspark.sql.functions import trim, ltrim, rtrim, col
 
# Create a Spark session
spark = SparkSession.builder.appName("WhiteSpaceRemoval").getOrCreate()
 
# Define your data
data = [(1, "ABC    "), (2, "     DEF"), (3, "        GHI    ")]
 
# Create a DataFrame
df = spark.createDataFrame(data, ["col1", "col2"])
 
# Show the initial DataFrame
df.show()
 
# Using withColumn to remove white spaces
df = df.withColumn("col2", trim(col("col2")))
df.show()
 
# Stop the Spark session
spark.stop()


+----+----------+
|col1|      col2|
+----+----------+
|   1|     ABC    |
|   2|     DEF    |
|   3|        GHI|
+----+----------+
+----+---+
|col1|col2|
+----+---+
|   1|ABC|
|   2|DEF|
|   3|GHI|
+----+---+

PySpark Trim using select() to Remove White Spaces

In this example, we are using select() to trim in PySpark.

Python3




from pyspark.sql import SparkSession
from pyspark.sql.functions import trim, ltrim, rtrim, col
 
# Create a Spark session
spark = SparkSession.builder.appName("WhiteSpaceRemoval").getOrCreate()
 
# Define your data
data = [(1, "ABC    "), (2, "     DEF"), (3, "        GHI    ")]
 
# Create a DataFrame
df = spark.createDataFrame(data, ["col1", "col2"])
 
# Show the initial DataFrame
df.show()
 
# Using select to remove white spaces
df = df.select(col("col1"), trim(col("col2")).alias("col2"))
df.show()
 
# Stop the Spark session
spark.stop()


+----+----------+
|col1|      col2|
+----+----------+
|   1|     ABC    |
|   2|     DEF    |
|   3|        GHI|
+----+----------+
+----+---+
|col1|col2|
+----+---+
|   1|ABC|
|   2|DEF|
|   3|GHI|
+----+---+

Using SQL Expression

In this example, we are using SQL Expression to remove extra whitespaces.

Python3




from pyspark.sql import SparkSession
from pyspark.sql.functions import trim, ltrim, rtrim, col
 
# Create a Spark session
spark = SparkSession.builder.appName("WhiteSpaceRemoval").getOrCreate()
 
# Define your data
data = [(1, "ABC    "), (2, "     DEF"), (3, "        GHI    ")]
 
# Create a DataFrame
df = spark.createDataFrame(data, ["col1", "col2"])
 
# Show the initial DataFrame
df.show()
 
# Using SQL Expression to remove white spaces
df.createOrReplaceTempView("TAB")
spark.sql("SELECT col1, TRIM(col2) AS col2 FROM TAB").show()
 
# Stop the Spark session
spark.stop()


+----+----------+
|col1|      col2|
+----+----------+
|   1|     ABC    |
|   2|     DEF    |
|   3|        GHI|
+----+----------+
+----+---+
|col1|col2|
+----+---+
|   1|ABC|
|   2|DEF|
|   3|GHI|
+----+---+

Using PySpark trim(), rtrim(), ltrim()

In PySpark, we can easily remove whitespaces or trim by using pyspark.sql.functions.trim() function. We use ltrim() to remove only left white spaces and rtrim() function to remove right white spaces.

Python3




from pyspark.sql.functions import trim, ltrim, rtrim
data = [(1, "ABC    "), (2, "     DEF"), (3, "        GHI    ")]
df = spark.createDataFrame(data=data, schema=["col1", "col2"])
df.show()
 
 
# using withColumn and trim()
df.withColumn("col2", trim("col2")).show()
 
# using ltrim()
df.withColumn("col2", ltrim("col2")).show()
 
# using rtrim()
df.withColumn("col2", rtrim("col2")).show()
 
# Using select
df.select("col1", trim("col2").alias('col2')).show()
 
# Using SQL Expression
df.createOrReplaceTempView("TAB")
spark.sql("select col1,trim(col2) as col2 from TAB").show()


+----+----------+
|col1|      col2|
+----+----------+
|   1|     ABC    |
|   2|     DEF    |
|   3|        GHI|
+----+----------+
+----+---+
|col1|col2|
+----+---+
|   1|ABC|
|   2|DEF|
|   3|GHI|
+----+---+
+----+---+
|col1|col2|
+----+---+
|   1|ABC|
|   2|DEF|
|   3|GHI|
+----+---+
+----+---+
|col1|col2|
+----+---+
|   1|ABC|
|   2|DEF|
|   3|GHI|
+----+---+
+----+---+
|col1|col2|
+----+---+
|   1|ABC|
|   2|DEF|
|   3|GHI|
+----+---+
Dominic
Dominichttp://wardslaus.com
infosec,malicious & dos attacks generator, boot rom exploit philanthropist , wild hacker , game developer,
RELATED ARTICLES

Most Popular

Dominic
32264 POSTS0 COMMENTS
Milvus
81 POSTS0 COMMENTS
Nango Kala
6634 POSTS0 COMMENTS
Nicole Veronica
11801 POSTS0 COMMENTS
Nokonwaba Nkukhwana
11863 POSTS0 COMMENTS
Shaida Kate Naidoo
6752 POSTS0 COMMENTS
Ted Musemwa
7025 POSTS0 COMMENTS
Thapelo Manthata
6701 POSTS0 COMMENTS
Umr Jansen
6718 POSTS0 COMMENTS