Replace values in Pandas dataframe using regex

28 July 2024


Large datasets often contain text, and that text is frequently messy. We need to clean it before we can do anything meaningful with it. The corpus is usually too large to list by hand every string that we want to replace, so in those cases we use regular expressions to handle values that follow some pattern.

A previous article discussed how to replace known string values in a dataframe. In this post, we will use regular expressions to replace strings that follow a pattern.

Using the `DataFrame.replace()` function

Problem #1: You are given a dataframe that contains the details of various events in different cities. For cities whose names start with ‘New’ or ‘new’, change that prefix to ‘New_’.

Solution: We are going to use a regular expression to detect such names, and then use the DataFrame.replace() function to replace them.

Python3

# importing pandas as pd
import pandas as pd
 
# Let's create a Dataframe
df = pd.DataFrame({'City':['New York', 'Parague', 'New Delhi', 'Venice', 'new Orleans'],
                    'Event':['Music', 'Poetry', 'Theatre', 'Comedy', 'Tech_Summit'],
                    'Cost':[10000, 5000, 15000, 2000, 12000]})
 
# Let's create the index
index_ = [pd.Period('02-2018'), pd.Period('04-2018'),
          pd.Period('06-2018'), pd.Period('10-2018'), pd.Period('12-2018')]
 
# Set the index
df.index = index_
 
# Let's print the dataframe
print(df)

Output :

                City        Event   Cost
2018-02     New York        Music  10000
2018-04      Parague       Poetry   5000
2018-06    New Delhi      Theatre  15000
2018-10       Venice       Comedy   2000
2018-12  new Orleans  Tech_Summit  12000

Now we will write the regular expression that matches these names and pass it to the DataFrame.replace() function to replace them.

Python3

# replace a leading 'New' or 'new'; ^ anchors the match to the start of the string
df_updated = df.replace(to_replace=r'^[nN]ew', value='New_', regex=True)
 
# Print the updated dataframe
print(df_updated)

Output :

                 City        Event   Cost
2018-02     New_ York        Music  10000
2018-04       Parague       Poetry   5000
2018-06    New_ Delhi      Theatre  15000
2018-10        Venice       Comedy   2000
2018-12  New_ Orleans  Tech_Summit  12000

As we can see in the output, the old strings have been replaced with the new ones successfully.
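
Note that DataFrame.replace() with regex = True applies the pattern to every string column, not only to City. The following is a minimal sketch of one way to limit the replacement to a single column with Series.str.replace(). The `events` dataframe and its 'Renewal Fair' value are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical data: the Event column also contains the letters 'new'
events = pd.DataFrame({
    'City': ['New York', 'Venice', 'new Orleans'],
    'Event': ['Renewal Fair', 'Comedy', 'Tech_Summit'],
})

# Work on a copy so the original dataframe stays unchanged
events_scoped = events.copy()

# ^ anchors the pattern to the start of the string, and the
# replacement is applied to the City column only
events_scoped['City'] = events_scoped['City'].str.replace(
    r'^[nN]ew', 'New_', regex=True)

print(events_scoped)
```

Here the Event column is left untouched, whereas an unanchored pattern applied to the whole dataframe would also rewrite the 'new' inside 'Renewal Fair'.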

Problem #2: You are given a dataframe containing details about various events in different cities. The names of certain cities contain some additional details enclosed in a bracket. Search for such names and remove the additional details.

Solution: For this task, we will write our own function that uses a regular expression to identify and update the names of those cities. We will then use the Series.apply() function to apply it to each value in the City column.

Python3

# importing pandas as pd
import pandas as pd
 
# Let's create a Dataframe
df = pd.DataFrame({'City':['New York (City)', 'Parague', 'New Delhi (Delhi)', 'Venice', 'new Orleans'],
                    'Event':['Music', 'Poetry', 'Theatre', 'Comedy', 'Tech_Summit'],
                    'Cost':[10000, 5000, 15000, 2000, 12000]})
 
 
# Let's create the index
index_ = [pd.Period('02-2018'), pd.Period('04-2018'),
          pd.Period('06-2018'), pd.Period('10-2018'), pd.Period('12-2018')]
 
# Set the index
df.index = index_
 
# Let's print the dataframe
print(df)

Output :

                      City        Event   Cost
2018-02    New York (City)        Music  10000
2018-04            Parague       Poetry   5000
2018-06  New Delhi (Delhi)      Theatre  15000
2018-10             Venice       Comedy   2000
2018-12        new Orleans  Tech_Summit  12000

Now we will write our own customized function to match the description in the names of the cities.

Python3

# Importing re package for using regular expressions
import re
 
# Function to clean the names
def Clean_names(City_name):
    # Search for an opening bracket in the name followed by
    # any characters repeated any number of times
    match = re.search(r'\(.*', City_name)
 
    if match:
        # return the name up to the position where the pattern begins
        return City_name[:match.start()]
 
    # if no clean up is needed, return the same name
    return City_name
         
# Update the City column
df['City'] = df['City'].apply(Clean_names)
 
# Print the updated dataframe
print(df)

Output :

                City        Event   Cost
2018-02    New York         Music  10000
2018-04      Parague       Poetry   5000
2018-06   New Delhi       Theatre  15000
2018-10       Venice       Comedy   2000
2018-12  new Orleans  Tech_Summit  12000
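
The cleaned names above still end with the space that preceded the bracket, for example 'New York ' rather than 'New York'. As a possible alternative, here is a minimal sketch that does the same clean up without a custom function by using Series.str.replace(), and that also drops the whitespace before the bracket. The pattern `\s*\(.*` and the `cities` name are assumptions chosen for this illustration.

```python
import pandas as pd

# A small sample of the City values used in Problem #2
cities = pd.Series(['New York (City)', 'Parague', 'New Delhi (Delhi)'])

# \s* consumes any whitespace before the bracket, \( matches the
# opening bracket, and .* matches everything after it
cleaned = cities.str.replace(r'\s*\(.*', '', regex=True)

print(cleaned.tolist())
```

Names without a bracket, such as 'Parague', do not match the pattern and are returned unchanged.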
