Python | Pandas Series.str.replace() to replace text in a series

28 July 2024


Python is a great language for data analysis, primarily because of its fantastic ecosystem of data-centric packages. Pandas is one of those packages, and it makes importing and analyzing data much easier. The Pandas Series.str.replace() method works like Python's built-in str.replace() method, but it is applied to every string in a Series. The .str prefix is required: it selects the vectorized string method and distinguishes it from Series.replace(), which replaces whole values rather than substrings.
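To make the difference concrete, here is a small sketch on a made-up Series (the fruit strings are illustrative only) that contrasts Series.replace(), which matches whole values, with Series.str.replace(), which matches substrings inside each value:

```python
import pandas as pd

# Illustrative data, not from the NBA dataset used later
s = pd.Series(["red apple", "green apple", "apple"])

# Series.replace() matches whole values: only the element equal to "apple" changes
whole_value = s.replace("apple", "pear")

# Series.str.replace() matches substrings inside every element
substring = s.str.replace("apple", "pear")

print(whole_value.tolist())  # ['red apple', 'green apple', 'pear']
print(substring.tolist())    # ['red pear', 'green pear', 'pear']
```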

Syntax:
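Series.str.replace(pat, repl, n=-1, case=None, flags=0, regex=False)

Parameters:
pat: string or compiled regex to be replaced.
repl: replacement string, or a callable that receives each regex match object and returns the replacement string (a callable requires regex=True).
n: number of replacements to make in each string. The default, -1, replaces all occurrences.
case: boolean that decides case sensitivity. Set it to False for case-insensitive matching. The default, None, means case-sensitive; it cannot be set when pat is a compiled regex.
flags: re module flags (for example re.IGNORECASE) used when matching; these also cannot be set when pat is a compiled regex.
regex: if True, pat is treated as a regular expression; if False, it is treated as a literal string. The default is False since pandas 2.0 (it was True in earlier versions).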
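A minimal sketch of the n, case, and regex parameters on a made-up Series, assuming pandas 2.x. Because regex defaults to False there, regex=True is passed explicitly for the pattern r"\d+" and the callable repl:

```python
import pandas as pd

# Illustrative strings chosen to exercise each parameter
s = pd.Series(["a-b-c", "Cat cat CAT", "x1y22z333"])

# n=1: replace only the first occurrence in each string
first_only = s.str.replace("-", "+", n=1)

# case=False: match "cat" regardless of letter case
ignore_case = s.str.replace("cat", "dog", case=False)

# regex=True with a callable repl: the function receives each re.Match object
# and returns the replacement text (here, the length of each run of digits)
digit_runs = s.str.replace(r"\d+", lambda m: f"<{len(m.group(0))}>", regex=True)

print(first_only.tolist())   # ['a+b-c', 'Cat cat CAT', 'x1y22z333']
print(ignore_case.tolist())  # ['a-b-c', 'dog dog dog', 'x1y22z333']
print(digit_runs.tolist())   # ['a-b-c', 'Cat cat CAT', 'x<1>y<2>z<3>']
```

Strings with nothing to match are returned unchanged, which is why most elements pass through each call untouched.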

Return Type:
Series with replaced text values

Example: The .str.replace() method is part of the Pandas string-handling capabilities. It lets users replace occurrences of a specified substring with another substring in the text data contained in a Pandas Series. This is particularly useful for data cleaning, transformation, and preparation, because it simplifies altering text content across large datasets. In the example below, every space in the text column is replaced with an underscore.


import pandas as pd
 
data = {'text': ['Blue', 'Green', 'Red']}
df = pd.DataFrame(data)
 
df['text'] = df['text'].str.replace(' ', '_')
print(df)

Output: none of the values contains a space, so the column is returned unchanged.

    text
0   Blue
1  Green
2    Red
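Because the strings in the example above contain no spaces, the call has nothing to replace. A sketch with made-up values that do contain spaces, plus one missing value, shows the replacement taking effect; missing values are passed through as NaN rather than raising an error:

```python
import numpy as np
import pandas as pd

# Illustrative values; the last entry is deliberately missing
df = pd.DataFrame({"text": ["Light Blue", "Dark Green", "Red", np.nan]})

# Replace every space with an underscore; NaN stays NaN
df["text"] = df["text"].str.replace(" ", "_")

print(df["text"].tolist())  # ['Light_Blue', 'Dark_Green', 'Red', nan]
```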

Next, let's use str.replace() on a real dataset. In the following examples, the DataFrame contains data about some NBA players, and the CSV file is read directly from the URL in the code. Let's load the dataset and see how it looks.


# importing pandas module
import pandas as pd
 
# reading csv file from url
data = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")
 
# printing the first 5 rows
print(data.head())

Output:

            Name            Team  Number Position   Age Height  Weight            College     Salary
0  Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0              Texas  7730337.0  
1    Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0          Marquette  6796117.0 
2   John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0  Boston University        NaN   
3    R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0      Georgia State  1148640.0 
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0                NaN  5000000.0

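Example 1: Replacing values in the Age column. In this example, every value of 25.0 in the Age column is replaced with “Twenty five”. Because Age is a numeric column, this uses Series.replace(), which replaces whole values; the .str accessor only works on string data. A boolean filter is then passed to the .where() method, and .dropna() removes the non-matching rows, so only the rows with Age = “Twenty five” are displayed.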


# overwriting the Age column, replacing the value 25.0 with "Twenty five"
data["Age"] = data["Age"].replace(25.0, "Twenty five")

# creating a boolean filter for rows
# where Age == "Twenty five"
age_filter = data["Age"] == "Twenty five"

# printing only the filtered rows
print(data.where(age_filter).dropna())

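Output: As shown in the output, every Age value of 25.0 has been replaced by “Twenty five”. Note that .dropna() also removes any matching row that has a missing value in another column, so a player aged 25 with a missing College or Salary is left out; indexing the DataFrame directly with the boolean filter keeps such rows.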

                    Name                    Team  Number Position         Age Height  Weight               College      Salary 
0          Avery Bradley          Boston Celtics     0.0       PG Twenty five    6-2   180.0                 Texas   7730337.0   
1            Jae Crowder          Boston Celtics    99.0       SF Twenty five    6-6   235.0             Marquette   6796117.0   
7           Kelly Olynyk          Boston Celtics    41.0        C Twenty five    7-0   238.0               Gonzaga   2165160.0  
26       Thomas Robinson           Brooklyn Nets    41.0       PF Twenty five   6-10   237.0                Kansas    981348.0   
35      Cleanthony Early         New York Knicks    11.0       SF Twenty five    6-8   210.0         Wichita State    845059.0   
44      Derrick Williams         New York Knicks    23.0       PF Twenty five    6-8   240.0               Arizona   4000000.0  
47         Isaiah Canaan      Philadelphia 76ers     0.0       PG Twenty five    6-0   201.0          Murray State    947276.0    
48      Robert Covington      Philadelphia 76ers    33.0       SF Twenty five    6-9   215.0       Tennessee State   1000000.0
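The short sketch below, using made-up ages, illustrates why Series.replace() rather than Series.str.replace() is used on the numeric Age column:

```python
import pandas as pd

# Illustrative float column, like Age in nba.csv
ages = pd.Series([25.0, 27.0, 25.0])

# The .str accessor only works on string (or object) data,
# so accessing it on a float Series raises AttributeError
raised = False
try:
    ages.str.replace("25.0", "Twenty five")
except AttributeError as err:
    raised = True
    print("AttributeError:", err)

# Series.replace() matches whole values and works on any dtype
replaced = ages.replace(25.0, "Twenty five")
print(replaced.tolist())  # ['Twenty five', 27.0, 'Twenty five']
```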

Example 2: Case insensitivity. In this example, the team name Boston Celtics is replaced by New Boston Celtics. Instead of passing “Boston”, the pattern “boston” (with a lowercase ‘b’) is passed and case is set to False, which makes the match case-insensitive. After that, only the rows whose team name is “New Boston Celtics” are displayed using the .where() method.


# importing pandas module
import pandas as pd
 
# reading csv file from url
data = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")
 
# overwriting the Team column; "boston" matches "Boston" because case=False
data["Team"] = data["Team"].str.replace("boston", "New Boston", case=False)

# creating a boolean filter for rows
# where Team == "New Boston Celtics"
team_filter = data["Team"] == "New Boston Celtics"

# printing only the filtered rows
print(data.where(team_filter).dropna())

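Output: As shown in the output, “Boston” is replaced by “New Boston” even though the pattern was passed in lowercase, because case was set to False. Some Boston players, such as John Holland (index 2, missing Salary) and Jonas Jerebko (index 4, missing College), are absent only because .dropna() drops every row that contains a missing value.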

               Name                Team  Number Position   Age Height  Weight           College     Salary
0     Avery Bradley  New Boston Celtics     0.0       PG  25.0    6-2   180.0             Texas  7730337.0  
1       Jae Crowder  New Boston Celtics    99.0       SF  25.0    6-6   235.0         Marquette  6796117.0 
3       R.J. Hunter  New Boston Celtics    28.0       SG  22.0    6-5   185.0     Georgia State  1148640.0
6     Jordan Mickey  New Boston Celtics    55.0       PF  21.0    6-8   235.0               LSU  1170960.0  
7      Kelly Olynyk  New Boston Celtics    41.0        C  25.0    7-0   238.0           Gonzaga  2165160.0  
8      Terry Rozier  New Boston Celtics    12.0       PG  22.0    6-2   190.0        Louisville  1824360.0 
9      Marcus Smart  New Boston Celtics    36.0       PG  22.0    6-4   220.0    Oklahoma State  3431040.0  
10  Jared Sullinger  New Boston Celtics     7.0        C  24.0    6-9   260.0        Ohio State  2569260.0 
11    Isaiah Thomas  New Boston Celtics     4.0       PG  27.0    5-9   185.0        Washington  6912869.0 
12      Evan Turner  New Boston Celtics    11.0       SG  27.0    6-7   220.0        Ohio State  3425510.0  
13      James Young  New Boston Celtics    13.0       SG  20.0    6-6   215.0          Kentucky  1749840.0 
14     Tyler Zeller  New Boston Celtics    44.0        C  26.0    7-0   253.0    North Carolina  2616975.0
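The sketch below rebuilds three of the Boston rows from the head() output above in an in-memory DataFrame (keeping only four columns, an assumption made for brevity) and compares .where().dropna() with plain boolean indexing:

```python
import numpy as np
import pandas as pd

# Three rows copied from the head() output; other columns omitted for brevity
df = pd.DataFrame({
    "Name": ["Avery Bradley", "John Holland", "Jonas Jerebko"],
    "Team": ["Boston Celtics"] * 3,
    "College": ["Texas", "Boston University", np.nan],
    "Salary": [7730337.0, np.nan, 5000000.0],
})

df["Team"] = df["Team"].str.replace("boston", "New Boston", case=False)
is_new_boston = df["Team"] == "New Boston Celtics"

# where() turns non-matching rows into NaN; dropna() then also drops
# matching rows that have a missing value in any other column
via_where = df.where(is_new_boston).dropna()

# Boolean indexing keeps every matching row, missing values included
via_mask = df[is_new_boston]

print(via_where["Name"].tolist())  # ['Avery Bradley']
print(via_mask["Name"].tolist())   # ['Avery Bradley', 'John Holland', 'Jonas Jerebko']
```

All three rows match the filter, but only the row with no missing values survives .dropna().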
