Saturday, November 16, 2024

Python | Pandas Series.str.replace() to replace text in a series

Python is a great language for data analysis, primarily because of its ecosystem of data-centric packages. Pandas is one of those packages, and it makes importing and analyzing data much easier. The Pandas Series.str.replace() method works like Python's built-in str.replace() method, but it is applied to every string in a Series. The .str accessor has to be placed before .replace() to distinguish the call from Series.replace(), which replaces whole values rather than substrings.
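To make the comparison concrete, here is a minimal sketch that runs Python's built-in str.replace() on one string and Series.str.replace() on a Series. The team names come from the NBA data used later in this article; the small Series itself, including the missing value, is an illustrative assumption.

```python
import numpy as np
import pandas as pd

# Plain Python: str.replace works on one string at a time.
single = "Boston Celtics".replace("Boston", "New Boston")

# pandas: the .str accessor applies the same operation to every element.
# The Series below is made up for illustration.
teams = pd.Series(["Boston Celtics", "Brooklyn Nets", np.nan])
replaced = teams.str.replace("Boston", "New Boston", regex=False)

print(single)
print(replaced)
```

Missing values are passed through untouched, so the last element is still missing after the replacement.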

Syntax:
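Series.str.replace(pat, repl, n=-1, case=None, flags=0, regex=False)

Parameters:
pat: String or compiled regex to be replaced.
repl: String or callable used as the replacement for pat.
n: Number of replacements to make in a single string. The default is -1, which means all.
case: Boolean that decides case sensitivity. Set it to False for a case-insensitive match. The default None behaves as case-sensitive when pat is a string.
flags: Flags from the re module, such as re.IGNORECASE. The default is 0 (no flags).
regex: Boolean value. If True, pat is treated as a regular expression; if False, it is treated as a literal string. The default was True before pandas 2.0 and is False from 2.0 onward.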

Return Type:
Series with replaced text values
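The parameters are easier to follow with a small example. The sketch below exercises n, case, regex, and a callable repl on a made-up two-element Series; the values and variable names are assumptions chosen for illustration, and regex is always passed explicitly so the result does not depend on the pandas version.

```python
import pandas as pd

# Illustrative data, not taken from the article's dataset.
s = pd.Series(["a-b-c", "A-B-C"])

# n limits how many matches are replaced in each string.
first_only = s.str.replace("-", "+", n=1, regex=False)

# case=False makes the match case-insensitive.
any_case = s.str.replace("a", "x", case=False, regex=True)

# regex=True treats pat as a regular expression; [a-c] is a character class.
by_regex = s.str.replace("[a-c]", "?", regex=True)

# repl may be a callable that receives the regex match object.
upper = s.str.replace("[a-c]", lambda m: m.group(0).upper(), regex=True)

for result in (first_only, any_case, by_regex, upper):
    print(result.tolist())
```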

Example: The .str.replace() method is part of the Pandas string handling capabilities. It lets users replace occurrences of a specified substring with another substring in the text data of a Pandas Series. This is particularly useful for data cleaning, transformation, and preparation tasks, because it simplifies altering text content in large datasets.

Python3




import pandas as pd
 
data = {'text': ['Blue', 'Green', 'Red']}
df = pd.DataFrame(data)
 
df['text'] = df['text'].str.replace(' ', '_')
print(df)


Output: None of the strings contains a space, so the values are unchanged.

    text
0   Blue
1  Green
2    Red
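For a visible effect, the same call can be run on values that do contain spaces. A minimal sketch, in which the made-up color names are the only assumption:

```python
import pandas as pd

# Made-up values that contain spaces, for illustration only.
df = pd.DataFrame({"text": ["Sky Blue", "Dark Green", "Red"]})

# Replace every space with an underscore; regex=False treats " " literally.
df["text"] = df["text"].str.replace(" ", "_", regex=False)
print(df)
```

Here the first two values change and "Red", which has no space, stays as it is.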

Now let's look at an example of using str.replace() on a dataset. In the following examples, the data frame contains data on some NBA players. To download the CSV used in code, click here. Let’s load the dataset and see what it looks like.

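Python3

# importing pandas module
import pandas as pd

# reading csv file
# NOTE: the original URL is not shown here; "nba.csv" is an assumed
# path to a local copy of the NBA players CSV
data = pd.read_csv("nba.csv")

# printing first 5 rows
print(data.head())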


Output:

            Name            Team  Number Position   Age Height  Weight            College     Salary
0  Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0              Texas  7730337.0
1    Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0          Marquette  6796117.0
2   John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0  Boston University        NaN
3    R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0      Georgia State  1148640.0
4  Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0                NaN  5000000.0

Example 1: Replacing values in the Age column. In this example, every value of 25.0 in the Age column is replaced with “Twenty five”. Because Age is a numeric column, the code uses Series.replace(), which swaps whole values; the .str accessor works only on string data. After that, a filter is created and passed to the .where() method to display only the rows that have Age = “Twenty five”.

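Python3

# overwriting column with replaced value of age
# (Series.replace, not .str.replace, because Age is numeric)
data["Age"] = data["Age"].replace(25.0, "Twenty five")

# creating a filter for age column
# where age = "Twenty five"
age_filter = data["Age"] == "Twenty five"

# printing only the filtered rows; dropna() also removes
# rows that have a missing value in any column
print(data.where(age_filter).dropna())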


Output: As shown in the output, all the values of 25.0 in the Age column have been replaced by “Twenty five”. Rows with a missing value in any column are not shown, because dropna() removes them.

                Name                Team  Number  Position          Age  Height  Weight          College     Salary
 0     Avery Bradley      Boston Celtics     0.0        PG  Twenty five     6-2   180.0            Texas  7730337.0
 1       Jae Crowder      Boston Celtics    99.0        SF  Twenty five     6-6   235.0        Marquette  6796117.0
 7      Kelly Olynyk      Boston Celtics    41.0         C  Twenty five     7-0   238.0          Gonzaga  2165160.0
26   Thomas Robinson       Brooklyn Nets    41.0        PF  Twenty five    6-10   237.0           Kansas   981348.0
35  Cleanthony Early     New York Knicks    11.0        SF  Twenty five     6-8   210.0    Wichita State   845059.0
44  Derrick Williams     New York Knicks    23.0        PF  Twenty five     6-8   240.0          Arizona  4000000.0
47     Isaiah Canaan  Philadelphia 76ers     0.0        PG  Twenty five     6-0   201.0     Murray State   947276.0
48  Robert Covington  Philadelphia 76ers    33.0        SF  Twenty five     6-9   215.0  Tennessee State  1000000.0
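The difference between Series.replace() and Series.str.replace() on a numeric column can be sketched on a small stand-in for the Age column. The Series below and the variable names are hypothetical; only the value 25.0 and the label "Twenty five" come from the example above.

```python
import pandas as pd

# Hypothetical stand-in for the Age column of the NBA data.
ages = pd.Series([25.0, 27.0, 25.0, 22.0], name="Age")

# Series.replace swaps whole values and works on numeric data.
whole_value = ages.replace(25.0, "Twenty five")

# The .str accessor is not available on a numeric Series.
try:
    ages.str
    str_accessor_ok = True
except AttributeError:
    str_accessor_ok = False

# To do a substring replacement, convert to strings first.
as_text = ages.astype(str).str.replace(".0", "", regex=False)

# Boolean indexing selects the matching rows directly, without
# dropping rows that have missing values elsewhere.
twenty_five = whole_value[whole_value == "Twenty five"]

print(whole_value.tolist())
print(as_text.tolist())
print(twenty_five.index.tolist())
```

Boolean indexing is shown here as one alternative to where(...).dropna(); which one fits depends on whether rows with missing values should be kept.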

Example 2: Case insensitivity. In this example, the team name Boston Celtics is replaced by New Boston Celtics. Instead of passing Boston as the pattern, boston is passed (with ‘b’ in lower case), and case is set to False, which makes the match case-insensitive. After that, only the rows with team name “New Boston Celtics” are displayed using the .where() method.

Python3

# importing pandas module
import pandas as pd

# reading csv file
# NOTE: the original URL is not shown here; "nba.csv" is an assumed
# path to a local copy of the NBA players CSV
data = pd.read_csv("nba.csv")

# overwriting column with replaced value of team
data["Team"] = data["Team"].str.replace("boston", "New Boston", case=False)

# creating a filter for team column
# where team = "New Boston Celtics"
team_filter = data["Team"] == "New Boston Celtics"

# printing only the filtered rows
print(data.where(team_filter).dropna())


Output: As shown in the output, Boston is replaced by New Boston even though the pattern was passed in lower case. This is because the case parameter was set to False. Rows with a missing value, such as index 2 (no Salary) and index 4 (no College), are absent because dropna() removes them.

               Name                Team  Number  Position   Age  Height  Weight         College     Salary
 0    Avery Bradley  New Boston Celtics     0.0        PG  25.0     6-2   180.0           Texas  7730337.0
 1      Jae Crowder  New Boston Celtics    99.0        SF  25.0     6-6   235.0       Marquette  6796117.0
 3      R.J. Hunter  New Boston Celtics    28.0        SG  22.0     6-5   185.0   Georgia State  1148640.0
 6    Jordan Mickey  New Boston Celtics    55.0        PF  21.0     6-8   235.0             LSU  1170960.0
 7     Kelly Olynyk  New Boston Celtics    41.0         C  25.0     7-0   238.0         Gonzaga  2165160.0
 8     Terry Rozier  New Boston Celtics    12.0        PG  22.0     6-2   190.0      Louisville  1824360.0
 9     Marcus Smart  New Boston Celtics    36.0        PG  22.0     6-4   220.0  Oklahoma State  3431040.0
10  Jared Sullinger  New Boston Celtics     7.0         C  24.0     6-9   260.0      Ohio State  2569260.0
11    Isaiah Thomas  New Boston Celtics     4.0        PG  27.0     5-9   185.0      Washington  6912869.0
12      Evan Turner  New Boston Celtics    11.0        SG  27.0     6-7   220.0      Ohio State  3425510.0
13      James Young  New Boston Celtics    13.0        SG  20.0     6-6   215.0        Kentucky  1749840.0
14     Tyler Zeller  New Boston Celtics    44.0         C  26.0     7-0   253.0  North Carolina  2616975.0
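The case-insensitive replacement can also be tried without the CSV file. The sketch below uses a made-up three-element Series (an assumption for illustration) and compares case=False with the equivalent flags=re.IGNORECASE; regex=True is passed explicitly so the behavior does not depend on the pandas version's default.

```python
import re

import pandas as pd

# Made-up team names in mixed case, for illustration only.
teams = pd.Series(["Boston Celtics", "BOSTON CELTICS", "Brooklyn Nets"])

# case=False ignores case when matching the pattern.
with_case = teams.str.replace("boston", "New Boston", case=False, regex=True)

# The same effect using a flag from the re module.
with_flags = teams.str.replace(
    "boston", "New Boston", flags=re.IGNORECASE, regex=True
)

print(with_case.tolist())
print(with_case.equals(with_flags))
```

Only the matched text is replaced, so the second value keeps its upper-case "CELTICS".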
