Pandas remove rows with special characters

27 July 2024


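In this article, we will learn how to remove rows that contain special characters: if any value in a row contains a character such as @, %, &, $, #, +, -, *, or /, the row is dropped from the DataFrame. To do this, we first search each column for values containing special characters and then drop the matching rows. The search uses a regular expression: either [@#&$%+\-/*], which lists the unwanted characters explicitly, or [^0-9a-zA-Z], which matches any character that is not an ASCII letter or digit (including spaces and punctuation). The hyphen in the first pattern is escaped because an unescaped +-/ inside a character class is read as a range that also matches the comma and the period. Let's discuss the whole procedure with some examples: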
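One detail worth checking in a character class is the hyphen: written between two characters, as in `+-/`, it denotes a range rather than a literal hyphen. The sketch below uses Python's standard `re` module on a few made-up sample strings (the strings are assumptions for illustration only) to compare the ranged form with a form where the hyphen is escaped.

```python
import re

# Unescaped hyphen: "+-/" is a range covering the characters + , - . /
ranged = re.compile(r'[@#&$%+-/*]')
# Escaped hyphen: "-" is matched literally, so "," and "." are not included
literal = re.compile(r'[@#&$%+\-/*]')

# Made-up sample values, for illustration only
samples = ["Smith, J.", "A+", "plain", "x-y"]

ranged_hits = [s for s in samples if ranged.search(s)]
literal_hits = [s for s in samples if literal.search(s)]

print(ranged_hits)   # includes "Smith, J." because of the comma and period
print(literal_hits)  # only values with a listed character
```

With the ranged form, a name such as "Smith, J." would be treated as containing a special character, which may or may not be what is wanted.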

Example 1:

This example is split into several parts, each with its own code. The dataframe used can be downloaded by clicking data1.csv or is shown below.

Python3

# importing package
import pandas as pd
 
# load dataset
df = pd.read_csv("data1.csv")
 
# view dataset
print(df)

Output:
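If data1.csv is not at hand, the steps can still be followed with a small stand-in frame. The sketch below builds one in memory with `io.StringIO`; the rows are hypothetical and only the column names `Name` and `Grade` are taken from the code in this example, so the real file may look different.

```python
import io
import pandas as pd

# Hypothetical stand-in for data1.csv; the real file's contents may differ
csv_text = """Name,Grade
Alice,A
B@b,B
Carol,C+
Dave,D
"""

# read_csv accepts any file-like object, so no file on disk is needed
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```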

Select rows whose columns contain special characters

Python3

# select the rows whose Name column
# contains one of the listed special characters
# (the hyphen is escaped so that +-/ is not read as a range)
print(df[df.Name.str.contains(r'[@#&$%+\-/*]')])

Output:

Python3

# select the rows 
# if Grade column
# has special characters
print(df[df.Grade.str.contains(r'[^0-9a-zA-Z]')])

Output:
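Two properties of this selection are easy to miss. The negated class `[^0-9a-zA-Z]` flags anything that is not an ASCII letter or digit, so a value containing a space is selected too. Also, `str.contains` returns a missing value for a missing cell, and a mask with missing values cannot be used for boolean indexing. A minimal sketch, assuming a made-up frame with one missing grade, passes `na=False` so that missing cells count as "no match":

```python
import pandas as pd

# Hypothetical data: one grade is missing, one contains a space
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dave"],
    "Grade": ["A", None, "B+", "A B"],
})

# na=False treats missing values as "no match", keeping the mask boolean
mask = df["Grade"].str.contains(r'[^0-9a-zA-Z]', na=False)

flagged = df[mask]
print(flagged)
```

Whether missing values should be kept or dropped depends on the dataset; `na=True` would flag them instead.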

Merging of selected rows

Python3

# combine the two selections
# with the | (element-wise or) operator
print(df[df.Name.str.contains(r'[^0-9a-zA-Z]')
         | df.Grade.str.contains(r'[@#&$%+\-/*]')])

Output:

Remove the merged selected rows

Python3

# drop the merged selected rows
print(df.drop(df[df.Name.str.contains(r'[^0-9a-zA-Z]')
                 | df.Grade.str.contains(r'[^0-9a-zA-Z]')].index))

Output:
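Dropping by index works, but the same result can usually be obtained by inverting the boolean mask with `~`, which avoids the extra lookup. The sketch below compares the two approaches on a hypothetical frame (the data is made up; only the column names come from this example).

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Name": ["Alice", "B@b", "Carol", "Dave"],
    "Grade": ["A", "B", "C+", "D"],
})

pattern = r'[^0-9a-zA-Z]'

# True for every row where either column contains a special character
mask = (df["Name"].str.contains(pattern, na=False)
        | df["Grade"].str.contains(pattern, na=False))

dropped = df.drop(df[mask].index)  # approach used in this example
filtered = df[~mask]               # equivalent: keep rows where mask is False

print(filtered)
```

The two forms agree when the index labels are unique. With duplicate labels, `drop` removes every row sharing a dropped label, so the inverted mask is the safer choice in that case.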

Example 2:

This example uses a dataframe which can be downloaded by clicking data2.csv or is shown below:

Python3

# importing package
import pandas as pd
 
# load dataset
df = pd.read_csv("data2.csv")
 
# view dataset
print(df)
 
# select and then merge rows 
# with special characters
print(df[df.ID.str.contains(r'[^0-9a-zA-Z]') | 
         df.Name.str.contains(r'[^0-9a-zA-Z]') | 
         df.Age.str.contains(r'[^0-9a-zA-Z]') | 
         df.Country.str.contains(r'[^0-9a-zA-Z]')])
 
# drop the rows
print(df.drop(df[df.ID.str.contains(r'[^0-9a-zA-Z]') | 
                 df.Name.str.contains(r'[^0-9a-zA-Z]') | 
                 df.Age.str.contains(r'[^0-9a-zA-Z]') | 
                 df.Country.str.contains(r'[^0-9a-zA-Z]')].index))

Output:
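Repeating the same `str.contains` call for each column gets long as the number of columns grows, and the `.str` accessor raises an error on a column that pandas has read as numeric. One possible generalization, sketched here on a hypothetical stand-in for data2.csv, casts every column to string and checks all of them in one pass. The data and the names `has_special`, `row_mask`, and `clean` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for data2.csv; ID is numeric on purpose
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Name": ["Alice", "B#b", "Carol", "Dave"],
    "Age": ["23", "31", "4*", "52"],
    "Country": ["India", "USA", "UK", "Fr@nce"],
})

pattern = r'[^0-9a-zA-Z]'

# Cast each column to str so that .str also works on numeric columns
has_special = df.apply(lambda col: col.astype(str).str.contains(pattern))

# A row is flagged if any of its columns contains a special character
row_mask = has_special.any(axis=1)

clean = df[~row_mask]
print(clean)
```

Note that casting to string turns a float such as 1.5 into "1.5", whose decimal point would be flagged by this pattern, so float columns may need to be excluded or handled separately.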
