In this article we will learn how to remove the rows with special characters i.e; if a row contains any value which contains special characters like @, %, &, $, #, +, -, *, /, etc. then drop such row and modify the data. To drop such types of rows, first, we have to search rows having special characters per column and then drop. To search we use regular expression either [@#&$%+-/*] or [^0-9a-zA-Z]. Let’s discuss the whole procedure with some examples :
Example 1:
This example consists of some parts with code and the dataframe used can be download by clicking data1.csv or shown below.
Python3
# importing package import pandas as pd # load dataset df = pd.read_csv( "data1.csv" ) # view dataset print (df) |
Output:
Select rows with columns having special characters value
Python3
# select the rows # if Name column # has special characters print (df[df.Name. str .contains(r '[@#&$%+-/*]' )]) |
Output:
Python3
# select the rows # if Grade column # has special characters print (df[df.Grade. str .contains(r '[^0-9a-zA-Z]' )]) |
Output:
Merging of selected rows
Python3
# merge the selected rows # by using or print (df[df.Name. str .contains(r '[^0-9a-zA-Z]' ) | df.Grade. str .contains(r '[@#&$%+-/*]' )]) |
Output:
Remove the merged selected rows
Python3
# drop the merged selected rows print (df.drop(df[df.Name. str .contains(r '[^0-9a-zA-Z]' ) | df.Grade. str .contains(r '[^0-9a-zA-Z]' )].index)) |
Output:
Example 2: This example uses a dataframe which can be download by clicking data2.csv or shown below :
Python3
# importing package import pandas as pd # load dataset df = pd.read_csv( "data2.csv" ) # view dataset print (df) # select and then merge rows # with special characters print (df[df. ID . str .contains(r '[^0-9a-zA-Z]' ) | df.Name. str .contains(r '[^0-9a-zA-Z]' ) | df.Age. str .contains(r '[^0-9a-zA-Z]' ) | df.Country. str .contains(r '[^0-9a-zA-Z]' )]) # drop the rows print (df.drop(df[df. ID . str .contains(r '[^0-9a-zA-Z]' ) | df.Name. str .contains(r '[^0-9a-zA-Z]' ) | df.Age. str .contains(r '[^0-9a-zA-Z]' ) | df.Country. str .contains(r '[^0-9a-zA-Z]' )].index)) |
Output :