Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.
The Pandas str.index() method searches a given section (between start and end) of every string in a series and returns the lowest index at which a substring is found. It works like str.find(), but when the substring is not found it raises a ValueError instead of returning -1.
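The difference from str.find() can be seen in a minimal sketch. The names below are hypothetical sample values chosen for illustration, not rows read from the article's data set.

```python
import pandas as pd

# Hypothetical sample names, not taken from the article's data set
names = pd.Series(["Avery Bradley", "Jae Crowder", "R.J. Hunter"])

# 'e' occurs in every string, so str.index() succeeds for the whole series
indexed = names.str.index("e")
print(indexed.tolist())

# 'y' occurs only in the first string; str.find() marks the others with -1
found_y = names.str.find("y")
print(found_y.tolist())

# str.index() raises ValueError as soon as one string lacks the substring
try:
    names.str.index("y")
    raised = False
except ValueError as err:
    raised = True
    print(err)
```

Because one missing substring makes the whole call fail, str.index() fits cases where every string is expected to contain the substring.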
Syntax: Series.str.index(sub, start=0, end=None)
Parameters:
sub: String or character to be searched for in each text value in the series
start: Position in each string at which the search begins (default 0)
end: Position in each string at which the search stops (default None, meaning the end of the string)
Return type: Series with the lowest index of the substring in each string, if found.
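The effect of start and end can be sketched on a small series. The strings below are hypothetical values picked so that ‘a’ occurs more than once in each.

```python
import pandas as pd

# Hypothetical sample strings in which "a" occurs several times
words = pd.Series(["banana", "alabama", "salsa"])

# Without bounds, the first "a" in each string is reported
first = words.str.index("a")
print(first.tolist())

# start=2 skips the first two characters before searching
from_two = words.str.index("a", start=2)
print(from_two.tolist())

# start=2, end=5 searches only characters 2, 3 and 4 (end is exclusive)
bounded = words.str.index("a", start=2, end=5)
print(bounded.tolist())
```

The returned positions are counted from the beginning of each full string, not from start.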
In the following examples, the data frame contains data on some NBA players. An image of the data frame before any operations is attached below.
Example #1: Finding index when substring exists in every string
In this example, ‘e’ is passed as the substring. Since ‘e’ exists in all 5 strings, the lowest index of its occurrence in each string is returned. Before applying any operations, rows with null values were removed using the .dropna() method.
# importing pandas module
import pandas as pd

# reading csv file from url

# dropping rows with null values to avoid errors
data.dropna(inplace=True)

# extracting 5 rows
short_data = data.head().copy()

# calling str.index() method
short_data["Index Name"] = short_data["Name"].str.index("e")

# display
short_data
Output:
As shown in the output image, the lowest index of ‘e’ in each string of the series was returned and stored in a new column.
Example #2:
In this example, ‘a’ is searched for in the top 5 rows. Since ‘a’ doesn’t exist in every string, a ValueError will be raised. To handle the error, try and except are used.
# importing pandas module
import pandas as pd

# reading csv file from url

# dropping rows with null values to avoid errors
data.dropna(inplace=True)

# extracting 5 rows
short_data = data.head().copy()

# calling str.index() method; a ValueError is raised if any string lacks "a"
try:
    short_data["Index Name"] = short_data["Name"].str.index("a")
except ValueError as err:
    print(err)

# display
short_data
Output:
As shown in the output image, the output data frame does not have the Index Name column, and the error “substring not found” was printed. This is because str.index() raises a ValueError when the substring is missing from any string, so execution jumped to the except block before the column could be assigned.
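When the column should still be created even though some strings lack the substring, one possible approach is to use str.find(), mentioned earlier, and replace its -1 marker with a missing value. This is only a sketch under stated assumptions: the names are hypothetical sample values, and the masking step is one option among several, not part of str.index() itself.

```python
import pandas as pd

# Hypothetical sample names, not taken from the article's data set
short_data = pd.DataFrame(
    {"Name": ["Avery Bradley", "John Holland", "R.J. Hunter"]}
)

# str.find() never raises; it marks a missing substring with -1
positions = short_data["Name"].str.find("a")
print(positions.tolist())

# Replace the -1 marker with a missing value so it is not read as a position
short_data["Index Name"] = positions.where(positions != -1)
print(short_data)
```

Rows without the substring then hold a missing value in Index Name instead of stopping the whole assignment.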