Series.str can be used to access the values of a Series as strings and apply several methods to them. The Pandas Series.str.extractall() function extracts the capture groups in the regex pat as columns in a DataFrame. For each subject string in the Series, it extracts groups from all matches of the regular expression pat. When each subject string in the Series has exactly one match, extractall(pat).xs(0, level='match') is the same as extract(pat).
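The equivalence with extract() can be checked with a minimal sketch. The Series values, index labels, and pattern below are made up for illustration and are chosen so that every string matches exactly once.

```python
import pandas as pd

# Hypothetical data: each string contains exactly one letter-digit pair
s = pd.Series(['a1', 'b2', 'c3'], index=['x', 'y', 'z'])
pat = r'([a-z])(\d)'

all_matches = s.str.extractall(pat)   # rows indexed by (label, match number)
first_match = s.str.extract(pat)      # rows indexed by label only

# Select match number 0 and drop the 'match' index level
selected = all_matches.xs(0, level='match')

print(selected)
print(first_match)
```

Both printed frames should show the same rows and columns, because there is only one match per string to select.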
Syntax: Series.str.extractall(pat, flags=0)
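Parameters :
pat : Regular expression pattern with capturing groups.
flags : A re module flag, for example re.IGNORECASE.
Returns : DataFrame with one row for each match and one column for each group. Its index is a MultiIndex whose last level, named 'match', numbers the matches within each subject string.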
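To illustrate the flags parameter, here is a small sketch comparing a case-sensitive call with one that passes re.IGNORECASE. The two names and the single-vowel pattern are assumptions chosen only for this illustration.

```python
import re
import pandas as pd

# Hypothetical data with one uppercase vowel
s = pd.Series(['Alessa', 'Kim'])

# Default flags=0: only lowercase vowels match
case_sensitive = s.str.extractall(r'([aeiou])')

# re.IGNORECASE: the leading 'A' in 'Alessa' is matched as well
ignore_case = s.str.extractall(r'([aeiou])', flags=re.IGNORECASE)

print(case_sensitive)
print(ignore_case)
```

The second result has one more row than the first, since the uppercase 'A' now counts as a match.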
Example #1: Use the Series.str.extractall() function to extract all groups from the strings in the underlying data of the given Series object.
    # importing pandas as pd
    import pandas as pd

    # Creating the Series
    sr = pd.Series(['New_York', 'Lisbon', 'Tokyo', 'Paris', 'Munich'])

    # Creating the index
    idx = ['City 1', 'City 2', 'City 3', 'City 4', 'City 5']

    # set the index
    sr.index = idx

    # Print the series
    print(sr)
Output :
Now we will use the Series.str.extractall() function to extract all groups from the strings in the given Series object.
    # extract all groups having a lowercase vowel
    # followed by any character
    result = sr.str.extractall(pat='([aeiou].)')

    # print the result
    print(result)
Output :
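As we can see in the output, the Series.str.extractall() function has returned a DataFrame with a single column containing every extracted group, one row per match. Each row is indexed by the original label and the match number within that string.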
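The two-level index of the result can be inspected directly. The following sketch rebuilds the Series from Example #1 so that it runs on its own; the variable name matches_per_city is introduced here for illustration.

```python
import pandas as pd

sr = pd.Series(['New_York', 'Lisbon', 'Tokyo', 'Paris', 'Munich'],
               index=['City 1', 'City 2', 'City 3', 'City 4', 'City 5'])
result = sr.str.extractall(pat='([aeiou].)')

# The index has two levels: the original label and the match number
print(result.index.names)

# Number of matches found in each city name
matches_per_city = result.groupby(level=0).size()
print(matches_per_city)

# All matches for a single label
print(result.loc['City 1'])
```

Grouping on the first index level is one simple way to see how many times the pattern matched in each string.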
Example #2 : Use the Series.str.extractall() function to extract all groups from the strings in the underlying data of the given Series object.
    # importing pandas as pd
    import pandas as pd

    # Creating the Series
    sr = pd.Series(['Mike', 'Alessa', 'Nick', 'Kim', 'Britney'])

    # Creating the index
    idx = ['Name 1', 'Name 2', 'Name 3', 'Name 4', 'Name 5']

    # set the index
    sr.index = idx

    # Print the series
    print(sr)
Output :
Now we will use the Series.str.extractall() function to extract all groups from the strings in the given Series object.
    # extract all groups having any capital letter
    # followed by 'i' and one more character
    result = sr.str.extractall(pat='([A-Z]i.)')

    # print the result
    print(result)
Output :
As we can see in the output, the Series.str.extractall() function has returned a DataFrame with a single column containing every extracted group. Strings with no match (here 'Alessa' and 'Britney') do not appear in the result.
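As a variation on this example, the pattern can be split into named groups, in which case the group names become the column labels. The sketch below is an illustration only: the group names initial and after, and the variable unmatched, are hypothetical and not part of the original example.

```python
import pandas as pd

sr = pd.Series(['Mike', 'Alessa', 'Nick', 'Kim', 'Britney'],
               index=['Name 1', 'Name 2', 'Name 3', 'Name 4', 'Name 5'])

# Same pattern split into two named groups (names are illustrative)
named = sr.str.extractall(r'(?P<initial>[A-Z])i(?P<after>.)')
print(named)

# Labels of the strings that produced no match at all
unmatched = sr.index.difference(named.index.get_level_values(0))
print(unmatched.tolist())
```

Comparing the original index with the first level of the result index is one way to find the strings that extractall() left out.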