Sunday, November 17, 2024

Python | Pandas Series.str.get_dummies()

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

Pandas str.get_dummies() splits each string in the caller Series at the passed separator. It returns a data frame with one column for every distinct value produced by splitting all the strings. If the text value in the original Series at a given index contains the string (column name / split value), the value at that position is 1; otherwise it is 0.

Since this is a string operation, it must be called through the .str accessor, as in Series.str.get_dummies(). Calling get_dummies() directly on the Series raises an AttributeError.
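
The point about the .str accessor can be sketched on a small made-up Series (the values below are illustrative and are not taken from the article's data set). The call through .str works, while the direct call fails because Series has no get_dummies method.

```python
import pandas as pd

# Hypothetical toy Series, used only for illustration
s = pd.Series(["a|b", "b|c"])

# Works: get_dummies is reached through the .str accessor
via_accessor = s.str.get_dummies()

# Fails: Series itself has no get_dummies method
try:
    s.get_dummies()
    raised = False
except AttributeError:
    raised = True

print(via_accessor.columns.tolist())
print(raised)
```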

Syntax: Series.str.get_dummies(sep='|')

Parameters:
sep: String value, the separator to split strings at (default '|')

Return type: Data frame containing only the values 0 and 1
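
As a minimal sketch of the signature above, the snippet below calls the method with no argument, so the default separator '|' is used. The Series `tags` and its values are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from the article's data set
tags = pd.Series(["red|blue", "blue", "green|red"])

# sep defaults to "|", so no argument is needed here
dummies = tags.str.get_dummies()

print(dummies)
```

The result has one column per distinct piece (blue, green, red), in sorted order, and one row per original string.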

To download the data set used in the following examples, click here.

In the following examples, the data frame used contains data on some employees. An image of the data frame before any operations is attached below.

 
Example #1: Separating different strings on whitespace.

In this example, the strings in the Team column are split at " " (white-space), and a data frame is returned with all possible values after splitting. The value in the returned data frame is 1 if the string (column name) exists in the text value at the same index in the old data frame, and 0 otherwise.

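Python3

# importing pandas
import pandas as pd

# making data frame from csv
# ("employees.csv" is a placeholder path; point it at the downloaded data set)
data = pd.read_csv("employees.csv")

# making dataframe using get_dummies()
dummies = data["Team"].str.get_dummies(" ")

# display
dummies.head(10)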


Output:
As shown in the output image, the result can be compared with the original image of the data frame. If the string (column name) exists in the text value at the same index, the value is 1; otherwise it is 0.
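
For readers without the data set at hand, the same whitespace split can be tried on a small stand-in. The Series below is an assumption: three made-up team names in place of the real Team column.

```python
import pandas as pd

# Hypothetical stand-in for the Team column; the real data set is not reproduced here
team = pd.Series(["Boston Celtics", "Brooklyn Nets", "Boston Celtics"], name="Team")

# Split each name at a single space and build one indicator column per word
dummies = team.str.get_dummies(" ")

print(dummies)
```

Note that a multi-word name is not kept as one value: each word becomes its own column, so a row for "Boston Celtics" has a 1 under both "Boston" and "Celtics".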

 
Important points:

  • If the string is not null and yields at least one non-empty piece after splitting, then at least one column will have the value 1 at the same index.
  • If the value is null, then all columns will have the value 0 at that index (this can be seen at the 2nd element in the example above).
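
Both points can be checked with a short sketch. The Series below is hypothetical and has a missing value in the second position; summing each row shows how many columns are set to 1.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the second entry is missing
s = pd.Series(["x y", np.nan, "y"])

dummies = s.str.get_dummies(" ")

# Row sums: non-null rows have at least one 1, the null row has none
row_sums = dummies.sum(axis=1)

print(dummies)
print(row_sums.tolist())
```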

Example #2: Splitting at multiple points/Static value column

In this example, a static value ("Hello gfg family") is taken for the new column. Then the get_dummies() method is applied and the string is separated at "g". Since "g" occurs twice, the string splits into three pieces ("Hello ", "f" and " family"), so there are three columns. The values in every column are the same because the string is the same for all rows.

Python3

# importing pandas
import pandas as pd

# making data frame from csv
# ("employees.csv" is a placeholder path; point it at the downloaded data set)
data = pd.read_csv("employees.csv")

# string for new column
string = "Hello gfg family"

# creating new column
data["New_column"] = string

# creating dummies
df = data["New_column"].str.get_dummies("g")

# display
df.head(10)

    
    

Output:
As shown in the output image, the new data frame has 3 columns, and every row has the same values.
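
The same result can be reproduced without the CSV file. The sketch below assumes a made-up three-row frame (the `Name` column is hypothetical) and prints the column names with repr() so that the spaces left over from the split are visible.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the article's data
df = pd.DataFrame({"Name": ["A", "B", "C"]})

# Assigning a scalar broadcasts the same string to every row
df["New_column"] = "Hello gfg family"

dummies = df["New_column"].str.get_dummies("g")

# repr() makes the leading and trailing spaces in the column names visible
print([repr(c) for c in dummies.columns])
print(dummies)
```

The pieces keep their surrounding spaces ("Hello " and " family"), since the split happens only at "g" and nothing is stripped.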
