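pandas.get_dummies() converts categorical data into dummy (indicator) variables: it creates one new column per category and marks, for each row, whether that row belongs to the category. This is commonly known as one-hot encoding.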
Syntax: pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
Parameters:
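- data: Array-like, Series, or DataFrame whose values are to be encoded.
- prefix: String, list of strings, or dict of strings to prepend to the new column names. When calling get_dummies on a DataFrame, pass a list with length equal to the number of columns being encoded, or a dict mapping column names to prefixes. Default value is None.
- prefix_sep: Separator/delimiter placed between the prefix and the category value. Default is '_'.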
- dummy_na: If True, adds a column that indicates NaN values. Default value is False, in which case NaNs are ignored and the corresponding row has no indicator set.
- columns: Names of the DataFrame columns to encode. Default value is None, in which case all columns with object, string, or category dtype are converted.
- sparse: Specifies whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False). Default value is False.
- drop_first: If True, removes the first level to get k-1 dummies out of k categorical levels. Default value is False.
- dtype: Data type for the new columns. Only a single dtype is allowed. The default is bool in pandas 2.0 and later; earlier versions default to np.uint8.
Returns: DataFrame (dummy-coded data)
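The examples that follow do not exercise columns, prefix_sep, drop_first, or dtype, so here is a minimal sketch of how they combine. The DataFrame and its column names (color, size, price) are made up for illustration and are not part of the original examples.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the parameters.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["S", "M", "S", "L"],
    "price": [10, 12, 9, 15],
})

# Encode only "color"; "size" and "price" pass through unchanged.
encoded = pd.get_dummies(
    df,
    columns=["color"],
    prefix_sep="=",
    drop_first=True,  # drops the first level in sorted order ("blue")
    dtype=int,        # explicit dtype, so the result does not depend on the pandas default
)

print(encoded.columns.tolist())
# ['size', 'price', 'color=green', 'color=red']
print(encoded)
```

With drop_first=True, a row whose color is "blue" is represented by zeros in both remaining color columns, which is why k-1 columns are enough to describe k levels.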
Example 1:
Python3
import pandas as pd

con = pd.Series(list('abcba'))
print(pd.get_dummies(con))
Output:
Example 2:
Python3
import pandas as pd
import numpy as np

# list of labels with one missing value
li = ['s', 'a', 't', np.nan]
print(pd.get_dummies(li))
Output:
Example 3: (To get NaN column)
Python3
import pandas as pd
import numpy as np

# list of labels with one missing value
li = ['s', 'a', 't', np.nan]
# dummy_na=True adds a separate indicator column for NaN
print(pd.get_dummies(li, dummy_na=True))
Output:
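To compare Examples 2 and 3 side by side, the sketch below checks how the NaN entry is represented in each case. It passes dtype=int, which is an addition not present in the examples above, so that the values are 0/1 regardless of the installed pandas version.

```python
import numpy as np
import pandas as pd

values = ["s", "a", "t", np.nan]

# dtype=int keeps the indicators as 0/1 on any pandas version.
without_nan = pd.get_dummies(values, dtype=int)
with_nan = pd.get_dummies(values, dummy_na=True, dtype=int)

# Without dummy_na, the NaN row (position 3) has no indicator set.
print(without_nan.shape)           # (4, 3)
print(without_nan.iloc[3].sum())   # 0

# With dummy_na=True, an extra last column flags the NaN row.
print(with_nan.shape)                 # (4, 4)
print(with_nan.iloc[:, -1].tolist())  # [0, 0, 0, 1]
```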
Example 4:
Python3
import pandas as pd

# DataFrame built from a dictionary; 'R' and 'T' hold strings, 'S_' holds integers
diff = pd.DataFrame({'R': ['a', 'c', 'd'],
                     'T': ['d', 'a', 'c'],
                     'S_': [1, 2, 3]})
# one prefix per encoded column: 'R' -> 'column1', 'T' -> 'column2'
print(pd.get_dummies(diff, prefix=['column1', 'column2']))
Output:
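In Example 4 the prefix list is matched to the encoded columns by position. As an alternative sketch, prefix also accepts a dict keyed by column name, which makes the mapping explicit. The snippet reuses the DataFrame from Example 4 and prints only the resulting column names.

```python
import pandas as pd

diff = pd.DataFrame({"R": ["a", "c", "d"],
                     "T": ["d", "a", "c"],
                     "S_": [1, 2, 3]})

# A dict ties each prefix to a column name instead of relying on list order.
encoded = pd.get_dummies(diff, prefix={"R": "column1", "T": "column2"})

# The numeric column 'S_' is left as is and comes first,
# followed by the dummy columns for 'R' and then 'T'.
print(encoded.columns.tolist())
# ['S_', 'column1_a', 'column1_c', 'column1_d', 'column2_a', 'column2_c', 'column2_d']
```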