Friday, December 27, 2024

Python Pandas – get_dummies() method

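pandas.get_dummies() converts categorical data into dummy (indicator) variables. Each distinct category becomes its own column, and every row is marked in the column that matches its value. This is commonly known as one-hot encoding.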

Syntax: pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

Parameters:

  • data: The array-like, Series, or DataFrame whose values are to be encoded.
  • prefix: String prepended to the names of the generated columns. When calling get_dummies on a DataFrame, pass a list with length equal to the number of columns being encoded, or a dictionary mapping column names to prefixes. Default value is None.
  • prefix_sep: Separator/delimiter placed between the prefix and the category name. Default is '_'.
  • dummy_na: Adds a column to indicate NaN values. Default value is False; if False, NaNs are ignored.
  • columns: Column names in the DataFrame to be encoded. Default value is None; if columns is None, all columns with object or category dtype are converted.
  • sparse: Specifies whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False). Default value is False.
  • drop_first: Removes the first level to get k-1 dummies out of k categorical levels. Default value is False.
  • dtype: Data type for the new columns. Only a single dtype is allowed. Default value is bool in pandas 2.0 and later (np.uint8 in earlier versions).

Returns: DataFrame (dummy-coded data)
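
As a rough sketch of how a few of these parameters interact, the snippet below encodes a small made-up Series with prefix, prefix_sep, and drop_first. The colors data is purely illustrative, and dtype=int is passed explicitly (an assumption made here) so that the result holds 0/1 values regardless of the pandas version's default dtype.

```python
import pandas as pd

# Illustrative data only; not taken from the article.
colors = pd.Series(['red', 'green', 'blue', 'green'])

# Full encoding: one column per category, named "<prefix><prefix_sep><category>".
full = pd.get_dummies(colors, prefix='color', prefix_sep='-', dtype=int)
print(list(full.columns))     # ['color-blue', 'color-green', 'color-red']

# drop_first=True removes the first level in sorted order ('blue' here),
# leaving k-1 columns for k categories.
reduced = pd.get_dummies(colors, prefix='color', prefix_sep='-',
                         drop_first=True, dtype=int)
print(list(reduced.columns))  # ['color-green', 'color-red']
```

With drop_first=True, a row belonging to the dropped level ('blue') is represented by zeros in all remaining columns.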

Example 1:

Python3




import pandas as pd
 
con = pd.Series(list('abcba'))
print(pd.get_dummies(con))


 
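Output:

The result is a DataFrame with one column per distinct value (a, b, c) and one row per element of the Series. Each row marks only the column that matches its value, so rows 0 and 4 mark column a, rows 1 and 3 mark column b, and row 2 marks column c. The marks are True/False in pandas 2.0 and later, and 1/0 in earlier versions.

Example 2: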

Python




import pandas as pd
import numpy as np
 
 
# list containing a missing value (NaN)
li = ['s', 'a', 't', np.nan]
print(pd.get_dummies(li))


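Output:

The result has columns a, s and t, in sorted order. No NaN column appears because dummy_na is False by default, so the row for the missing value is unmarked in every column.

Example 3 (to get a NaN column):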

Python




import pandas as pd
import numpy as np
 
 
# same list as in Example 2, containing a missing value (NaN)
li = ['s', 'a', 't', np.nan]
print(pd.get_dummies(li, dummy_na=True))


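Output:

The result now has a fourth column labelled NaN after a, s and t. The last row, which holds the missing value, is marked in that column only.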

Example 4:

Python3




import pandas as pd
 
# DataFrame built from a dictionary; R and T hold strings, S_ holds integers
diff = pd.DataFrame({'R': ['a', 'c', 'd'],
                     'T': ['d', 'a', 'c'],
                     'S_': [1, 2, 3]})
 
print(pd.get_dummies(diff, prefix=['column1', 'column2']))


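Output:

The numeric column S_ is kept unchanged and appears first. The string columns R and T are replaced by dummy columns that use the supplied prefixes: column1_a, column1_c and column1_d for R, and column2_a, column2_c and column2_d for T.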
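
The column layout of the result can be inspected with a sketch like the following, which rebuilds the DataFrame from Example 4 and also shows the columns parameter restricting the encoding to a single column. The dtype=int argument is an added assumption, used so that the values are 0/1 regardless of pandas version.

```python
import pandas as pd

# Same data as Example 4.
diff = pd.DataFrame({'R': ['a', 'c', 'd'],
                     'T': ['d', 'a', 'c'],
                     'S_': [1, 2, 3]})

# One prefix per encoded (object-dtype) column; the numeric column S_ is left as is.
encoded = pd.get_dummies(diff, prefix=['column1', 'column2'], dtype=int)
print(list(encoded.columns))

# Encode only column R; T stays an ordinary string column.
only_r = pd.get_dummies(diff, columns=['R'], dtype=int)
print(list(only_r.columns))
```

Columns that are not encoded keep their original order and come before the generated dummy columns.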
