In this article, we will see how to select columns with specific data types from a DataFrame. This can be done with the DataFrame.select_dtypes() method in the pandas module.
Syntax: DataFrame.select_dtypes(include=None, exclude=None)
Parameters:
include, exclude: A dtype, or a list of dtypes or their string names, to be included or excluded. At least one of these parameters must be supplied.
Returns: A DataFrame containing only the columns whose dtypes are in include and not in exclude.
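To make the two parameters concrete, here is a minimal sketch using a small in-memory DataFrame. The column names and values are hypothetical and exist only for illustration. It assumes the generic 'number' selector, which pandas documents as matching all numeric dtypes; boolean columns are not treated as numeric.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [9.5, 12.0, 7.25],
    "in_stock": [True, False, True],
})

# include: keep only the columns whose dtype matches
numeric_df = df.select_dtypes(include="number")

# exclude: keep every column except those with the given dtypes
non_float_df = df.select_dtypes(exclude=["float64"])

print(list(numeric_df.columns))
print(list(non_float_df.columns))
```

Both calls return a new DataFrame, so the original df is left unchanged.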
Step-by-step Approach:
- First, import the required module, then load the dataset.
Python3
# import required module
import pandas as pd

# assign dataset
df = pd.read_csv("train.csv")
- Next, inspect the data types present in the dataset using the DataFrame.info() method.
Python3
# display description of the dataset
df.info()
- Now, use DataFrame.select_dtypes() to select the columns of each data type.
Python3
# store columns with specific data type
integer_columns = df.select_dtypes(include=['int64']).columns
float_columns = df.select_dtypes(include=['float64']).columns
object_columns = df.select_dtypes(include=['object']).columns
- Finally, display the columns that have each data type.
Python3
# display columns
print('\nint64 columns:\n', integer_columns)
print('\nfloat64 columns:\n', float_columns)
print('\nobject columns:\n', object_columns)
Below is the complete program based on the above approach:
Python3
# import required module
import pandas as pd

# assign dataset
df = pd.read_csv("train.csv")

# store columns with specific data type
integer_columns = df.select_dtypes(include=['int64']).columns
float_columns = df.select_dtypes(include=['float64']).columns
object_columns = df.select_dtypes(include=['object']).columns

# display columns
print('\nint64 columns:\n', integer_columns)
print('\nfloat64 columns:\n', float_columns)
print('\nobject columns:\n', object_columns)
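The program above depends on a local train.csv file. For readers who do not have that file, the same approach can be tried on a small in-memory DataFrame. This is only a sketch: the column names and values below are hypothetical stand-ins and are not taken from train.csv. The dtypes are set explicitly so that each selection has something to match.

```python
import pandas as pd

# Hypothetical stand-in for train.csv; names and values are illustrative only
df = pd.DataFrame({
    "passenger_id": pd.Series([1, 2, 3], dtype="int64"),
    "age": pd.Series([22.0, 38.0, 26.0], dtype="float64"),
    "fare": pd.Series([7.25, 71.28, 7.92], dtype="float64"),
    "name": pd.Series(["Ann", "Bob", "Cy"], dtype="object"),
})

# store columns with specific data type
integer_columns = df.select_dtypes(include=["int64"]).columns
float_columns = df.select_dtypes(include=["float64"]).columns
object_columns = df.select_dtypes(include=["object"]).columns

# display columns
print("\nint64 columns:\n", integer_columns)
print("\nfloat64 columns:\n", float_columns)
print("\nobject columns:\n", object_columns)
```

Each result is a pandas Index of column names; wrap it in list() if a plain Python list is needed.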
Example:
Here we extract columns from the following dataset, the Seattle weather data from the vega_datasets package:
Python3
# import required module
import pandas as pd
from vega_datasets import data

# assign dataset
df = data.seattle_weather()

# display dataset
df.sample(10)
Now, we display all the columns that have float64 as their data type.
Python3
# import required module
import pandas as pd
from vega_datasets import data

# assign dataset
df = data.seattle_weather()

# display description of dataset
df.info()

# store columns with specific data type
columns = df.select_dtypes(include=['float64']).columns

# display columns
print('\nColumns:\n', columns)
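Because select_dtypes() returns a DataFrame, dropping the trailing .columns gives the selected data itself instead of just the column names. The sketch below shows this on a small hypothetical frame; the column names and values are assumptions chosen to resemble weather data and are not read from vega_datasets.

```python
import pandas as pd

# Hypothetical weather-like data, used only for illustration
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-01-02", "2012-01-03"]),
    "precipitation": [0.0, 10.9, 0.8],
    "temp_max": [12.8, 10.6, 11.7],
})

# keep the float64 columns as a DataFrame rather than as column names
float_df = df.select_dtypes(include=["float64"])

# the subset can be used like any other DataFrame, e.g. per-column means
float_means = float_df.mean()

print(float_df)
print(float_means)
```

The date column is left out of float_df because its dtype is a datetime type, not float64.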