Saturday, December 28, 2024

Conversion Functions in Pandas DataFrame

Python is a great language for data analysis, primarily because of its ecosystem of data-centric packages. Pandas is one of those packages, and it makes importing and analyzing data much easier. Some of the examples in this article use the “nba.csv” file.

Cast a pandas object to a specified dtype

The DataFrame.astype() function casts a pandas object to a specified dtype. It can also convert any suitable existing column to a categorical type.

Code #1: Convert the Weight column data type.




# importing pandas as pd
import pandas as pd
  
# Making data frame from the csv file
df = pd.read_csv("nba.csv")
  
# Printing the first 10 rows of 
# the data frame for visualization
  
df[:10]


Because the data contains some NaN values, we drop every row that contains one so that the integer cast that follows does not raise an error.




# drop all those rows which 
# have any 'nan' value in it.
df.dropna(inplace = True)
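
To see why the missing values have to go before the cast, here is a minimal sketch. It assumes a small hand-made Series (the name `weights` and its values are hypothetical stand-ins for the Weight column, not data from nba.csv).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Weight column; values are made up
weights = pd.Series([180.0, 235.0, np.nan], name="Weight")

# Casting to a plain integer dtype fails while NaN is present,
# because NaN has no int64 representation
try:
    weights.astype("int64")
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)   # True

# Dropping the missing rows first lets the cast succeed
clean = weights.dropna().astype("int64")
print(clean.dtype)   # int64
```

The same reasoning applies to the full dataframe: once dropna() has removed the incomplete rows, the Weight column can be cast safely.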





# let's find out the data type of the Weight column
# (.iloc[0] picks the first remaining row by position)
before = type(df.Weight.iloc[0])

# Now we will convert it into 'int64' type.
df.Weight = df.Weight.astype('int64')

# let's find out the data type after casting
after = type(df.Weight.iloc[0])

# print the value of before
print(before)

# print the value of after
print(after)


Output:




# print the data frame and see
# what it looks like after the change
df
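
The introduction to this section mentions that astype() can also convert a column to a categorical type. A minimal sketch of that, assuming a small hand-made dataframe (the name `df_small` and its values are hypothetical and only illustrate the idea):

```python
import pandas as pd

# Hypothetical sample data, not taken from nba.csv
df_small = pd.DataFrame({"Name": ["A", "B", "C", "D"],
                         "Position": ["PG", "SF", "PG", "C"]})

# Convert the string column to the categorical dtype
df_small["Position"] = df_small["Position"].astype("category")

print(df_small["Position"].dtype)                  # category
print(list(df_small["Position"].cat.categories))   # ['C', 'PG', 'SF']
```

A categorical column stores each distinct label once, which tends to help when a column repeats a small set of values many times.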


 

Infer better data type for input object column

The DataFrame.infer_objects() function attempts to infer a better data type for object-dtype columns. It performs a soft conversion of those columns, leaving non-object and unconvertible columns unchanged. The inference rules are the same as during normal Series/DataFrame construction.

Code #1: Use infer_objects() function to infer better data type.




# importing pandas as pd
import pandas as pd
  
# Creating the dataframe 
df = pd.DataFrame({"A":["sofia", 5, 8, 11, 100],
                   "B":[2, 8, 77, 4, 11],
                   "C":["amy", 11, 4, 6, 9]})
  
# Print the dataframe
print(df)


Output :

Let’s see the dtype (data type) of each column in the dataframe.




# to print the basic info
df.info()


As we can see in the output, the first and third columns are of object type, whereas the second column is of int64 type. Now slice the dataframe and create a new dataframe from it.




# slice from the 1st row till end
df_new = df[1:]
  
# Let's print the new data frame
df_new
  
# Now let's print the data type of the columns
df_new.info()


Output :

As we can see in the output, columns “A” and “C” are still of object type even though they now contain only integer values. So, let’s try the infer_objects() function.




# applying infer_objects() function.
df_new = df_new.infer_objects()
  
# Print the dtype after applying the function
df_new.info()


Output :

Now, if we look at the dtype of each column, we can see that columns “A” and “C” are now of int64 type.
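
The description above also says that unconvertible columns are left unchanged. A minimal sketch of that behavior, assuming a hypothetical dataframe `mixed` in which column “C” keeps a string next to an integer after slicing:

```python
import pandas as pd

# Hypothetical data: after dropping the first row, "A" holds only
# integers, while "C" still mixes the string "11" with the integer 4
mixed = pd.DataFrame({"A": ["sofia", 5, 8],
                      "C": ["amy", "11", 4]})

sliced = mixed[1:].infer_objects()

print(sliced["A"].dtype)   # int64
print(sliced["C"].dtype)   # object
```

infer_objects() does not parse strings such as "11" into numbers, so column “C” stays object; only columns whose values already share a better type are converted.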
 

Detect missing values

The DataFrame.isna() function detects missing values. It returns a boolean object of the same size, indicating whether each value is NA. NA values, such as None or numpy.nan, are mapped to True; everything else is mapped to False. Values such as the empty string '' or numpy.inf are not considered NA (unless you set pandas.options.mode.use_inf_as_na = True, an option that is deprecated in recent pandas releases).

Code #1: Use isna() function to detect the missing values in a dataframe.




# importing pandas as pd
import pandas as pd
  
# Creating the dataframe 
df = pd.read_csv("nba.csv")
  
# Print the dataframe
df


Let’s use the isna() function to detect the missing values.




# detect the missing values
df.isna()


Output :

In the output, cells that correspond to missing values contain True; all other cells contain False.
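
The rules about which values count as NA can be checked on a tiny example. This is a minimal sketch using a hypothetical dataframe `sample`, built by hand so that it contains NaN, None, an empty string and infinity:

```python
import numpy as np
import pandas as pd

# Hypothetical data covering the cases described in this section
sample = pd.DataFrame({"value": [1.0, np.nan, np.inf],
                       "label": ["x", None, ""]})

mask = sample.isna()
print(mask)

# Summing the boolean mask counts the missing values per column
print(mask.sum())
```

With default options, only the NaN and None cells are flagged; the empty string and numpy.inf are treated as ordinary values.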
 

Detecting existing/non-missing values

The DataFrame.notna() function detects existing (non-missing) values in the dataframe. It returns a boolean object of the same size as the object it is applied to, indicating whether each individual value is present. Non-missing values are mapped to True and missing values are mapped to False.

Code #1: Use notna() function to find all the non-missing values in the dataframe.




# importing pandas as pd
import pandas as pd
  
# Creating the first dataframe 
df = pd.DataFrame({"A":[14, 4, 5, 4, 1],
                   "B":[5, 2, 54, 3, 2], 
                   "C":[20, 20, 7, 3, 8],
                   "D":[14, 3, 6, 2, 6]})
  
# Print the dataframe
print(df)


Let’s use the dataframe.notna() function to find all the non-missing values in the dataframe.




# find non-na values
df.notna()


Output :

As we can see in the output, all the non-missing values in the dataframe have been mapped to True. There is no False value because the dataframe has no missing values.
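
Since the dataframe above has no gaps, it may help to see notna() on data that does. A minimal sketch, assuming a hypothetical dataframe `scores` with two missing cells:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
scores = pd.DataFrame({"A": [14, np.nan, 5],
                       "B": [5, 2, np.nan]})

present = scores.notna()
print(present)

# Keep only the rows in which every value is present
complete_rows = scores[present.all(axis=1)]
print(complete_rows)
```

Here notna() is the element-wise inverse of isna(), so the mask can be used directly to filter out incomplete rows.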
 

Methods for conversion in DataFrame

Function                       Description
DataFrame.convert_objects()    Attempt to infer better dtype for object columns (removed in pandas 1.0; use infer_objects() instead).
DataFrame.copy()               Return a copy of this object’s indices and data.
DataFrame.bool()               Return the bool of a single element PandasObject (deprecated in recent pandas releases).