In this article, we will discuss how to calculate the correlation between two columns of a pandas dataframe.
Correlation summarizes the strength and direction of the linear association between two quantitative variables. It is denoted by r and takes values between -1 and +1. A positive value of r indicates a positive association, a negative value of r indicates a negative association, and a value near 0 indicates little or no linear association.
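To make the definition concrete, here is a minimal sketch that computes r by hand using only the Python standard library. The helper name pearson_r is hypothetical and chosen for illustration; it assumes r is the Pearson coefficient, that is, the sum of products of deviations from the mean divided by the square root of the product of the summed squared deviations.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Hypothetical helper for illustration; not part of pandas.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # sum of the products of the deviations from each mean
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    # sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)

    # note: a constant sequence gives ss_x or ss_y equal to 0 and the division fails
    return co_deviation / math.sqrt(ss_x * ss_y)


# as one value rises the other falls, so r is close to -1
r = pearson_r([12, 23, 45, 67], [67, 54, 32, 1])
print(round(r, 6))  # -0.997048
```

The result is strongly negative because the second sequence decreases as the first increases.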
The corr() function returns the correlation between two columns of a dataframe. By default it computes the Pearson correlation coefficient.
Syntax:
dataframe['first_column'].corr(dataframe['second_column'])
where,
- dataframe is the input dataframe
- first_column and second_column are the names of the two columns to correlate
Example 1: Python program to get the correlation between two columns
Python3
# import pandas module
import pandas as pd

# create dataframe with 3 columns
data = pd.DataFrame({
    "column1": [12, 23, 45, 67],
    "column2": [67, 54, 32, 1],
    "column3": [34, 23, 56, 23]
})

# display dataframe
print(data)

# correlation between column1 and column2
print(data['column1'].corr(data['column2']))

# correlation between column2 and column3
print(data['column2'].corr(data['column3']))

# correlation between column1 and column3
print(data['column1'].corr(data['column3']))
Output:
   column1  column2  column3
0       12       67       34
1       23       54       23
2       45       32       56
3       67        1       23
-0.9970476685163736
0.07346999975265099
0.0
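Real data often contains missing values. The sketch below illustrates that corr() on a column excludes rows where either value is missing, so the result matches what you get after dropping incomplete rows yourself. The extra fifth row and the variable names r_all and r_complete are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# same values as Example 1, plus one extra row with a missing column2 value
data = pd.DataFrame({
    "column1": [12, 23, 45, 67, 80],
    "column2": [67, 54, 32, 1, np.nan],
})

# corr() skips the pair that contains NaN
r_all = data["column1"].corr(data["column2"])

# explicit version: drop incomplete rows first, then correlate
complete = data.dropna()
r_complete = complete["column1"].corr(complete["column2"])

print(r_all)
print(r_complete)
```

Both printed values should agree, since only the four complete rows contribute to the coefficient.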
Calling corr() on the whole dataframe returns the pairwise correlation between all numeric columns as a correlation matrix.
Syntax:
dataframe.corr()
Example 2: Get the pairwise correlation between all columns
Python3
# import pandas module
import pandas as pd

# create dataframe with 3 columns
data = pd.DataFrame({
    "column1": [12, 23, 45, 67],
    "column2": [67, 54, 32, 1],
    "column3": [34, 23, 56, 23]
})

# get the pairwise correlation between all columns
print(data.corr())
Output:
          column1   column2  column3
column1  1.000000 -0.997048  0.00000
column2 -0.997048  1.000000  0.07347
column3  0.000000  0.073470  1.00000
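The matrix returned by corr() is itself a dataframe, so a single coefficient can be looked up by row and column label. The sketch below also passes method="spearman", one of the documented options of DataFrame.corr(), to get a rank-based correlation instead of the default Pearson one. The variable names corr_matrix, r_12, spearman_matrix and rho_12 are chosen here for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "column1": [12, 23, 45, 67],
    "column2": [67, 54, 32, 1],
    "column3": [34, 23, 56, 23],
})

# default: Pearson correlation matrix
corr_matrix = data.corr()

# look up one pair by label; same value as data["column1"].corr(data["column2"])
r_12 = corr_matrix.loc["column1", "column2"]
print(r_12)

# rank-based (Spearman) correlation matrix
spearman_matrix = data.corr(method="spearman")
rho_12 = spearman_matrix.loc["column1", "column2"]
print(rho_12)
```

Since column2 decreases every time column1 increases, their ranks are exactly reversed and the Spearman value for that pair comes out at -1, up to floating-point rounding.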