ANCOVA (Analysis of Covariance) tests whether the means of two or more independent groups differ after controlling for one or more explanatory variables (covariates). Covariates are variables that influence the response variable but are not of primary interest in the study.
- The independent (predictor) variable, which explains the variation in the response variable (output variable), is known as the explanatory variable.
- The dependent (outcome) variable, which responds to changes in the explanatory variable, is known as the response variable.
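As a sketch of what "controlling for a covariate" means, a one-way ANCOVA with a single covariate is commonly written as the linear model below. The symbols used here are notation introduced for illustration and do not appear elsewhere in this article.

```latex
y_{ij} = \mu + \tau_i + \beta\,(x_{ij} - \bar{x}) + \varepsilon_{ij}
```

Here $y_{ij}$ is the response of observation $j$ in group $i$, $\mu$ is the overall mean, $\tau_i$ is the effect of group $i$, $\beta$ is the common slope for the covariate $x_{ij}$, $\bar{x}$ is the overall covariate mean, and $\varepsilon_{ij}$ is the random error. The null hypothesis tested for the factor is that every $\tau_i$ equals zero.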
Applying ANCOVA function
Example: A tutor wants to know whether three distinct teaching and learning methodologies lead to different test scores, while also accounting for each student's current grade in the class. She will run an ANCOVA with the following variables:
- Learning methodology is the factor variable.
- Current grade is the covariate.
- Test score is the response variable.
Steps to perform ANCOVA
Step 1: Create a pandas DataFrame to hold the data for the ANCOVA.
Python
import numpy as np
import pandas as pd

# create data
data = pd.DataFrame({
    'methodology': np.repeat(['A', 'B', 'C'], 4),
    'current_grade': [67, 88, 75, 85, 92, 77, 74, 88, 91, 88, 82, 80],
    'test_score': [77, 89, 74, 69, 88, 93, 94, 90, 85, 81, 83, 79]})

# view data
data
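Before fitting the model, it can help to look at the raw group means of the covariate and the response. The snippet below is a minimal sketch using pandas groupby on the same data; the variable name group_means is chosen here for illustration.

```python
import numpy as np
import pandas as pd

# same data as in Step 1
data = pd.DataFrame({
    'methodology': np.repeat(['A', 'B', 'C'], 4),
    'current_grade': [67, 88, 75, 85, 92, 77, 74, 88, 91, 88, 82, 80],
    'test_score': [77, 89, 74, 69, 88, 93, 94, 90, 85, 81, 83, 79]})

# mean current grade and mean test score for each methodology
group_means = data.groupby('methodology')[['current_grade', 'test_score']].mean()
print(group_means)
```

These are unadjusted means. The groups also differ in their average current grade, which is the reason for including the covariate in the ANCOVA.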
Step 2: Perform the ANCOVA using ancova() from the pingouin library. Make sure pingouin is installed before calling ancova(); it can be installed as follows.
Syntax:
pip install pingouin
The above command installs pingouin together with the libraries it depends on.
ancova() function:
Syntax: pingouin.ancova(data=None, dv=None, between=None, covar=None, effsize='np2')
Parameters:
- data : pandas DataFrame supplied to perform the ANCOVA.
- dv : name of the column in data containing the dependent variable.
- between : name of the column in data containing the factor variable.
- covar : name of the column (or list of column names) in data containing the covariate(s).
- effsize : effect size to report, 'np2' (partial eta-squared, the default) or 'n2' (eta-squared).
Python
import numpy as np
import pandas as pd
from pingouin import ancova

# same data as in Step 1
data = pd.DataFrame({
    'methodology': np.repeat(['A', 'B', 'C'], 4),
    'current_grade': [67, 88, 75, 85, 92, 77, 74, 88, 91, 88, 82, 80],
    'test_score': [77, 89, 74, 69, 88, 93, 94, 90, 85, 81, 83, 79]})

# compare test_score across methodology, controlling for current_grade
ancova(data=data, dv='test_score', covar='current_grade', between='methodology')
Output:
Step 3: Analyze the results obtained after performing ANCOVA.
When it runs successfully, ancova() returns the following:
- aov : pandas.DataFrame containing the ANCOVA summary, with the following columns:
- ‘Source’: Names of the factor considered
- ‘SS’: Sums of squares
- ‘DF’: Degrees of freedom
- ‘F’: F-values
- ‘p-unc’: Uncorrected p-values
- ‘np2’: Partial eta-squared
According to the ANCOVA table, the p-value (p-unc, the uncorrected p-value) for methodology is 0.025542. Because this value is less than 0.05, we can reject the null hypothesis that all of the methodologies result in the same average test score, even after controlling for the student's current grade in the class.
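As a cross-check on this conclusion, the F-test for the factor can be sketched by hand as a comparison of two least-squares fits: a full model with group indicators plus the covariate, and a reduced model with the covariate only. This is an illustrative sketch using NumPy and SciPy, not pingouin's internal implementation; the helper rss and the other variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# same data as in Step 1, as plain arrays
groups = np.repeat(['A', 'B', 'C'], 4)
grade = np.array([67, 88, 75, 85, 92, 77, 74, 88, 91, 88, 82, 80], dtype=float)
score = np.array([77, 89, 74, 69, 88, 93, 94, 90, 85, 81, 83, 79], dtype=float)

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

n = len(score)
intercept = np.ones(n)
# indicator columns for groups B and C (group A is the baseline)
is_b = (groups == 'B').astype(float)
is_c = (groups == 'C').astype(float)

X_full = np.column_stack([intercept, is_b, is_c, grade])   # factor + covariate
X_reduced = np.column_stack([intercept, grade])            # covariate only

rss_full = rss(X_full, score)
rss_reduced = rss(X_reduced, score)

df_factor = 2                    # number of groups minus 1
df_resid = n - X_full.shape[1]   # observations minus fitted parameters

# F = (extra variation explained by the factor) / (residual variation)
f_stat = ((rss_reduced - rss_full) / df_factor) / (rss_full / df_resid)
p_value = stats.f.sf(f_stat, df_factor, df_resid)
print(f_stat, p_value)
```

If the two approaches agree, the p-value printed here should be close to the p-unc value reported for methodology in the ANCOVA table.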