Repeated measures ANOVA is used to determine whether there is a statistically significant difference between the means of three or more groups in which the same subjects appear in each group. This article shows how to perform one in Python.
Hypothesis:
A repeated-measures ANOVA tests the following null and alternative hypotheses:
- The null hypothesis (H0): µ1 = µ2 = ... = µk (in other words, all k population means are equal)
- The alternative hypothesis (Ha): at least one population mean differs from the rest
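For reference, the test statistic behind these hypotheses can be written out as below. This is the standard textbook form for a one-way repeated-measures design; the symbols n (number of subjects), k (number of conditions) and the SS/MS labels are notation chosen here, not taken from the article.

```latex
F = \frac{MS_{\text{conditions}}}{MS_{\text{error}}}
  = \frac{SS_{\text{conditions}} / (k - 1)}{SS_{\text{error}} / \big((n - 1)(k - 1)\big)},
\qquad
SS_{\text{error}} = SS_{\text{total}} - SS_{\text{conditions}} - SS_{\text{subjects}}
```

Under H0 (and the usual sphericity assumption), F follows an F distribution with k - 1 and (n - 1)(k - 1) degrees of freedom. A large F, and therefore a small p-value, is evidence against H0.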
Perform a repeated-measures ANOVA in Python:
Let us consider an example: researchers want to know whether four different engine oils lead to different car mileage. To test this, they measured the mileage of 5 cars using each of the four engine oils. Since each car's mileage is measured with every one of the four engine oils in turn, we can use a repeated-measures ANOVA to check whether the mean mileage differs between engine oils.
Syntax to install the numpy, pandas and statsmodels libraries:
pip3 install numpy pandas statsmodels
Performing the repeated measures ANOVA in Python is a step-by-step process. These steps are explained below.
Step 1: Create the data
Python3
# Import the libraries
import numpy as np
import pandas as pd

# Create the data: 5 cars, each measured with 4 engine oils
dataframe = pd.DataFrame({
    'Cars': np.repeat([1, 2, 3, 4, 5], 4),
    'Engine Oil': np.tile([1, 2, 3, 4], 5),
    'Mileage': [36, 38, 30, 29, 34, 38, 30, 29, 34, 28,
                38, 32, 38, 34, 20, 44, 26, 28, 34, 50]})

# Print the dataframe
print(dataframe)
Output:
Step 2: Conduct the repeated measures ANOVA.
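The statsmodels library provides the AnovaRM class for performing a repeated measures ANOVA. It takes the data in long format, along with the names of the dependent variable (depvar), the subject identifier (subject) and the within-subject factor(s) (within).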
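AnovaRM supports only fully balanced within-subject designs, so it can be worth confirming that every car has exactly one measurement for every oil before fitting. The following is a minimal sketch of such a check; the names cell_counts and is_balanced are illustrative and not part of statsmodels.

```python
import numpy as np
import pandas as pd

# Same long-format data as in this tutorial: one row per (car, oil) pair
dataframe = pd.DataFrame({
    'Cars': np.repeat([1, 2, 3, 4, 5], 4),
    'Oil': np.tile([1, 2, 3, 4], 5),
    'Mileage': [36, 38, 30, 29, 34, 38, 30, 29, 34, 28,
                38, 32, 38, 34, 20, 44, 26, 28, 34, 50]})

# Count the observations in each car-by-oil cell (cars as rows, oils as columns)
cell_counts = dataframe.groupby(['Cars', 'Oil']).size().unstack(fill_value=0)

# Balanced here means every car has exactly one measurement for every oil
is_balanced = bool((cell_counts == 1).all().all())

print(cell_counts)
print('Balanced design:', is_balanced)
```

If a cell held several measurements, AnovaRM would need its aggregate_func argument (for example 'mean') to collapse them to one value per cell.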
Example:
Python3
# Import the libraries
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Create the data: 5 cars, each measured with 4 engine oils
dataframe = pd.DataFrame({
    'Cars': np.repeat([1, 2, 3, 4, 5], 4),
    'Oil': np.tile([1, 2, 3, 4], 5),
    'Mileage': [36, 38, 30, 29, 34, 38, 30, 29, 34, 28,
                38, 32, 38, 34, 20, 44, 26, 28, 34, 50]})

# Conduct the repeated measures ANOVA and print the results table
print(AnovaRM(data=dataframe, depvar='Mileage',
              subject='Cars', within=['Oil']).fit())
Output:
Step 3: Analyse the results.
In this example, the F test statistic is 0.5679 and the corresponding p-value is 0.6466. Since this p-value is not less than 0.05, we fail to reject the null hypothesis: there is not sufficient evidence of a statistically significant difference in mean mileage between the four engine oils.
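To see where these numbers come from, the F statistic can be recomputed by hand from the sums of squares. The sketch below is a cross-check rather than a replacement for AnovaRM; it assumes SciPy is available (it is installed as a dependency of statsmodels), and the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Rows are cars (subjects), columns are engine oils (conditions)
mileage = np.array([
    [36, 38, 30, 29],
    [34, 38, 30, 29],
    [34, 28, 38, 32],
    [38, 34, 20, 44],
    [26, 28, 34, 50],
], dtype=float)

n_subjects, n_conditions = mileage.shape
grand_mean = mileage.mean()

# Partition the total variation into conditions, subjects and error
ss_total = ((mileage - grand_mean) ** 2).sum()
ss_conditions = n_subjects * ((mileage.mean(axis=0) - grand_mean) ** 2).sum()
ss_subjects = n_conditions * ((mileage.mean(axis=1) - grand_mean) ** 2).sum()
ss_error = ss_total - ss_conditions - ss_subjects

# Degrees of freedom for the oil effect and for the error term
df_conditions = n_conditions - 1
df_error = (n_subjects - 1) * (n_conditions - 1)

# F is the ratio of the two mean squares; p is the upper tail of the F distribution
f_stat = (ss_conditions / df_conditions) / (ss_error / df_error)
p_value = stats.f.sf(f_stat, df_conditions, df_error)

print(f"F({df_conditions}, {df_error}) = {f_stat:.4f}, p = {p_value:.4f}")
```

Removing the between-car variation (ss_subjects) from the error term is what distinguishes this from an ordinary one-way ANOVA on the same numbers.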
Step 4: Report the outcome.
The result can be reported as follows: A one-way repeated measures ANOVA was conducted on 5 cars to examine the effect of four different engine oils on mileage. The results showed that the type of engine oil did not lead to a statistically significant difference in mileage (F(3, 12) = 0.5679, p = 0.6466).
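The figures in such a report can also be pulled from the fitted result instead of being copied by hand. The sketch below relies on the anova_table attribute of the object returned by fit(), a pandas DataFrame with the columns 'F Value', 'Num DF', 'Den DF' and 'Pr > F'; the report string format is only an illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Same data as in Step 2
dataframe = pd.DataFrame({
    'Cars': np.repeat([1, 2, 3, 4, 5], 4),
    'Oil': np.tile([1, 2, 3, 4], 5),
    'Mileage': [36, 38, 30, 29, 34, 38, 30, 29, 34, 28,
                38, 32, 38, 34, 20, 44, 26, 28, 34, 50]})

results = AnovaRM(data=dataframe, depvar='Mileage',
                  subject='Cars', within=['Oil']).fit()

# With a single within-subject factor the table has exactly one row
row = results.anova_table.iloc[0]

# Build the report fragment from the table values
report = (f"F({int(row['Num DF'])}, {int(row['Den DF'])}) = "
          f"{row['F Value']:.4f}, p = {row['Pr > F']:.4f}")
print(report)
```

Note that the p-value is reported with an equals sign: it is an exact value taken from the table, not a bound.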