Paired sample T-test: This test is also known as the dependent sample t-test. It is a statistical test used to check whether the mean difference between two sets of paired observations is equal to zero. Each entity is measured twice, which yields pairs of observations.
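For reference, the test statistic is commonly written as shown below. The symbols are notation introduced here rather than taken from the text above: d_i is the difference within pair i, n is the number of pairs, and s_d is the sample standard deviation of the differences.

```latex
\[
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(d_i - \bar{d}\right)^2}
\]
```

Under the null hypothesis that the mean difference is zero, t follows a t-distribution with n - 1 degrees of freedom.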
To install the SciPy library, run:
pip install scipy
How to conduct a paired samples T-Test in Python?
Let us consider that we want to know whether an engine oil significantly impacts the mileage of cars of different brands. To test this, we take 10 cars in a garage, each initially filled with the original engine oil, and record the mileage of each car over 100 kilometers. We then fill each car with a different engine oil and again record its mileage over 100 kilometers. To compare the mean mileage of the first and second tests, we use a paired samples t-test, because each car's first measurement can be paired with its second measurement. Conducting a paired samples t-test is a step-by-step process.
Step 1: Construct the data.
We need two arrays to hold the mileage of the cars before and after the oil change.
Python3
# pre holds the mileage of the 10 cars before
# applying the different engine oil
pre = [30, 31, 34, 40, 36, 35, 34, 30, 28, 29]

# post holds the mileage of the same 10 cars after
# applying the different engine oil
post = [30, 31, 32, 38, 32, 31, 32, 29, 28, 30]
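Because the test works on within-pair differences, it can help to check the pairing before running it. The following is a minimal sketch using only the standard library. The helper name `paired_differences` is hypothetical, and the sample values are the ten pre/post readings used in this article's example.

```python
from statistics import mean


def paired_differences(first, second):
    """Return first[i] - second[i] for each pair, checking that lengths match."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same number of observations")
    return [a - b for a, b in zip(first, second)]


# Mileage of the same 10 cars before and after the oil change
pre = [30, 31, 34, 40, 36, 35, 34, 30, 28, 29]
post = [30, 31, 32, 38, 32, 31, 32, 29, 28, 30]

diffs = paired_differences(pre, post)
mean_diff = mean(diffs)

print(diffs)      # one difference per car
print(mean_diff)  # average change in mileage (pre minus post)
```

A length mismatch raises an error here, since an unpaired observation has no meaning in this test.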
Step 2: Conducting a paired-sample T-test.
The SciPy library provides the ttest_rel() function, which we can use to conduct a paired samples t-test in Python. The syntax is given below:
Syntax:
ttest_rel(arr1, arr2)
Parameters:
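- arr1: An array of sample observations from the first measurement (here, pre)
- arr2: An array of sample observations from the second measurement (here, post); it must have the same length as arr1, with matching positions forming a pair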
Example:
Python3
# Importing library
import scipy.stats as stats

# pre holds the mileage before
# applying the different engine oil
pre = [30, 31, 34, 40, 36, 35, 34, 30, 28, 29]

# post holds the mileage after
# applying the different engine oil
post = [30, 31, 32, 38, 32, 31, 32, 29, 28, 30]

# Performing the paired sample t-test and printing the result
print(stats.ttest_rel(pre, post))
Output:
The test statistic is approximately 2.585 and the corresponding two-sided p-value is approximately 0.029.
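As a cross-check, the test statistic can be reproduced by hand from the pairwise differences. This is a minimal sketch using only the standard library, not a description of how SciPy computes the value internally; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

pre = [30, 31, 34, 40, 36, 35, 34, 30, 28, 29]
post = [30, 31, 32, 38, 32, 31, 32, 29, 28, 30]

# Difference within each pair (pre minus post)
diffs = [a - b for a, b in zip(pre, post)]

n = len(diffs)
mean_diff = mean(diffs)
sd_diff = stdev(diffs)            # sample standard deviation (n - 1 denominator)
std_error = sd_diff / sqrt(n)     # standard error of the mean difference

t_stat = mean_diff / std_error
df = n - 1                        # degrees of freedom

print(round(t_stat, 3), df)
```

The result agrees with the reported statistic to two decimal places, with 9 degrees of freedom.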
Step 3: Analyzing the output.
The paired samples t-test uses the following null and alternative hypotheses:
- H0: The mean pre-test and post-test mileages are equal (the mean difference is zero)
- HA: The mean pre-test and post-test mileages are not equal (the mean difference is not zero)
Since the p-value (0.029) is less than 0.05, we reject the null hypothesis. We have sufficient evidence to conclude that the true mean mileage differs before and after applying the different engine oil.
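The decision rule used above can be sketched as a small helper. The function name `decide` and the default significance level of 0.05 are assumptions made for illustration, matching the threshold used in this example.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a given p-value and significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


# p-value reported for the engine oil example
print(decide(0.029))
# The same p-value under a stricter significance level
print(decide(0.029, alpha=0.01))
```

Note that the conclusion depends on the chosen significance level: at 0.05 the null hypothesis is rejected, while at 0.01 the same p-value would not lead to rejection.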