Monday, November 25, 2024

How to Conduct a Paired Samples T-Test in Python

Paired samples t-test: Also known as the dependent samples t-test, this is a statistical test used to check whether the mean difference between two sets of paired observations is equal to zero. Each entity is measured twice, which yields pairs of observations.
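To make the idea of a "mean difference equal to zero" concrete, the test statistic can be computed by hand from the per-pair differences: the mean difference divided by its standard error. The following is a minimal sketch using only the standard library; the variable names are illustrative, and the data are the ten mileage pairs from the example in Step 2 of this article.

```python
import math
import statistics

# Mileage pairs taken from the example in Step 2 of this article
pre = [30, 31, 34, 40, 36, 35, 34, 30, 28, 29]
post = [30, 31, 32, 38, 32, 31, 32, 29, 28, 30]

# Per-car differences (pre minus post)
diffs = [a - b for a, b in zip(pre, post)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
# Sample standard deviation (n - 1 in the denominator)
sd_diff = statistics.stdev(diffs)

# t statistic: mean difference divided by its standard error
t_stat = mean_diff / (sd_diff / math.sqrt(n))
print(round(t_stat, 3))
```

The statistic is compared against a t distribution with n - 1 degrees of freedom, which is what a library routine does internally to produce the p-value.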

Install the SciPy library with pip:

pip install scipy

How to conduct a paired samples T-Test in Python?

Suppose we want to know whether a different engine oil significantly affects the mileage of cars of different brands. To test this, we take 10 cars in a garage that are initially running on their original engine oil and record the mileage of each over 100 kilometers. We then switch each car to another engine oil (different from the original one) and record its mileage over 100 kilometers again. To compare the mean mileage of the first and second tests, we use a paired samples t-test, because each car's first measurement can be paired with its second measurement. Conducting a paired samples t-test is a step-by-step process.

Step 1: Construct the data.

We need two arrays of equal length to hold the mileage of each car before and after the oil change.

Python3




# pre holds the mileage before applying
# the different engine oil
pre = [88, 82, 84, 93, 75, 78, 84, 87,
       95, 91, 83, 89, 77, 68, 91]

# post holds the mileage after applying
# the different engine oil
post = [91, 84, 88, 90, 79, 80, 88, 90,
        90, 96, 88, 89, 81, 74, 92]
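Before running a test, it can help to confirm that the two lists line up and to look at the per-car differences, since the paired test works on those differences. The snippet below is a small sketch along those lines; it assumes both lists are meant to hold 15 readings each, and the names diffs and mean_diff are illustrative.

```python
# Step 1 data, assuming 15 readings in each list
pre = [88, 82, 84, 93, 75, 78, 84, 87,
       95, 91, 83, 89, 77, 68, 91]
post = [91, 84, 88, 90, 79, 80, 88, 90,
        90, 96, 88, 89, 81, 74, 92]

# Paired data needs one "after" reading for every "before" reading
assert len(pre) == len(post)

# Difference for each car (after minus before)
diffs = [after - before for before, after in zip(pre, post)]
mean_diff = sum(diffs) / len(diffs)

print(diffs)
print(round(mean_diff, 2))
```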


Step 2: Conducting a paired-sample T-test.

The SciPy library provides the ttest_rel() function in scipy.stats, which we can use to conduct a paired samples t-test in Python. The syntax is given below.

Syntax:

ttest_rel(arr1, arr2)

Parameters:

  • arr1: An array of sample observations from the first measurement (for example, before the change)
  • arr2: An array of sample observations from the second measurement, with the same length and order as arr1
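Beyond the two arrays, ttest_rel() also accepts an optional alternative argument ("two-sided", "less", or "greater") in SciPy 1.6 and later. This article only uses the default two-sided test, so the following is just a sketch of how a one-sided variant might look, using the mileage data from the example below.

```python
import scipy.stats as stats

pre = [30, 31, 34, 40, 36, 35, 34, 30, 28, 29]
post = [30, 31, 32, 38, 32, 31, 32, 29, 28, 30]

# Default: two-sided test (mean difference is not zero)
two_sided = stats.ttest_rel(pre, post)

# One-sided test: is the mean of (pre - post) greater than zero?
# Requires SciPy 1.6 or later.
one_sided = stats.ttest_rel(pre, post, alternative="greater")

print(two_sided.pvalue)
print(one_sided.pvalue)
```

Because the observed mean difference is positive here, the one-sided p-value is half the two-sided one.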

Example:

Python3




# Importing library
import scipy.stats as stats
  
# pre holds the mileage before 
# applying the different engine oil
pre = [30, 31, 34, 40, 36, 35,
       34, 30, 28, 29]
  
# post holds the mileage after 
# applying the different engine oil
post = [30, 31, 32, 38, 32, 31,
        32, 29, 28, 30]
  
# Performing the paired sample t-test and printing the result
print(stats.ttest_rel(pre, post))


Output:

The test statistic is approximately 2.585 and the corresponding two-sided p-value is approximately 0.029.
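The two numbers can also be read from the result object directly rather than from the printed representation. The sketch below uses the statistic and pvalue attributes of the object returned by ttest_rel(); the variable names are illustrative.

```python
import scipy.stats as stats

pre = [30, 31, 34, 40, 36, 35, 34, 30, 28, 29]
post = [30, 31, 32, 38, 32, 31, 32, 29, 28, 30]

result = stats.ttest_rel(pre, post)

# The result exposes the test statistic and the two-sided p-value
t_stat = result.statistic
p_value = result.pvalue

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```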

Step 3: Analyzing the output.

The paired samples t-test uses the following null and alternative hypotheses:

  • H0: The mean mileage before and after the oil change is equal (the mean difference is zero)
  • HA: The mean mileage before and after the oil change is not equal (the mean difference is not zero)
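Since both hypotheses are statements about the mean of the per-car differences, a paired test should give the same result as a one-sample t-test of those differences against zero. The following sketch checks this with scipy.stats.ttest_1samp(); it is an illustration of the equivalence, not a step the original walkthrough requires.

```python
import scipy.stats as stats

pre = [30, 31, 34, 40, 36, 35, 34, 30, 28, 29]
post = [30, 31, 32, 38, 32, 31, 32, 29, 28, 30]

# Per-car differences (pre minus post)
diffs = [a - b for a, b in zip(pre, post)]

paired = stats.ttest_rel(pre, post)
# One-sample test of the differences against a hypothesized mean of 0
one_sample = stats.ttest_1samp(diffs, 0)

print(paired.statistic, one_sample.statistic)
print(paired.pvalue, one_sample.pvalue)
```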

Since the p-value (0.029) is less than 0.05, we reject the null hypothesis. We have sufficient evidence to conclude that the true mean mileage differs before and after switching to the different engine oil.
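The decision rule used above can be written as a small helper. This is a minimal sketch: the function name decide and the default significance level of 0.05 are illustrative choices, and the p-value is the one reported in this article.

```python
def decide(p_value, alpha=0.05):
    """Return the decision for a test at significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"

# p-value reported for the mileage example
print(decide(0.029))
print(decide(0.029, alpha=0.01))
```

The conclusion depends on the chosen significance level: at a stricter level of 0.01, the same p-value would not lead to rejecting the null hypothesis.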
