In this article, we will see how to perform a Chi-Square Goodness of Fit test in Python.
The Chi-Square Goodness of Fit test is a non-parametric statistical hypothesis test used to determine how much the observed frequencies of a categorical variable differ from the expected frequencies. It helps us check whether a variable follows a hypothesized distribution, or whether a sample is representative of a population. The observed frequency distribution is compared with the expected frequency distribution.
Null hypothesis: The variable follows the specified distribution.
Alternative hypothesis: The variable does not follow the specified distribution.
Example 1: Using stats.chisquare() function
In this approach, we use the stats.chisquare() function from the scipy.stats module, which returns the chi-square goodness of fit test statistic and the p-value.
Syntax: stats.chisquare(f_obs, f_exp)
Parameters:
- f_obs : an array of observed frequencies, one per category.
- f_exp : an array of expected frequencies, one per category.
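As a minimal sketch of how these two parameters behave: f_exp is optional, and when it is omitted SciPy assumes all categories are equally likely. When the expectation is known only as proportions, it can be scaled to the observed total so that both arrays sum to the same value (recent SciPy releases reject inputs whose totals differ noticeably). The counts and proportions below are made up for illustration and are not part of the article's examples.

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts for four categories (illustrative data only)
observed = np.array([18, 22, 30, 30])

# Omitting f_exp tests against a uniform distribution
# (here: 100 observations / 4 categories = 25 expected per category)
uniform_result = stats.chisquare(observed)

# Assumed expected proportions, scaled to the observed total so the sums match
proportions = np.array([0.2, 0.2, 0.3, 0.3])
expected = proportions * observed.sum()
scaled_result = stats.chisquare(observed, f_exp=expected)

print(uniform_result.statistic, uniform_result.pvalue)
print(scaled_result.statistic, scaled_result.pvalue)
```

The returned object exposes the statistic and p-value as attributes, and it can also be unpacked as a two-element tuple.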
In the example below, we also use the stats.chi2.ppf() method, which takes a probability (here, 1 minus the level of significance) and the degrees of freedom as input and returns the chi-square critical value. The degrees of freedom equal the number of categories minus one, so 7 - 1 = 6 here. If the chi-square statistic is greater than the critical value, the null hypothesis is rejected; otherwise, we fail to reject it. In the example below, the chi-square statistic is 5.0127344877344875 and the critical value is 12.591587243743977. Since the statistic does not exceed the critical value, we fail to reject the null hypothesis: at the 5% significance level, the data give no significant evidence that the variable deviates from the expected distribution.
Python3
# importing packages
import scipy.stats as stats

# number of hours a student studies in a week
# vs. the expected number of hours
observed_data = [8, 6, 10, 7, 8, 11, 9]
expected_data = [9, 8, 11, 8, 10, 7, 6]

# Chi-Square Goodness of Fit Test
chi_square_test_statistic, p_value = stats.chisquare(
    observed_data, expected_data)

# chi-square test statistic and p-value
print('chi_square_test_statistic is : ' + str(chi_square_test_statistic))
print('p_value : ' + str(p_value))

# find the Chi-Square critical value (alpha = 0.05, df = 7 - 1 = 6)
print(stats.chi2.ppf(1 - 0.05, df=6))
Output:
chi_square_test_statistic is : 5.0127344877344875
p_value : 0.542180861413329
12.591587243743977
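The decision step can also be written in code. The sketch below wraps the test in a hypothetical helper, goodness_of_fit_decision (a name introduced here, not part of SciPy), and applies both equivalent rules: comparing the statistic with the critical value, and comparing the p-value with the significance level. It assumes the degrees of freedom are the number of categories minus one, which holds when no distribution parameters are estimated from the data.

```python
from scipy import stats


def goodness_of_fit_decision(observed, expected, alpha=0.05):
    """Hypothetical helper: run the test and report both decision rules."""
    statistic, p_value = stats.chisquare(observed, f_exp=expected)
    # Assumption: df = categories - 1 (no parameters estimated from the data)
    df = len(observed) - 1
    critical_value = stats.chi2.ppf(1 - alpha, df=df)
    return {
        "statistic": float(statistic),
        "p_value": float(p_value),
        "critical_value": float(critical_value),
        # Reject H0 when the statistic exceeds the critical value ...
        "reject_by_critical_value": bool(statistic > critical_value),
        # ... or, equivalently, when the p-value is below alpha
        "reject_by_p_value": bool(p_value < alpha),
    }


result = goodness_of_fit_decision(
    [8, 6, 10, 7, 8, 11, 9], [9, 8, 11, 8, 10, 7, 6])
print(result)
```

Both rules lead to the same conclusion, so in practice reporting the p-value alone is usually enough.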
Example 2: Determining chi-square test statistic by implementing formula
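In this approach, we implement the chi-square formula directly: for each category, the squared difference between the observed and expected values is divided by the expected value, and the results are summed. We get the same chi-square value as in Example 1.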
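For reference, the formula can be written as follows. The symbols are notation introduced here for illustration: O_i is the observed value and E_i the expected value for category i, and k is the number of categories (7 in this example).

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}

% Applied to the data of this example:
\chi^2 = \frac{1}{9} + \frac{4}{8} + \frac{1}{11} + \frac{1}{8}
       + \frac{4}{10} + \frac{16}{7} + \frac{9}{6} \approx 5.0127
```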
Python3
# importing packages
import scipy.stats as stats
import numpy as np

# number of hours a student studies in a week
# vs. the expected number of hours
observed_data = [8, 6, 10, 7, 8, 11, 9]
expected_data = [9, 8, 11, 8, 10, 7, 6]

# determining the chi-square goodness of fit statistic using the formula
chi_square_test_statistic1 = 0
for i in range(len(observed_data)):
    chi_square_test_statistic1 = chi_square_test_statistic1 + \
        (np.square(observed_data[i] - expected_data[i])) / expected_data[i]

print('chi square value determined by formula : ' +
      str(chi_square_test_statistic1))

# find the Chi-Square critical value (alpha = 0.05, df = 7 - 1 = 6)
print(stats.chi2.ppf(1 - 0.05, df=6))
Output:
chi square value determined by formula : 5.0127344877344875
12.591587243743977
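The same computation can be sketched without an explicit loop by using NumPy arrays, and the p-value can be recovered from the statistic with the chi-square survival function stats.chi2.sf(). This is one possible variant under the same assumption of 7 - 1 = 6 degrees of freedom, not the only way to write it.

```python
import numpy as np
from scipy import stats

observed = np.array([8, 6, 10, 7, 8, 11, 9])
expected = np.array([9, 8, 11, 8, 10, 7, 6])

# Same formula as the loop, applied element-wise to the whole arrays
statistic = np.sum((observed - expected) ** 2 / expected)

# Upper-tail probability of the chi-square distribution
# with (number of categories - 1) degrees of freedom
df = len(observed) - 1
p_value = stats.chi2.sf(statistic, df)

print('chi square value : ' + str(statistic))
print('p_value : ' + str(p_value))
```

The statistic and p-value should agree with those returned by stats.chisquare() in Example 1, up to floating-point rounding.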