Heteroscedasticity is a statistical term for the unequal scatter of residuals. More specifically, it refers to a systematic change in the spread of the residuals over the range of measured values. Heteroscedasticity poses a challenge because ordinary least squares (OLS) regression assumes that the residuals are drawn from a population with constant variance, a property known as homoscedasticity. If heteroscedasticity is present in a regression analysis, the results cannot be trusted easily, because the usual standard errors, and with them the p-values and confidence intervals, may be unreliable.
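To make the definition concrete, here is a minimal sketch that simulates data whose noise grows with the predictor and then compares the residual spread in the lower and upper halves of the range. The data, the seed, and all variable names are made up for illustration and are not part of the example used later in this article.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: the noise standard deviation grows with x
x = np.linspace(1, 10, 200)
noise = rng.normal(loc=0.0, scale=0.5 * x)
y = 2.0 + 3.0 * x + noise

# Fit a straight line by least squares and compute the residuals
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Compare the residual spread in the lower and upper halves of x
half = len(x) // 2
spread_low = residuals[:half].std()
spread_high = residuals[half:].std()

print(f"Residual std for small x: {spread_low:.2f}")
print(f"Residual std for large x: {spread_high:.2f}")
```

Under homoscedasticity the two numbers would be roughly equal. Here the second is clearly larger, which is the pattern the Breusch-Pagan test is designed to detect.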
The Breusch-Pagan test is a way to check whether heteroscedasticity is present in a regression analysis. The test uses the following hypotheses:
Hypotheses:
- The null hypothesis (H0): Homoscedasticity is present (the residuals have constant variance).
- The alternative hypothesis (Ha): Homoscedasticity is not present (i.e. heteroscedasticity exists).
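For readers who want the hypotheses in formula form, one common way to write the test (the studentized version attributed to Koenker) is sketched below. The symbols are notation introduced here, not taken from the article: the squared OLS residuals are regressed on the k explanatory variables, and the statistic is the sample size n times the R-squared of that auxiliary regression.

```latex
\begin{aligned}
\hat{u}_i^2 &= \gamma_0 + \gamma_1 x_{i1} + \cdots + \gamma_k x_{ik} + v_i \\
H_0 &: \gamma_1 = \cdots = \gamma_k = 0 \\
\mathrm{LM} &= n \, R^2_{\hat{u}^2} \;\sim\; \chi^2_k \quad \text{(approximately, under } H_0\text{)}
\end{aligned}
```

A large LM value, and therefore a small p-value, suggests that the residual variance depends on the explanatory variables.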
Command to install the NumPy, pandas, and statsmodels libraries:
pip3 install numpy pandas statsmodels
Performing a Breusch-Pagan Test:
Performing a Breusch-Pagan test is a step-by-step process. The steps are discussed below.
Step 1: Import libraries.
The very first step is to import the libraries that we have installed above.
Python3
# Importing libraries
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
Step 2: Create a dataset.
Then we need to create a dataset.
Python3
# Create a dataset
dataframe = pd.DataFrame({
    'rating': [92, 84, 87, 82, 98, 94, 75, 80, 83, 89],
    'points': [27, 30, 15, 26, 27, 20, 16, 18, 19, 20],
    'runs': [5000, 7000, 5102, 8019, 1200, 7210, 6200, 9214, 4012, 3102],
    'wickets': [110, 120, 110, 80, 90, 119, 116, 100, 90, 76]
})
Step 3: Fit a multiple linear regression model.
The next step is to fit a multiple linear regression model. As an example, we are considering rating as the response variable and points, runs, and wickets as the explanatory variables.
Python3
# Importing libraries
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Create a dataset
dataframe = pd.DataFrame({
    'rating': [92, 84, 87, 82, 98, 94, 75, 80, 83, 89],
    'points': [27, 30, 15, 26, 27, 20, 16, 18, 19, 20],
    'runs': [5000, 7000, 5102, 8019, 1200, 7210, 6200, 9214, 4012, 3102],
    'wickets': [110, 120, 110, 80, 90, 119, 116, 100, 90, 76]
})

# Fit the regression model
fit = smf.ols('rating ~ points+runs+wickets', data=dataframe).fit()
print(fit.summary())
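The test in Step 4 takes two inputs from the fitted model: the residuals (fit.resid) and the design matrix (fit.model.exog). The sketch below refits the same model and inspects those two objects. The variable names residuals and design are introduced here for illustration only.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Same dataset as in the article
dataframe = pd.DataFrame({
    'rating': [92, 84, 87, 82, 98, 94, 75, 80, 83, 89],
    'points': [27, 30, 15, 26, 27, 20, 16, 18, 19, 20],
    'runs': [5000, 7000, 5102, 8019, 1200, 7210, 6200, 9214, 4012, 3102],
    'wickets': [110, 120, 110, 80, 90, 119, 116, 100, 90, 76]
})

fit = smf.ols('rating ~ points + runs + wickets', data=dataframe).fit()

residuals = fit.resid      # one residual per observation (pandas Series)
design = fit.model.exog    # NumPy array: intercept column plus the predictors

print(fit.model.exog_names)  # column labels of the design matrix
print(design.shape)          # (observations, columns)
print(residuals.round(3))
```

The design matrix has one row per observation and four columns: the intercept added by the formula interface, followed by points, runs, and wickets.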
Step 4: Conduct the Breusch-Pagan test.
The next step is to conduct the Breusch-Pagan test in order to determine whether heteroscedasticity is present.
Python3
# Importing libraries
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.compat import lzip
import statsmodels.stats.api as sms

# Creating a dataset
dataframe = pd.DataFrame({
    'rating': [92, 84, 87, 82, 98, 94, 75, 80, 83, 89],
    'points': [27, 30, 15, 26, 27, 20, 16, 18, 19, 20],
    'runs': [5000, 7000, 5102, 8019, 1200, 7210, 6200, 9214, 4012, 3102],
    'wickets': [110, 120, 110, 80, 90, 119, 116, 100, 90, 76]
})

# Fit the regression model
fit = smf.ols('rating ~ points+runs+wickets', data=dataframe).fit()

# Labels for the four values returned by the test
names = ['Lagrange multiplier statistic', 'p-value', 'f-value', 'f p-value']

# Conduct the Breusch-Pagan test on the residuals and the design matrix
test_result = sms.het_breuschpagan(fit.resid, fit.model.exog)

# Pair each label with its value; print so the result also shows when run as a script
print(lzip(names, test_result))
Output Interpretation:
Here, the Lagrange multiplier statistic for the test is 4.364 and the corresponding p-value is 0.224. Since the p-value is greater than 0.05, we fail to reject the null hypothesis. Hence, we do not have sufficient evidence to conclude that heteroscedasticity is present in the regression model.
How to fix Heteroscedasticity:
In the above example, the test did not detect heteroscedasticity in the regression model. When heteroscedasticity is present, there are three common ways to address it:
- Transform the dependent variable: We can transform the dependent variable in some way. For example, we can take the log of the dependent variable.
- Redefine the dependent variable: We can redefine the dependent variable. For example, we can use a rate for the dependent variable rather than the raw value.
- Use weighted regression: In this type of regression, a weight is assigned to each data point on the basis of the variance of its fitted value, so that points with higher variance receive smaller weights. Using proper weights can eliminate the problem of heteroscedasticity.
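The first and third remedies can be sketched on the same dataset as follows. This is an illustration rather than a recommended recipe: the variance model used to build the weights (regressing the log of the squared residuals on the fitted values) is one common choice that we assume here, and the column and variable names log_rating, variance_data, and weights are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Same dataset as in the article
dataframe = pd.DataFrame({
    'rating': [92, 84, 87, 82, 98, 94, 75, 80, 83, 89],
    'points': [27, 30, 15, 26, 27, 20, 16, 18, 19, 20],
    'runs': [5000, 7000, 5102, 8019, 1200, 7210, 6200, 9214, 4012, 3102],
    'wickets': [110, 120, 110, 80, 90, 119, 116, 100, 90, 76]
})

# Remedy 1: transform the dependent variable (log of rating)
log_data = dataframe.assign(log_rating=np.log(dataframe['rating']))
log_fit = smf.ols('log_rating ~ points + runs + wickets', data=log_data).fit()

# Remedy 3: weighted regression
# Step a: ordinary fit to obtain residuals and fitted values
ols_fit = smf.ols('rating ~ points + runs + wickets', data=dataframe).fit()

# Step b (assumed variance model): log squared residuals as a
# linear function of the fitted values
variance_data = pd.DataFrame({
    'log_resid_sq': np.log(ols_fit.resid ** 2),
    'fitted': ols_fit.fittedvalues,
})
variance_fit = smf.ols('log_resid_sq ~ fitted', data=variance_data).fit()

# Step c: weights are the inverse of the estimated variance
weights = 1.0 / np.exp(variance_fit.fittedvalues)

# Step d: refit with weighted least squares
wls_fit = smf.wls('rating ~ points + runs + wickets',
                  data=dataframe, weights=weights).fit()

print(log_fit.params)
print(wls_fit.params)
```

The second remedy is not shown because this dataset has no natural denominator for turning the rating into a rate. After applying any remedy, the Breusch-Pagan test can be rerun on the new model to check whether the problem has been reduced.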