How to Calculate Studentized Residuals in Python?

23 July 2024

A studentized residual is the quotient obtained by dividing a residual by its estimated standard deviation. Studentized residuals are widely used to detect outliers. As a rule of thumb, an observation whose studentized residual exceeds 3 in absolute value is treated as an outlier.
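For reference, the definition can be written out as a formula. The notation below is introduced here for illustration and is not part of the original text: e_i is the residual of observation i, h_ii is its leverage (the i-th diagonal entry of the hat matrix), n is the number of observations, p is the number of estimated coefficients, and s_(i) is the residual standard error estimated with observation i left out. This is the externally studentized form, which is the one statsmodels reports later in this article.

```latex
t_i = \frac{e_i}{s_{(i)}\sqrt{1 - h_{ii}}},
\qquad
s_{(i)}^2 = \frac{\sum_{j=1}^{n} e_j^2 \;-\; \dfrac{e_i^2}{1 - h_{ii}}}{n - p - 1}
```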

The following Python libraries should be installed on your system:

pandas
numpy
statsmodels
matplotlib

You can install these packages by running the command below in a terminal.

pip3 install pandas numpy statsmodels matplotlib

Steps to calculate studentized residuals in Python

Step 1: Import the libraries.

First, import the libraries installed above into the program.

Python3

# Importing necessary packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt

Step 2: Create a data frame.

Next, create a data frame with the help of the pandas package. The snippet is given below.

Python3

# Creating dataframe
dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84, 
                                    96, 86, 75, 97, 89],
                   'Benchmark': [27, 28, 18, 18, 29, 30, 
                                 25, 25, 24, 29]})

Step 3: Build a simple linear regression model.

Now we need to fit a simple linear regression model to the dataset we created. For this, the statsmodels package provides the ols() function in its formula interface.

Syntax:

statsmodels.formula.api.ols(formula, data)

Parameters:

formula : A string that describes the model, with the dependent variable on the left of ~ and the independent variable on the right, for example 'Score ~ Benchmark'

data : The data frame that contains the variables named in the formula

Example:

Python3

# Building simple linear regression model
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()

Step 4: Producing studentized residual.

To get the studentized residual of each observation in the dataset, call the outlier_test() method on the fitted model.

Syntax:

simple_regression_model.outlier_test()

The method returns a data frame with one row per observation in the dataset.

Python3

# Producing studentized residual
stud_res = simple_regression_model.outlier_test()
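The rule of thumb from the introduction (an absolute studentized residual above 3) can be applied directly to this data frame. The following is a minimal sketch; the names threshold and outliers are introduced here for illustration, and the cutoff of 3 is the convention mentioned above, not a value built into statsmodels.

```python
# Sketch: flag observations whose |studentized residual| exceeds a cutoff
import pandas as pd
from statsmodels.formula.api import ols

dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84,
                                    96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30,
                                        25, 25, 24, 29]})

simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()
stud_res = simple_regression_model.outlier_test()

# Keep only the rows whose studentized residual is beyond the cutoff
threshold = 3
outliers = stud_res[stud_res['student_resid'].abs() > threshold]

print(outliers)
```

For the ten observations used in this article the filter is expected to come back empty, because none of the studentized residuals is that large.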

Below is the complete implementation.

Python3

# Python program to calculate studentized residual
 
# Importing necessary packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
 
# Creating dataframe
dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84,
                                    96, 86, 75, 97, 89],
                   'Benchmark': [27, 28, 18, 18, 29, 30, 
                                 25, 25, 24, 29]})
 
# Building simple linear regression model
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()
 
# Producing studentized residual
result = simple_regression_model.outlier_test()
 
print(result)

Output:

The output is a data frame that contains:

The studentized residual (column student_resid)
The unadjusted p-value of the studentized residual (column unadj_p)
The Bonferroni-corrected p-value of the studentized residual (column bonf(p))

We can see that the studentized residual for the first observation in the dataset is -1.121201, the studentized residual for the second observation is 0.954871, and so on.
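To see where these numbers come from, the values can be recomputed by hand from the residuals and the leverages. The sketch below assumes the externally studentized definition, in which each residual is divided by a standard deviation estimated with that observation left out; the variable names (leverage, s2_loo, manual, reference) are introduced here for illustration.

```python
# Sketch: recompute studentized residuals manually and compare with statsmodels
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84,
                                    96, 86, 75, 97, 89],
                          'Benchmark': [27, 28, 18, 18, 29, 30,
                                        25, 25, 24, 29]})

model = ols('Score ~ Benchmark', data=dataframe).fit()

n = int(model.nobs)          # number of observations
p = len(model.params)        # number of coefficients, intercept included
resid = model.resid.to_numpy()

# Leverage of each observation (diagonal of the hat matrix)
leverage = model.get_influence().hat_matrix_diag

# Residual variance estimated with observation i left out
sse = np.sum(resid ** 2)
s2_loo = (sse - resid ** 2 / (1 - leverage)) / (n - p - 1)

# Residual divided by its estimated standard deviation
manual = resid / np.sqrt(s2_loo * (1 - leverage))

# Values reported by outlier_test() for comparison
reference = model.outlier_test()['student_resid'].to_numpy()
print(np.allclose(manual, reference))
```

If the two arrays agree, the first column of outlier_test() is exactly the residual-over-standard-deviation quotient described at the start of the article.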

Visualization:

Now let us visualize the studentized residuals. With the help of matplotlib, we can plot the predictor variable values against the corresponding studentized residuals.

Example:

Python3

# Python program to draw the plot
# of studentized residuals
 
# Importing necessary packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
 
# Creating dataframe
dataframe = pd.DataFrame({'Score': [80, 95, 80, 78, 84,
                                    96, 86, 75, 97, 89],
                   'Benchmark': [27, 28, 18, 18, 29, 30, 
                                 25, 25, 24, 29]})
 
# Building simple linear regression model
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()
 
# Producing studentized residual
result = simple_regression_model.outlier_test()
 
# Defining predictor variable values and 
# studentized residuals
# (Benchmark is the predictor; Score is the response)
x = dataframe['Benchmark']
y = result['student_resid']
 
# Creating a scatterplot of predictor variable 
# vs studentized residuals
plt.scatter(x, y)
plt.axhline(y=0, color='black', linestyle='--')
plt.xlabel('Benchmark')
plt.ylabel('Studentized Residuals')
 
# Save the plot
plt.savefig("Plot.png")

Output:

Plot.png: a scatter plot of the studentized residuals with a dashed horizontal reference line at 0.
