Prerequisites: Parametric and Non-Parametric Methods, Hypothesis Testing
In this article, we will discuss different approaches to performing Grubbs’ test in the Python programming language.
Grubbs’ test, also known as the maximum normalized residual test or the extreme studentized deviate test, is used to detect outliers in a univariate data set assumed to come from a normally distributed population. The test is defined for the hypotheses:
- Ho: There are no outliers in the data set
- Ha: There is exactly one outlier in the data set
Method 1: Performing a two-sided Grubbs’ Test
In this method, to perform Grubbs’ test, call the smirnov_grubbs.test() function from the outlier_utils package (installed with pip install outlier_utils and imported as outliers), passing the data and the significance level as parameters. The function returns the data with the detected outliers removed.
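For reference, the standard textbook form of the two-sided statistic and its critical value is shown below. Here Ȳ is the sample mean, s the sample standard deviation, n the sample size, and t the upper critical value of Student's t distribution with n - 2 degrees of freedom at significance level α/(2n). Ho is rejected when G exceeds G_crit. The one-sided versions use G_min or G_max instead of G and replace α/(2n) with α/n.

```latex
G = \frac{\max_{i} \lvert Y_i - \bar{Y} \rvert}{s},
\qquad
G_{\text{crit}} = \frac{n-1}{\sqrt{n}}
  \sqrt{\frac{t^{2}_{\alpha/(2n),\,n-2}}{n-2+t^{2}_{\alpha/(2n),\,n-2}}}

G_{\min} = \frac{\bar{Y} - Y_{\min}}{s},
\qquad
G_{\max} = \frac{Y_{\max} - \bar{Y}}{s}
```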
Syntax: smirnov_grubbs.test(data, alpha)
Parameters:
- data: A numeric vector of data values
- alpha: The significance level to use for the test.
Example:
In this example, we perform a two-sided Grubbs’ test, which detects outliers at both ends of the data set, using the smirnov_grubbs.test() function.
Python
import numpy as np
from outliers import smirnov_grubbs as grubbs

# define data
data = np.array([20, 21, 26, 24, 29, 22, 21, 50, 28, 27])

# perform Grubbs' test
grubbs.test(data, alpha=.05)
Output:
array([20, 21, 26, 24, 29, 22, 21, 28, 27])
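To make the mechanics concrete, here is a minimal from-scratch sketch of a two-sided Grubbs’ test using NumPy and SciPy. It assumes that test() works iteratively, removing the most extreme value and re-testing until nothing more is flagged. The names two_sided_grubbs() and grubbs_critical_value() are hypothetical and not part of outlier_utils, and the use of the sample standard deviation (ddof=1) is a textbook choice that may differ from the library's internals.

```python
import numpy as np
from scipy import stats


def grubbs_critical_value(n, alpha, two_sided=True):
    """Critical G for a sample of size n (textbook Grubbs formula)."""
    # Two-sided tests split alpha over both tails (alpha / 2n); one-sided use alpha / n.
    p = alpha / (2 * n) if two_sided else alpha / n
    # Upper-tail critical value of Student's t with n - 2 degrees of freedom.
    t = stats.t.isf(p, n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))


def two_sided_grubbs(data, alpha=0.05):
    """Repeatedly drop the value farthest from the mean while G exceeds G_crit."""
    values = np.asarray(data, dtype=float)
    while len(values) > 2:  # the t distribution needs n - 2 >= 1
        s = values.std(ddof=1)  # sample standard deviation
        if s == 0:
            break  # all values identical: nothing can be an outlier
        deviations = np.abs(values - values.mean())
        idx = int(np.argmax(deviations))
        g = deviations[idx] / s
        if g <= grubbs_critical_value(len(values), alpha):
            break  # the most extreme value is not significant, so stop
        values = np.delete(values, idx)
    return values


data = np.array([20, 21, 26, 24, 29, 22, 21, 50, 28, 27])
print(two_sided_grubbs(data, alpha=0.05))
```

For this data the sketch removes 50 and keeps the remaining values, which matches the output of grubbs.test() above.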
Method 2: Performing a one-sided Grubbs’ Test
In this approach, to perform a one-sided Grubbs’ test, call either grubbs.min_test(), which tests whether the smallest value in the data set is an outlier, or grubbs.max_test(), which tests whether the largest value is an outlier. Like test(), both functions return the data with any detected outliers removed.
Syntax:
grubbs.min_test(data, alpha)
grubbs.max_test(data, alpha)
Example 1:
In this example, we perform a one-sided Grubbs’ test on the smallest value of the given data using the grubbs.min_test() function.
Python
import numpy as np
from outliers import smirnov_grubbs as grubbs

# define data
data = np.array([20, 21, 26, 24, 29, 22, 21, 50, 28, 27, 5])

print("Data after performing min one-sided Grubbs' test:")

# perform min Grubbs' test
grubbs.min_test(data, alpha=.05)
Output:
Data after performing min one-sided Grubbs' test:
array([20, 21, 26, 24, 29, 22, 21, 50, 28, 27, 5])
Example 2:
In this example, we perform a one-sided Grubbs’ test on the largest value of the given data using the grubbs.max_test() function.
Python
import numpy as np
from outliers import smirnov_grubbs as grubbs

# define data
data = np.array([20, 21, 26, 24, 29, 22, 21, 50, 28, 27, 5])

print("Data after performing max one-sided Grubbs' test:")

# perform max Grubbs' test
grubbs.max_test(data, alpha=.05)
Output:
Data after performing max one-sided Grubbs' test:
array([20, 21, 26, 24, 29, 22, 21, 28, 27, 5])
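The one-sided variants compare only the smallest or only the largest value with the mean and put the whole significance level into a single tail (α/n rather than α/(2n)). The sketch below, a hypothetical one_sided_grubbs() helper built on NumPy and SciPy rather than the library's own code, computes a single step of each variant for the data used in the two examples above. It uses the sample standard deviation, which may differ from how outlier_utils computes it.

```python
import numpy as np
from scipy import stats


def one_sided_grubbs(data, alpha=0.05, side="max"):
    """Return (G, G_crit) for one step of a one-sided Grubbs' test."""
    values = np.asarray(data, dtype=float)
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    s = values.std(ddof=1)  # sample standard deviation
    if s == 0:
        raise ValueError("all values are identical")
    if side == "max":
        g = (values.max() - values.mean()) / s  # how far the largest value sits above the mean
    elif side == "min":
        g = (values.mean() - values.min()) / s  # how far the smallest value sits below the mean
    else:
        raise ValueError("side must be 'min' or 'max'")
    # One-sided: the whole alpha goes into a single tail, hence alpha / n.
    t = stats.t.isf(alpha / n, n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return g, g_crit


data = np.array([20, 21, 26, 24, 29, 22, 21, 50, 28, 27, 5])
for side in ("min", "max"):
    g, g_crit = one_sided_grubbs(data, alpha=0.05, side=side)
    print(f"{side}: G = {g:.3f}, critical G = {g_crit:.3f}, outlier = {g > g_crit}")
```

For this data the statistic for 50 exceeds the critical value while the statistic for 5 does not, which is consistent with max_test() removing 50 and min_test() leaving the data unchanged. A single large value such as 50 also inflates s, which can mask a less extreme value at the other end.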
Method 3: Extract the Index of the Outlier Using Grubbs’ Test
In this approach, call the grubbs.max_test_indices() function to get the index at which an outlier occurs in the given data.
grubbs.max_test_indices() function: This function returns a list of the indices (positions in the original array) of the outliers detected at the maximum end of the data.
Syntax: grubbs.max_test_indices(data, alpha)
Python
import numpy as np
from outliers import smirnov_grubbs as grubbs

# define data
data = np.array([20, 21, 26, 24, 29, 22, 21, 50, 28, 27, 5])

# get the indices of the max-side outliers
grubbs.max_test_indices(data, alpha=.05)
Output:
[7]
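The index 7 refers to the position of 50 in the original array, even though the test removes values as it goes. One way to get that behaviour in a hand-rolled version is to carry an array of original positions alongside the values and delete from both together. The sketch below is a hypothetical max_outlier_indices() helper built on NumPy and SciPy that applies a repeated one-sided max test this way; it is not the library's implementation.

```python
import numpy as np
from scipy import stats


def max_outlier_indices(data, alpha=0.05):
    """Original indices of values flagged by a repeated one-sided max Grubbs' test."""
    values = np.asarray(data, dtype=float)
    positions = np.arange(len(values))  # original index of each remaining value
    flagged = []
    while len(values) > 2:
        n = len(values)
        s = values.std(ddof=1)  # sample standard deviation
        if s == 0:
            break  # all remaining values identical
        idx = int(np.argmax(values))
        g = (values[idx] - values.mean()) / s
        t = stats.t.isf(alpha / n, n - 2)  # one-sided: alpha / n
        g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
        if g <= g_crit:
            break
        flagged.append(int(positions[idx]))
        # Drop the flagged value and its original position together.
        values = np.delete(values, idx)
        positions = np.delete(positions, idx)
    return flagged


data = np.array([20, 21, 26, 24, 29, 22, 21, 50, 28, 27, 5])
print(max_outlier_indices(data, alpha=0.05))
```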
Method 4: Extract the Value of the Outlier Using Grubbs’ Test
In this approach, call the grubbs.max_test_outliers() function to get the value of each outlier in the given data.
grubbs.max_test_outliers() function: This function returns a list of the values of the outliers detected at the maximum end of the data.
Syntax: grubbs.max_test_outliers(data, alpha)
Python
import numpy as np
from outliers import smirnov_grubbs as grubbs

# define data
data = np.array([20, 21, 26, 24, 29, 22, 21, 50, 28, 27, 5])

# get the values of the max-side outliers
grubbs.max_test_outliers(data, alpha=.05)
Output:
[50]
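If you already have the cleaned array returned by test(), min_test() or max_test(), the removed values can also be recovered without re-running the test by taking a multiset difference between the original data and the result. The removed_values() helper below is hypothetical and not part of outlier_utils. It uses only the standard library and handles repeated values correctly, which a plain set difference would not.

```python
from collections import Counter


def removed_values(original, cleaned):
    """Values present in `original` but missing from `cleaned`, counting duplicates."""
    # Counter subtraction keeps only positive counts, i.e. the values that were dropped.
    missing = Counter(original) - Counter(cleaned)
    return sorted(missing.elements())


original = [20, 21, 26, 24, 29, 22, 21, 50, 28, 27, 5]
cleaned = [20, 21, 26, 24, 29, 22, 21, 28, 27, 5]  # the max_test() output from Method 2
print(removed_values(original, cleaned))
```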