In this article, we will look at how to perform a multivariate normality test in Python using the pingouin library.
A multivariate normality test determines whether or not a group of variables follows a multivariate normal distribution.
multivariate_normality() function
In this approach, we call the multivariate_normality() function from the pingouin library with the required parameters to conduct a multivariate normality test on the given data.
Command to install the pingouin library:
pip install pingouin
Syntax: multivariate_normality(X, alpha)
Parameters:
- X: Data matrix of shape (n_samples, n_features).
- alpha: Significance level.
Returns:
- hz: The Henze-Zirkler test statistic.
- pval: P-value.
- normal: True if the test does not reject multivariate normality at the given alpha (pval > alpha), otherwise False.
This is a hypothesis test, and the two hypotheses are as follows:
- H0 (null hypothesis): The variables follow a multivariate normal distribution. We fail to reject H0 when the p-value is greater than 0.05.
- Ha (alternative hypothesis): The variables do not follow a multivariate normal distribution.
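The decision rule can be sketched in a few lines of plain Python. The helper name interpret_pvalue is hypothetical and used only for illustration; it simply compares a p-value with the significance level alpha.

```python
def interpret_pvalue(pval, alpha=0.05):
    """Hypothetical helper: turn a p-value into a test decision."""
    if pval > alpha:
        # Not enough evidence against multivariate normality
        return "fail to reject H0"
    # Evidence that the data are not multivariate normal
    return "reject H0"


print(interpret_pvalue(0.845))   # illustrative p-value above alpha
print(interpret_pvalue(0.0036))  # illustrative p-value below alpha
```

Note that failing to reject H0 is not proof of normality; it only means the test found no sufficient evidence against it.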
Example 1: Multivariate Normality test on data from a multivariate normal distribution in Python
In this example, we use the multivariate_normality() function from the pingouin library to conduct a multivariate normality test on randomly generated data with 100 data points and 5 variables, each drawn from a normal distribution.
Python3
from pingouin import multivariate_normality
import pandas as pd
import numpy as np

data = pd.DataFrame({'a': np.random.normal(size=100),
                     'b': np.random.normal(size=100),
                     'c': np.random.normal(size=100),
                     'd': np.random.normal(size=100),
                     'e': np.random.normal(size=100)})

# perform the Multivariate Normality Test
multivariate_normality(data, alpha=0.05)
Output:
HZResults(hz=0.7973450591569415, pval=0.8452549483161891, normal=True)
Output Interpretation:
Since the p-value in the above example is 0.84, which is greater than alpha (0.05), we fail to reject the null hypothesis, i.e., we do not have sufficient evidence to say that the sample does not follow a multivariate normal distribution. Because the data are generated randomly without a fixed seed, the exact values will differ from run to run.
Example 2: Multivariate Normality test on data that is not multivariate normal in Python
In this example, we use the multivariate_normality() function from the pingouin library to conduct a multivariate normality test on randomly generated data from a Poisson distribution, with 100 data points and 5 variables.
Python3
from pingouin import multivariate_normality
import pandas as pd
import numpy as np

data = pd.DataFrame({'a': np.random.poisson(size=100),
                     'b': np.random.poisson(size=100),
                     'c': np.random.poisson(size=100),
                     'd': np.random.poisson(size=100),
                     'e': np.random.poisson(size=100)})

# perform the Multivariate Normality Test
multivariate_normality(data, alpha=0.05)
Output:
HZResults(hz=7.4701896678920745, pval=0.00355552234721754, normal=False)
Output Interpretation:
Since the p-value in the above example is 0.003, which is less than alpha (0.05), we reject the null hypothesis, i.e., we have sufficient evidence to say that the sample does not come from a multivariate normal distribution.