Dunn’s test should be used to establish which groups are distinct if the Kruskal-Wallis test yields statistically significant findings. After the Kruskal-Wallis test, the non-parametric counterpart of one-way ANOVA, has revealed a difference among three or more groups, you may apply Dunn’s test to determine which particular groups differ from the rest. Dunn’s Multiple Comparison Test is a non-parametric post hoc test that doesn’t presume your data comes from a certain distribution.
To perform Dunn’s test, the user needs to call the posthoc_dunn() function from the scikit-posthocs library.
posthoc_dunn() Function:
Syntax:
scikit_posthocs.posthoc_dunn(a, val_col: str = None, group_col: str = None, p_adjust: str = None, sort: bool = True)
Parameters:
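- a : an array-like object, a dataframe object or a series.
- val_col : name of the column that contains the values of the dependent variable (used when a is a dataframe).
- group_col : name of the column that contains the group labels, that is, the predictor (independent) variable (used when a is a dataframe).
- p_adjust : method used to adjust the p-values for multiple comparisons. It is a string; possible values include:
- bonferroni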
- hommel
- holm-sidak
- holm
- simes-hochberg and more…
Returns: a dataframe of pairwise p-values.
Command to install the scikit-posthocs library:
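To give a feel for what a p_adjust option does, the following is a minimal sketch of the Holm step-down adjustment written in plain Python. It is an illustration only, not the code scikit-posthocs runs internally; the function name holm_adjust and the raw p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Keep the adjusted values non-decreasing along the sorted order
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]  # hypothetical raw p-values, for illustration only
print(holm_adjust(raw))
```

With three comparisons, the smallest raw p-value is tripled, the middle one is doubled, and the largest can only inherit the adjusted value of the one before it.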
pip install scikit-posthocs
This is a hypothesis test, and the two hypotheses for each pair of groups are as follows:
- Null hypothesis: The given samples have the same median.
- Alternative hypothesis: The given samples have different medians.
In this example, we import the packages, read the iris CSV file, and use the posthoc_dunn() function to perform Dunn’s test. Dunn’s test is performed on the sepal width of the three plant species.
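The hypotheses above are tested pair by pair by comparing mean ranks. The sketch below shows one way the unadjusted pairwise p-values could be computed by hand, under the simplifying assumption that the data contain no tied values (no tie correction is applied). The function name dunn_unadjusted and the sample data are hypothetical; in practice posthoc_dunn() should be used.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def dunn_unadjusted(groups):
    """Pairwise two-sided p-values for Dunn's test, assuming no tied values."""
    # Rank all observations together; the smallest value gets rank 1
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    mean_ranks = [rank_sums[g] / len(groups[g]) for g in range(len(groups))]

    results = {}
    for i, j in combinations(range(len(groups)), 2):
        # Standard error of the difference in mean ranks (no tie correction)
        se = sqrt(n_total * (n_total + 1) / 12
                  * (1 / len(groups[i]) + 1 / len(groups[j])))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        # Two-sided p-value from the standard normal distribution
        results[(i, j)] = 2 * (1 - NormalDist().cdf(abs(z)))
    return results


# Hypothetical, tie-free samples for illustration only
samples = [[1.1, 2.3, 3.2], [4.5, 5.1, 6.8], [7.4, 8.9, 9.6]]
for pair, p in dunn_unadjusted(samples).items():
    print(pair, round(p, 4))
```

The keys of the returned dictionary are zero-based group index pairs. The further apart two groups sit in the pooled ranking, the smaller their p-value.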
Python3
# importing packages and modules
import pandas as pd
import scikit_posthocs as sp

# reading CSV file
dataset = pd.read_csv('iris.csv')

# data which contains sepal width of the three species
data = [dataset[dataset['species'] == "setosa"]['sepal_width'],
        dataset[dataset['species'] == "versicolor"]['sepal_width'],
        dataset[dataset['species'] == "virginica"]['sepal_width']]

# using the posthoc_dunn() function
p_values = sp.posthoc_dunn(data, p_adjust='holm')
print(p_values)
Output:
- For the difference between groups 1 and 2, the adjusted p-value is 3.247311e-14
- For the difference between groups 2 and 3, the adjusted p-value is 1.521219e-02
We further check whether the p-values are higher than the level of significance. False indicates that the difference between the two groups is statistically significant, that is, the null hypothesis is rejected for that pair.
Python3
p_values > 0.05
Output:
We take the level of significance to be 0.05 in this example. Ignoring the diagonal, which compares each group with itself, no two groups (species) have a p-value above 0.05, so every pair of species differs significantly in sepal width. Hence, we reject the null hypothesis in favor of the alternative hypothesis.
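The boolean comparison can also be turned into an explicit list of significant pairs. The following is a small sketch of one way to do that, assuming the result is a square pandas dataframe whose rows and columns carry the same group labels. The helper significant_pairs is hypothetical, and the matrix values are placeholders, not results from the iris data.

```python
from itertools import combinations

import pandas as pd


def significant_pairs(p_matrix, alpha=0.05):
    """Return (group_a, group_b, p) for each pair whose p-value is below alpha."""
    labels = list(p_matrix.index)
    return [(a, b, float(p_matrix.loc[a, b]))
            for a, b in combinations(labels, 2)  # each pair once, diagonal skipped
            if p_matrix.loc[a, b] < alpha]


# Placeholder matrix shaped like a pairwise p-value table (values are made up)
demo = pd.DataFrame(
    [[1.0, 0.001, 0.20],
     [0.001, 1.0, 0.03],
     [0.20, 0.03, 1.0]],
    index=[1, 2, 3], columns=[1, 2, 3])

print(significant_pairs(demo))
```

Iterating over combinations visits each pair once and never touches the diagonal, so self-comparisons cannot show up in the result.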