In this article, we will look at a non-parametric test that can be used to determine whether the shapes of two distributions are the same or not.
What is the Kolmogorov-Smirnov Test?
The Kolmogorov–Smirnov test is an efficient way to determine whether two samples differ significantly from each other. It is commonly used to test the uniformity of random numbers: uniformity is one of the most important properties of any random number generator, and the Kolmogorov–Smirnov test can be used to check it. The test can also be used to check whether two underlying one-dimensional probability distributions differ. The Kolmogorov–Smirnov statistic quantifies the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples.
How does the Kolmogorov-Smirnov test work?
To answer this, we first need to discuss the purpose of using this test. The main idea behind the test is to check whether the two samples we are dealing with follow the same type of distribution, i.e. whether the shapes of their distributions are the same.
First of all, if the two samples follow the same probability distribution, then the maximum value of the absolute difference between their cumulative distribution functions will be small. The higher this value, the greater the difference between the shapes of the two distributions.
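In standard notation, this maximum absolute difference is the KS statistic. If F1 and F2 denote the two cumulative distribution functions being compared (the empirical distribution function of the sample and the reference CDF in the one-sample case, or the two empirical distribution functions in the two-sample case), then:

D = sup_x | F1(x) - F2(x) |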
To check the shape of a sample of data, we generally use hypothesis testing, which is of two types:
- Parametric Test
- Non – Parametric Test
Null Hypothesis of Kolmogorov-Smirnov Test
H0 (Null Hypothesis): The null hypothesis assumes that the two samples of data at hand are from the same distribution.
As the KS test is a non-parametric method, there is no restriction that the samples must come from a normal distribution, unlike tests that rely on the chi-square distribution. The steps of the KS test for checking uniformity are as follows:
- Rank the N random numbers in ascending order.
- Calculate D+ as max(i/N - Ri) for all i in (1, N).
- Calculate D- as max(Ri - ((i - 1)/N)) for all i in (1, N).
- Calculate D as max(sqrt(N) * D+, sqrt(N) * D-).
- If D > D(alpha), reject uniformity; otherwise, fail to reject the null hypothesis.
Below is the Python implementation of the above algorithm:
Python3
import numpy as np

N = 30

# F(X) can be any continuous distribution;
# here I am using the normal distribution
f_x = np.random.normal(size=N)

# Rank the N random numbers in ascending order
f_x_sorted = np.sort(f_x)

# Calculate D+ = max(i/N - Ri)
plus_max = list()
for i in range(1, N + 1):
    x = i / N - f_x_sorted[i - 1]
    plus_max.append(x)
K_plus_max = np.sqrt(N) * np.max(plus_max)

# Calculate D- = max(Ri - ((i-1)/N))
minus_max = list()
for i in range(1, N + 1):
    y = (i - 1) / N
    y = f_x_sorted[i - 1] - y
    minus_max.append(y)
K_minus_max = np.sqrt(N) * np.max(minus_max)

# Calculate the KS statistic
K_max = max(K_plus_max, K_minus_max)
print(K_max)
Output:
11.691053208016287
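For comparison, below is a minimal sketch (assuming SciPy is installed) that runs a one-sample KS test on the same kind of data with scipy.stats.kstest, testing the sample against the standard normal CDF it was drawn from. Note that SciPy reports the unscaled statistic D together with a p-value, so its output will not match the sqrt(N)-scaled value printed above.

Python3

import numpy as np
from scipy import stats

N = 30

# Sample from the standard normal distribution, as in the implementation above
f_x = np.random.normal(size=N)

# One-sample KS test against the standard normal CDF.
# SciPy returns the unscaled statistic D = max|F_n(x) - F(x)| and a p-value.
result = stats.kstest(f_x, 'norm')
print(result.statistic, result.pvalue)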
Why do we apply the Kolmogorov–Smirnov Test?
The Kolmogorov-Smirnov (K-S) test is a statistical test used to determine whether two data sets follow the same underlying probability distribution or whether they differ significantly in their distributional properties. It is a non-parametric test, which means it does not make assumptions about the shape or parameters of the distributions being compared, making it a flexible test that can be used with a wide variety of data types and distributions.
The main uses of the Kolmogorov-Smirnov test are:
1. Goodness-of-fit testing: The K-S test can be used to evaluate how well a sample data set fits a hypothesized distribution. This can be useful in determining whether a sample of data is likely to have been drawn from a particular distribution, such as a normal distribution or an exponential distribution. It is frequently used in fields such as finance, engineering, and the natural sciences to verify whether a data set conforms to an expected distribution, which can have implications for decision-making, model fitting, and prediction.
2. Two-sample comparison: The K-S test can be used to compare two data sets to determine whether they are drawn from the same underlying distribution. This can be useful in assessing whether there are statistically significant differences between data sets, such as comparing the performance of different groups in an experiment or comparing the distributions of two particular variables (a short sketch of this use case follows this list). It is commonly used in fields such as the social sciences, medicine, and business to evaluate whether there are significant differences between groups or populations.
3. Hypothesis testing: The K-S test can be used to test specific hypotheses about the distributional properties of a data set. For instance, it can be used to check whether a data set is normally distributed or whether it follows a specific theoretical distribution. This can be useful in verifying assumptions made in statistical analyses or validating model assumptions.
4. Non-parametric alternative: The K-S test is a non-parametric test, which means it does not require assumptions about the form or parameters of the underlying distributions being compared. This makes it a useful alternative to parametric tests, such as the t-test or ANOVA, when data do not meet the assumptions of those tests, for example when the data are not normally distributed, have unknown or unequal variances, or have small sample sizes.
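As a brief illustration of the two-sample use case in point 2, the sketch below (assuming SciPy is available) compares two independently drawn samples with scipy.stats.ks_2samp; the sample sizes, the shift of 0.5, and the 0.05 significance level are arbitrary choices made for this example.

Python3

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two independent samples: one standard normal, one shifted by 0.5
sample_a = rng.normal(loc=0.0, scale=1.0, size=200)
sample_b = rng.normal(loc=0.5, scale=1.0, size=200)

# Two-sample KS test: statistic D = max|F_a(x) - F_b(x)| and a p-value
result = stats.ks_2samp(sample_a, sample_b)

# Reject the null hypothesis (same distribution) at the 5% level
if result.pvalue < 0.05:
    print(f"D = {result.statistic:.3f}, p = {result.pvalue:.4f}: distributions differ")
else:
    print(f"D = {result.statistic:.3f}, p = {result.pvalue:.4f}: fail to reject the null hypothesis")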
Limitations of the Kolmogorov-Smirnov Test
1. Sensitivity to sample size: The K-S test may have limited power with small sample sizes and may yield statistically significant results with large sample sizes even for small differences.
2. Assumes independence: The K-S test assumes that the data sets being compared are independent, and it may not be appropriate for dependent data.
3. Limited to continuous data: The K-S test is designed for continuous data and may not be suitable for discrete or categorical data without modification.
4. Lack of sensitivity to specific distributional properties: The K-S test assesses general differences between distributions and may not be sensitive to differences in specific distributional properties.
5. Vulnerability to Type I error with multiple comparisons: Performing multiple K-S tests, or using the K-S test within a larger hypothesis-testing framework, may increase the risk of Type I errors.
6. Interpretation challenges: Interpreting K-S test results can be difficult because the test statistic does not provide information about the direction or magnitude of the differences between distributions.