Saturday, November 16, 2024

Kolmogorov-Smirnov Test (KS Test)

In this article, we will look at a non-parametric test that can be used to determine whether two distributions have the same shape.

What is the Kolmogorov-Smirnov Test?

The Kolmogorov–Smirnov test is an efficient way to determine whether two samples differ significantly from each other. It is commonly used to check the uniformity of random numbers. Uniformity is one of the most important properties of any random number generator, and the Kolmogorov–Smirnov test can be used to check it. The test can also be used to check whether two underlying one-dimensional probability distributions differ. The Kolmogorov–Smirnov statistic quantifies the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples.
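The empirical distribution function mentioned above can be illustrated with a minimal sketch. The helper name ecdf and the toy data are assumptions made for this example only; they do not come from any library.

```python
import numpy as np

def ecdf(sample, x):
    """Fraction of observations in `sample` that are <= x (hypothetical helper)."""
    sorted_sample = np.sort(np.asarray(sample, dtype=float))
    # side="right" counts every observation equal to x as <= x
    return np.searchsorted(sorted_sample, x, side="right") / sorted_sample.size

data = [0.1, 0.4, 0.4, 0.7, 0.9]
print(ecdf(data, 0.4))   # 3 of the 5 observations are <= 0.4
print(ecdf(data, 0.05))  # no observation is <= 0.05
print(ecdf(data, 1.0))   # every observation is <= 1.0
```

The K-S statistic is then the largest vertical gap between this step function and either a reference CDF or the step function of a second sample.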

How does the Kolmogorov-Smirnov test work?

To answer this, we first need to discuss the purpose of the test. The main idea is to check whether the two samples we are dealing with follow the same distribution, that is, whether the shapes of their distributions are the same.

If the two samples come from the same probability distribution, then the maximum absolute difference between their cumulative distribution functions will be close to zero. The larger this value, the greater the difference between the shapes of the two distributions.
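The idea of the maximum absolute difference between two cumulative distributions can be sketched as follows. This is an illustrative implementation under stated assumptions, not a reference one: the function name ks_two_sample_statistic, the seed, and the sample sizes are all chosen for this example.

```python
import numpy as np

def ks_two_sample_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # Both empirical CDFs only jump at observed values, so the largest
    # gap is attained at one of the pooled data points.
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)           # seed fixed only for repeatability
same_1 = rng.normal(size=200)
same_2 = rng.normal(size=200)            # same distribution as same_1
shifted = rng.normal(loc=1.0, size=200)  # mean shifted by one unit

d_same = ks_two_sample_statistic(same_1, same_2)
d_shifted = ks_two_sample_statistic(same_1, shifted)
print("same distribution:   ", d_same)
print("shifted distribution:", d_shifted)
```

Samples from the same distribution should give a small gap, while the shifted sample should give a noticeably larger one.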

To check the shape of a sample of data, we generally use hypothesis testing, which is of two types:

  • Parametric Test
  • Non-Parametric Test

Null Hypothesis of Kolmogorov-Smirnov Test

H0 (Null Hypothesis): The null hypothesis assumes that the two samples at hand come from the same distribution.

Because the KS test is a non-parametric method, there is no requirement that the samples come from a normal distribution or from any other particular parametric family.

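For N random numbers R1, ..., RN in [0, 1] that are tested for uniformity, the steps are:

-> Rank the N random numbers in ascending order, so that Ri is the i-th smallest.
-> Calculate D+ as max(i/N - Ri) for all i in (1, N)
-> Calculate D- as max(Ri - ((i-1)/N)) for all i in (1, N)
-> Calculate D as max(sqrt(N) * D+, sqrt(N) * D-)
-> If D > D(alpha), where D(alpha) is the critical value for the sqrt(N)-scaled statistic at significance level alpha:
    Reject uniformity.
   else:
    Fail to reject the null hypothesis.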

Below is a Python implementation of the above algorithm:

Python3

import numpy as np
from scipy.stats import norm

# Draw N random numbers
N = 30
# F(x) can be any continuous distribution;
# here the sample is drawn from a standard normal distribution
f_x = np.random.normal(size=N)
f_x_sorted = np.sort(f_x)

# Map the sorted values through the hypothesized CDF F, so that
# R_i = F(x_(i)) is uniform on [0, 1] under the null hypothesis
r = norm.cdf(f_x_sorted)

# Ranks i = 1, ..., N
i = np.arange(1, N + 1)

# Calculate sqrt(N) * max(i/N - Ri)
K_plus_max = np.sqrt(N) * np.max(i / N - r)

# Calculate sqrt(N) * max(Ri - ((i-1)/N))
K_minus_max = np.sqrt(N) * np.max(r - (i - 1) / N)

# Calculate the KS statistic
K_max = max(K_plus_max, K_minus_max)
print(K_max)


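Output:

The value of K_max changes from run to run because the sample is random and no seed is set. When the sample really follows the hypothesized distribution, sqrt(N) * D exceeds roughly 1.36 only about 5% of the time. Values far above that, such as 11.69, indicate a mismatch between the sample and the reference distribution, for example when raw normal draws are compared directly against i/N without first applying the CDF.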
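The last step of the procedure, comparing the scaled statistic with a critical value D(alpha), can be sketched as below. The critical value here uses the leading term of the asymptotic Kolmogorov distribution, which is an approximation for large N; the function names, seed, and sample size are assumptions made for this example.

```python
import math
import numpy as np

def asymptotic_critical_value(alpha):
    """Approximate critical value for sqrt(N) * D, from the leading term
    of the Kolmogorov series: P(K > c) ~ 2 * exp(-2 * c**2)."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0))

def scaled_ks_uniform(sample):
    """sqrt(N) * D for a sample in [0, 1] tested against uniform(0, 1)."""
    r = np.sort(np.asarray(sample, dtype=float))
    n = r.size
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - r)
    d_minus = np.max(r - (i - 1) / n)
    return math.sqrt(n) * max(d_plus, d_minus)

rng = np.random.default_rng(42)  # seed fixed only for repeatability
sample = rng.uniform(size=1000)

k = scaled_ks_uniform(sample)
c = asymptotic_critical_value(0.05)
if k > c:
    print(f"K = {k:.3f} > {c:.3f}: reject uniformity")
else:
    print(f"K = {k:.3f} <= {c:.3f}: fail to reject the null hypothesis")
```

For small N, exact critical values from a K-S table are more appropriate than this asymptotic approximation.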

Why apply the Kolmogorov-Smirnov Test?

The Kolmogorov-Smirnov (K-S) test is a statistical test used to determine whether two data sets follow the same underlying probability distribution or whether they differ significantly in their distributional properties. It is a non-parametric test, which means it does not make assumptions about the shape or parameters of the distributions being compared, making it a flexible test that can be used with a wide variety of data types and distributions.

The main uses of the Kolmogorov-Smirnov test are:

1. Goodness-of-fit testing: The K-S test can be used to evaluate how well a sample data set fits a hypothesized distribution. This can be useful in determining whether a sample of data is likely to have been drawn from a particular distribution, such as a normal distribution or an exponential distribution. It is often used in fields such as finance, engineering, and natural sciences to verify whether a data set conforms to an expected distribution, which can have implications for decision-making, model fitting, and prediction.

2. Two-sample comparison: The K-S test can be used to compare two data sets to determine whether they are drawn from the same underlying distribution. This can be useful in assessing whether there are statistically significant differences between two data sets, such as comparing the performance of two different groups in an experiment or comparing the distributions of two different variables. It is commonly used in fields such as social sciences, medicine, and business to evaluate whether there are significant differences between groups or populations.

3. Hypothesis testing: The K-S test can be used to test specific hypotheses about the distributional properties of a data set. For example, it can be used to test whether a data set is normally distributed or whether it follows a specific theoretical distribution. This can be useful in verifying assumptions made in statistical analyses or validating model assumptions.

4. Non-parametric alternative: The K-S test is a non-parametric test, which means it does not require assumptions about the form or parameters of the underlying distributions being compared. This makes it a useful alternative to parametric tests, such as the t-test or ANOVA, when data do not meet the assumptions of those tests, for example when data are not normally distributed, have unknown or unequal variances, or have small sample sizes.
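The goodness-of-fit and two-sample uses listed above can be tried with SciPy's scipy.stats.kstest and scipy.stats.ks_2samp. The sketch below is illustrative only: the seed, sample sizes, and chosen distributions are assumptions for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # seed fixed only for repeatability
normal_sample = rng.normal(loc=0.0, scale=1.0, size=300)
exponential_sample = rng.exponential(scale=1.0, size=300)

# Goodness of fit: is each sample consistent with a standard normal?
fit_normal = stats.kstest(normal_sample, "norm")
fit_exponential = stats.kstest(exponential_sample, "norm")
print("normal vs N(0, 1):     ", fit_normal.statistic, fit_normal.pvalue)
print("exponential vs N(0, 1):", fit_exponential.statistic, fit_exponential.pvalue)

# Two-sample comparison: do the two samples share a distribution?
two_sample = stats.ks_2samp(normal_sample, exponential_sample)
print("normal vs exponential: ", two_sample.statistic, two_sample.pvalue)
```

A small p-value is evidence against the null hypothesis that the distributions match. Note that the reference distribution here is fully specified in advance; if its parameters are estimated from the same data, the standard K-S p-value is no longer valid.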

Limitations of the Kolmogorov-Smirnov Test

1. Sensitivity to sample size: The K-S test may have limited power with small sample sizes and may yield statistically significant results with large sample sizes even for small differences.

2. Assumes independence: The K-S test assumes that the observations being compared are independent, and may not be appropriate for dependent data.

3. Limited to continuous data: The K-S test is designed for continuous data and may not be suitable for discrete or categorical data without modifications.

4. Lack of sensitivity to specific distributional properties: The K-S test assesses general differences between distributions and may not be sensitive to differences in specific distributional properties.

5. Vulnerability to type I error with multiple comparisons: Multiple K-S tests, or use of the K-S test within a larger hypothesis-testing framework, may increase the risk of type I errors.

6. Interpretation challenges: Interpreting K-S test results can be difficult because the test statistic does not provide information about the direction or magnitude of differences between distributions.
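The first limitation, sensitivity to sample size, can be made concrete with a deterministic sketch. It uses evenly spaced points shifted by a small fixed amount instead of random data, so the departure from uniformity is identical at every sample size. The shift of 0.02, the sample sizes, and the helper name ks_uniform are assumptions for this example, and the critical value is the asymptotic 5% approximation.

```python
import math
import numpy as np

def ks_uniform(sample):
    """Return (D, sqrt(n) * D) for a sample tested against uniform(0, 1)."""
    # Clipping to [0, 1] applies the uniform(0, 1) CDF to each value
    r = np.clip(np.sort(np.asarray(sample, dtype=float)), 0.0, 1.0)
    n = r.size
    i = np.arange(1, n + 1)
    d = float(max(np.max(i / n - r), np.max(r - (i - 1) / n)))
    return d, math.sqrt(n) * d

# Asymptotic critical value for sqrt(n) * D at the 5% level (about 1.36)
critical = math.sqrt(-0.5 * math.log(0.05 / 2.0))

verdicts = {}
for n in (100, 1_000, 10_000):
    # Evenly spaced points on (0, 1), all shifted up by 0.02
    sample = (np.arange(1, n + 1) - 0.5) / n + 0.02
    d, k = ks_uniform(sample)
    verdicts[n] = "reject" if k > critical else "fail to reject"
    print(f"n={n:>6}  D={d:.4f}  sqrt(n)*D={k:.3f}  -> {verdicts[n]}")
```

The distance D stays near 0.02 throughout, but the scaled statistic grows with sqrt(n), so the same small discrepancy is only flagged as significant at the largest sample size.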
