Kolmogorov Smirnov Test (KS Test)
Esteban Zapata Rojas edited this page Jan 8, 2019 · 1 revision
This is an implementation of the two-sample Kolmogorov-Smirnov goodness-of-fit test (KS test).
The test helps determine whether two samples follow the same distribution by measuring the distance between their empirical distribution functions. It does not tell which particular distribution fits.
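To make the notion of "distance" concrete, here is a minimal plain-Ruby sketch of the two-sample KS statistic: the largest absolute gap between the two empirical cumulative distribution functions. The helper names `ecdf` and `ks_distance` are hypothetical and chosen only for illustration; this is not the library's code.

```ruby
# Hypothetical helper (not part of the library API).
# Empirical CDF: fraction of values in `sample` that are <= x.
def ecdf(sample, x)
  sample.count { |value| value <= x } / sample.size.to_f
end

# Hypothetical helper (not part of the library API).
# Two-sample KS statistic: largest absolute gap between the two ECDFs,
# evaluated at every observed value of either sample.
def ks_distance(group_one, group_two)
  (group_one + group_two).map do |x|
    (ecdf(group_one, x) - ecdf(group_two, x)).abs
  end.max
end

identical = ks_distance([1, 2, 3, 4], [1, 2, 3, 4])
shifted   = ks_distance([1, 2, 3, 4], [3, 4, 5, 6])
disjoint  = ks_distance([1, 2, 3], [4, 5, 6])

puts identical # identical samples: the ECDFs never differ
puts shifted   # partially overlapping samples: an intermediate gap
puts disjoint  # fully separated samples: the largest possible gap
```

The sketch recomputes each ECDF from scratch at every point, which is quadratic in the sample size; that is fine for illustration but not for large inputs.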
The test is implemented in the `KolmogorovSmirnovTest` class, which is also available under the alias `KSTest`; the examples on this page call it as `StatisticalTest::KSTest.two_samples`.
The `two_samples` method expects three keywords: `group_one:`, `group_two:` and `alpha:`, where `alpha:` has a default value of `0.05`. It returns a hash with the following keys:
- `d_max`: the maximum absolute difference between the empirical cumulative distribution functions of the two samples.
- `d_critical`: the critical value of D, calculated as specified here: https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov%E2%80%93Smirnov_test
- `total_samples`: the total number of observations evaluated across both groups.
- `alpha`: the specified alpha value.
- `null`: either `true` or `false`. If `true`, the null hypothesis should not be rejected.
- `alternative`: either `true` or `false`. If `true`, the null hypothesis can be rejected.
- `confidence_level`: defined as `1 - alpha`.
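The relationship between `d_max`, `d_critical`, `null` and `alternative` can be sketched as follows. The `d_critical` values in the examples on this page (about 0.4740 for groups of 10 and 20, and about 0.5473 for two groups of 10, both at `alpha = 0.05`) are consistent with `c(alpha) * sqrt((n + m) / (n * m))` where `c(alpha) = sqrt(-0.5 * ln(alpha))`. That formula is inferred from those outputs and is an assumption, as are the helper names `critical_d` and `ks_decision` and the handling of the tie case.

```ruby
# Hypothetical helper (not part of the library API).
# Critical D for two samples of sizes n and m at significance level alpha.
# ASSUMPTION: c(alpha) = sqrt(-0.5 * ln(alpha)), inferred from the example
# outputs on this page rather than taken from the library source.
def critical_d(n, m, alpha = 0.05)
  c_alpha = Math.sqrt(-0.5 * Math.log(alpha))
  c_alpha * Math.sqrt((n + m) / (n * m).to_f)
end

# Hypothetical helper (not part of the library API).
# Builds the decision-related keys: the null hypothesis is rejected only
# when d_max exceeds the critical value.
# ASSUMPTION: a tie (d_max == d_critical) does not reject.
def ks_decision(d_max, n, m, alpha = 0.05)
  d_critical = critical_d(n, m, alpha)
  {
    d_critical: d_critical,
    null: d_max <= d_critical,
    alternative: d_max > d_critical,
    confidence_level: 1.0 - alpha
  }
end

# d_max values and group sizes taken from the two examples on this page.
same      = ks_decision(0.3, 10, 20)
different = ks_decision(0.6, 10, 10)

puts same
puts different
```

Some references define `c(alpha)` with `alpha / 2` inside the logarithm, which yields a larger critical value than the ones shown on this page, so it may be worth checking which convention applies before comparing results across tools.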
An example with two samples coming from the same distribution:
[6] pry(main)> group_one = Distribution::StandardNormal.new.random(elements: 10, seed: 5)
=> [-0.33087015189408764,
-0.2520921296030769,
1.5824811170615634,
-0.5916366579302884,
-0.32986995777935924,
-0.2048765105875873,
0.6034716026094954,
-0.7001790376899514,
1.8573310072313118,
0.6448475108927784]
[7] pry(main)> group_two = Distribution::StandardNormal.new.random(elements: 20, seed: 4)
=> [0.499951333237829,
0.6935985082913116,
-1.5845772351121241,
0.5985751739673772,
-1.1474766329454797,
-0.08798692834027545,
0.33225314537233536,
0.3509971530825316,
1.5469793290157272,
0.046135567230164806,
0.054432738865157676,
-1.2089481591289206,
0.39429521471439166,
-1.1128121538537732,
-1.3609655918364936,
0.5424513084033299,
-2.358073633011557,
0.8378363539201328,
0.9148409578006828,
0.7965118987249309]
[8] pry(main)> StatisticalTest::KSTest.two_samples(group_one: group_one, group_two: group_two)
=> {:d_max=>0.3, :d_critical=>0.4740041355479394, :total_samples=>30, :alpha=>0.05, :null=>true, :alternative=>false, :confidence_level=>0.95}
An example with two samples coming from different distributions:
[10] pry(main)> group_one = Distribution::StandardNormal.new.random(elements: 10, seed: 5)
=> [-0.33087015189408764,
-0.2520921296030769,
1.5824811170615634,
-0.5916366579302884,
-0.32986995777935924,
-0.2048765105875873,
0.6034716026094954,
-0.7001790376899514,
1.8573310072313118,
0.6448475108927784]
[11] pry(main)> group_two = Distribution::Weibull.new(3, 4).random(elements: 10, seed: 5)
=> [0.39738923502790524,
0.7997225442592585,
0.3868528026254838,
0.8559574581334326,
0.5513010739589683,
0.6184303884163658,
0.7133558756258811,
0.5673993393806528,
0.4448443026776265,
0.37319828077807826]
[12] pry(main)> StatisticalTest::KSTest.two_samples(group_one: group_one, group_two: group_two)
=> {:d_max=>0.6, :d_critical=>0.5473328305111973, :total_samples=>20, :alpha=>0.05, :null=>false, :alternative=>true, :confidence_level=>0.95}