Mann Whitney U test
This is an implementation of the Wilcoxon Rank-Sum test, also known as the Mann-Whitney U test. It is a non-parametric test used to determine whether it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample.
Although this test is implemented under WilcoxonRankSumTest.new, it can also be instantiated using the alias MannWhitneyU.new.
The rank method receives the elements that need to be ranked. It returns a hash with one key-value pair per unique element, where the key is the element and the value is another hash containing how many times the element appears (counter) and its ranking (rank). It ranks in ascending order, and for tied elements rank holds the sum of the positions they occupy.
For example, in { 1 => { counter: 2, rank: 3 }, 2 => { counter: 1, rank: 3 } }, there is one tie on the element 1: it occupies positions 1 and 2, so each 1 element has a ranking of 3/2.
pry(main)> test = StatisticalTest::WilcoxonRankSumTest.new
=> #<Statistics::StatisticalTest::WilcoxonRankSumTest:0x00000001556698>
pry(main)> test.rank([1,1,2,3,4,5,6,7,7,8])
=> {1=>{:counter=>2, :rank=>3},
2=>{:counter=>1, :rank=>3},
3=>{:counter=>1, :rank=>4},
4=>{:counter=>1, :rank=>5},
5=>{:counter=>1, :rank=>6},
6=>{:counter=>1, :rank=>7},
7=>{:counter=>2, :rank=>17},
8=>{:counter=>1, :rank=>10}}
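The ranking behaviour shown above can be sketched in plain Ruby. This is a minimal illustration inferred from the output in this page, not the gem's source; the helper names rank_with_ties and average_rank are hypothetical.

```ruby
# Hypothetical helper (not the gem's code): ranks elements in ascending order.
# For each distinct value it stores how many times the value appears (:counter)
# and the sum of the 1-based positions it occupies in the sorted list (:rank).
def rank_with_ties(elements)
  ranked = {}
  elements.sort.each_with_index do |element, index|
    entry = (ranked[element] ||= { counter: 0, rank: 0 })
    entry[:counter] += 1
    entry[:rank] += index + 1
  end
  ranked
end

# Hypothetical helper: the rank shared by each tied element is the
# summed rank divided by the number of occurrences.
def average_rank(entry)
  entry[:rank].to_f / entry[:counter]
end

ranks = rank_with_ties([1, 1, 2, 3, 4, 5, 6, 7, 7, 8])
puts average_rank(ranks[1]) # 3 / 2 = 1.5
puts average_rank(ranks[7]) # 17 / 2 = 8.5
```

Storing the summed rank together with the counter keeps the hash compact and lets the per-element rank be derived only when it is needed.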
The perform method runs the Mann-Whitney U test. It expects the alpha value, the tail, and the two groups to be compared.
The tail param can be :one_tail or :two_tail to specify whether the method should perform a one-tailed or two-tailed test.
Keep in mind that this method performs the test using the Z statistic for a normal distribution, derived from the U statistic. The normal distribution used has a mean of 0 and a standard deviation of 1.
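As a rough illustration of how a Z statistic can be derived from U, the sketch below uses the textbook normal approximation: U has mean n1 * n2 / 2 and standard deviation sqrt(n1 * n2 * (n1 + n2 + 1) / 12). The helper names u_statistic and z_statistic are hypothetical, no tie correction is applied to the standard deviation, and the gem's internals may differ.

```ruby
# Hypothetical helper: U statistic as the smaller of U1 and U2, where
# Ui = (rank sum of group i) - n_i * (n_i + 1) / 2. Tied values share
# the average of the positions they occupy in the pooled, sorted sample.
def u_statistic(group_one, group_two)
  sorted = (group_one + group_two).sort
  avg_rank = sorted.each_with_index
                   .group_by { |value, _index| value }
                   .transform_values { |pairs| pairs.sum { |_value, i| i + 1 }.to_f / pairs.size }

  n1 = group_one.size
  n2 = group_two.size
  u1 = group_one.sum { |v| avg_rank[v] } - n1 * (n1 + 1) / 2.0
  u2 = group_two.sum { |v| avg_rank[v] } - n2 * (n2 + 1) / 2.0
  [u1, u2].min
end

# Hypothetical helper: standardizes U with the normal approximation
# (assumption: no tie correction in the standard deviation).
def z_statistic(u, n1, n2)
  mean = n1 * n2 / 2.0
  std  = Math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
  (u - mean) / std
end

group_one = [0.12819915256260872, 0.24345459073897613, 0.27517650565714014,
             0.8522185144081152, 0.05471111219486524]
group_two = [0.3272414061985621, 0.2989306116723194, 0.642664937717922]

u = u_statistic(group_one, group_two)
z = z_statistic(u, group_one.size, group_two.size)
puts u          # 3.0
puts z.round(4) # -1.3416
```

With these two groups the sketch gives the same u and z values as the perform output shown further down this page.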
It returns a hash with the following keys:
- probability: the probability of the Z statistic, calculated using the standard normal CDF.
- u: the U statistic.
- z: the Z statistic, which is just a transformation of the U statistic.
- p_value: the p-value, calculated as 1 - probability. In the examples below, the two-tailed test reports twice that value.
- alpha: the specified alpha value.
- null: either true or false. If true, the null hypothesis should not be rejected.
- alternative: either true or false. If true, the null hypothesis can be rejected.
- confidence_level: defined as 1 - alpha.
Keep in mind that the null and alternative keys cannot be true at the same time.
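The relationship between these keys can be sketched as follows. This is an illustration under stated assumptions, not the gem's implementation: the helper names standard_normal_cdf and summarize are hypothetical, the CDF is evaluated at the absolute value of z (which is consistent with the example output on this page), and the strict comparison p_value < alpha is an assumed decision rule.

```ruby
# Standard normal CDF expressed through Ruby's built-in error function.
def standard_normal_cdf(x)
  0.5 * (1.0 + Math.erf(x / Math.sqrt(2.0)))
end

# Hypothetical helper: builds a result hash shaped like the one described above.
def summarize(z, alpha, tail)
  probability = standard_normal_cdf(z.abs)     # assumption: CDF evaluated at |z|
  p_value = 1.0 - probability
  p_value *= 2.0 if tail == :two_tail          # two-tailed test doubles the p-value
  reject = p_value < alpha                     # assumption: strict comparison

  {
    probability: probability,
    z: z,
    p_value: p_value,
    alpha: alpha,
    null: !reject,          # true when the null hypothesis is not rejected
    alternative: reject,    # true when the null hypothesis is rejected
    confidence_level: 1.0 - alpha
  }
end

result = summarize(-1.3416407864998738, 0.05, :one_tail)
puts result[:p_value].round(4) # 0.0899
puts result[:null]             # true
```

Because null and alternative are derived from the same boolean, they can never both be true.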
pry(main)> group_one
=> [0.12819915256260872, 0.24345459073897613, 0.27517650565714014, 0.8522185144081152, 0.05471111219486524]
pry(main)> group_two
=> [0.3272414061985621, 0.2989306116723194, 0.642664937717922]
# Alpha of 0.05 and one-tailed test
pry(main)> test.perform(alpha = 0.05, :one_tail, group_one, group_two)
=> {:probability=>0.9101437525605001, :u=>3.0, :z=>-1.3416407864998738, :p_value=>0.08985624743949994, :alpha=>0.05, :null=>true, :alternative=>false, :confidence_level=>0.95}
# Alpha of 0.01 and one-tailed test
pry(main)> test.perform(alpha = 0.01, :one_tail, group_one, group_two)
=> {:probability=>0.9101437525605001, :u=>3.0, :z=>-1.3416407864998738, :p_value=>0.08985624743949994, :alpha=>0.01, :null=>true, :alternative=>false, :confidence_level=>0.99}
# Alpha of 0.01 and two-tailed test
pry(main)> test.perform(alpha = 0.01, :two_tail, group_one, group_two)
=> {:probability=>0.9101437525605001, :u=>3.0, :z=>-1.3416407864998738, :p_value=>0.17971249487899987, :alpha=>0.01, :null=>true, :alternative=>false, :confidence_level=>0.99}