* statistical-test: Chi-Squared Goodness of fit. (#11)
* statistical-test: Paired T-tests. (#14)
* Wilcoxon Rank-sum test / Mann-Whitney U test (#15)
* Generate gaussian random numbers (#16)
* distribution: normal: Generate random gaussian samples.
* distribution: Weibull: Random sample that follows Weibull distribution.
* distribution: Student's T: Generate random samples for T distribution. NOTE: The random sampling process is generating random values that resemble a uniform random sample. Not sure why.
* version 2.0.0! 🎉
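The diffs for the sampling methods named in the commit message are not shown on this page. As a rough illustration of what "generate random gaussian samples" can involve, here is a minimal sketch using the Box-Muller transform. The helper name `gaussian_sample` and its parameters are hypothetical, and nothing here is claimed to be the gem's implementation.

```ruby
# Hypothetical helper, not the gem's API: draws one normally distributed
# sample with the Box-Muller transform, then shifts and scales it.
def gaussian_sample(mean, standard_deviation, rng = Random.new)
  # 1.0 - rand lies in (0, 1], so the logarithm below is always finite.
  u1 = 1.0 - rng.rand
  u2 = rng.rand
  z = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
  mean + standard_deviation * z
end

# A seeded generator makes the run repeatable.
rng = Random.new(42)
samples = Array.new(10_000) { gaussian_sample(5.0, 2.0, rng) }
sample_mean = samples.reduce(0.0, :+) / samples.size
```

With this many draws, `sample_mean` should land close to the requested mean of 5.0. A histogram of `samples` that looks flat rather than bell-shaped would point to the kind of problem the NOTE in the commit message describes.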
1 parent 4a9effa commit 33d75d8
Showing 18 changed files with 619 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,4 +3,6 @@ language: ruby | |
rvm: | ||
- 2.2 | ||
- 2.3.1 | ||
- 2.4.0 | ||
- 2.5.0 | ||
before_install: gem install bundler |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,6 +28,7 @@ def mode | |
end | ||
|
||
def mean | ||
return if alpha + beta == 0 | ||
alpha / (alpha + beta) | ||
end | ||
end | ||
|
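The `mean` method above returns alpha / (alpha + beta), guarded against a zero denominator. The class initializer is not part of this hunk, so whether `alpha` and `beta` are already floats is unknown. The sketch below uses a hypothetical standalone helper, `beta_mean`, that converts explicitly to avoid integer division.

```ruby
# Hypothetical standalone helper mirroring the `mean` method in the hunk.
# The explicit to_f is an assumption: it protects against integer inputs.
def beta_mean(alpha, beta)
  # No mean is defined when both shape parameters are zero.
  return if alpha + beta == 0

  alpha.to_f / (alpha + beta)
end

mean_for_integers = beta_mean(2, 3)
mean_for_zeroes = beta_mean(0, 0)

# Without the conversion, integer inputs would truncate.
truncated = 2 / (2 + 3)
```

Here `mean_for_integers` is 2/5 as a float, `mean_for_zeroes` is `nil`, and `truncated` is `0`, which is the pitfall the conversion avoids.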
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
module Statistics | ||
module StatisticalTest | ||
class ChiSquaredTest | ||
def self.chi_statistic(expected, observed) | ||
# If the expected is a number, we assume that all expected observations | ||
# have the same probability of occurring, hence we expect to see the same number | ||
# of expected observations for each observed value | ||
statistic = if expected.is_a? Numeric | ||
observed.reduce(0) do |memo, observed_value| | ||
up = (observed_value - expected) ** 2 | ||
memo + (up / expected.to_f) | ||
end | ||
else | ||
expected.each_with_index.reduce(0) do |memo, (expected_value, index)| | ||
up = (observed[index] - expected_value) ** 2 | ||
memo + (up / expected_value.to_f) | ||
end | ||
end | ||
|
||
[statistic, observed.size - 1] | ||
end | ||
|
||
def self.goodness_of_fit(alpha, expected, observed) | ||
chi_score, df = chi_statistic(expected, observed) # Destructure [statistic, degrees of freedom] | ||
|
||
return if chi_score.nil? || df.nil? | ||
|
||
probability = Distribution::ChiSquared.new(df).cumulative_function(chi_score) | ||
p_value = 1 - probability | ||
|
||
# According to https://stats.stackexchange.com/questions/29158/do-you-reject-the-null-hypothesis-when-p-alpha-or-p-leq-alpha | ||
# We can assume that if p_value <= alpha, we can safely reject the null hypothesis, i.e. accept the alternative hypothesis. | ||
{ probability: probability, | ||
p_value: p_value, | ||
alpha: alpha, | ||
null: alpha < p_value, | ||
alternative: p_value <= alpha, | ||
confidence_level: 1 - alpha } | ||
end | ||
end | ||
end | ||
end |
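To make the statistic concrete, here is a small worked example. It is a standalone sketch, not the gem's API: `chi_squared_statistic` is a hypothetical helper that follows the same rule as `chi_statistic` above (a numeric `expected` is applied to every observed value), and the die-roll counts are invented for illustration.

```ruby
# Hypothetical standalone helper: sum of (observed - expected)^2 / expected,
# returned together with the degrees of freedom (number of categories - 1).
def chi_squared_statistic(expected, observed)
  # A single number means every category has the same expected count.
  expected_values = expected.is_a?(Numeric) ? Array.new(observed.size, expected) : expected

  statistic = observed.zip(expected_values).reduce(0.0) do |memo, (observed_value, expected_value)|
    memo + ((observed_value - expected_value)**2) / expected_value.to_f
  end

  [statistic, observed.size - 1]
end

# Invented data: 60 rolls of a die, so 10 rolls are expected per face.
observed = [8, 12, 9, 11, 10, 10]
statistic, degrees_of_freedom = chi_squared_statistic(10, observed)
```

The squared differences are 4, 4, 1, 1, 0 and 0, so the statistic is 10/10, roughly 1.0, with 5 degrees of freedom. That is far below the usual 5% critical value of about 11.07 for 5 degrees of freedom, so a test like `goodness_of_fit` would be expected to keep the null hypothesis for these counts.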
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
module Statistics | ||
module StatisticalTest | ||
class WilcoxonRankSumTest | ||
def rank(elements) | ||
ranked_elements = {} | ||
|
||
elements.sort.each_with_index do |element, index| | ||
if ranked_elements.fetch(element, false) | ||
# This allows us to resolve ties easily when performing the rank summation per group | ||
ranked_elements[element][:counter] += 1 | ||
ranked_elements[element][:rank] += (index + 1) | ||
else | ||
ranked_elements[element] = { counter: 1, rank: (index + 1) } | ||
end | ||
end | ||
|
||
# ranked_elements = { x => { counter: 1, rank: y } } | ||
ranked_elements | ||
end | ||
|
||
# Steps to perform the calculation are based on http://www.mit.edu/~6.s085/notes/lecture5.pdf | ||
def perform(alpha, tails, group_one, group_two) | ||
# Size for each group | ||
n1, n2 = group_one.size, group_two.size | ||
|
||
# Rank all data | ||
total_ranks = rank(group_one + group_two) | ||
|
||
# sum rankings per group | ||
r1 = ranked_sum_for(total_ranks, group_one) | ||
r2 = ranked_sum_for(total_ranks, group_two) | ||
|
||
# calculate U statistic | ||
u1 = (n1 * (n1 + 1)/2.0) - r1 | ||
u2 = (n2 * (n2 + 1)/2.0 ) - r2 | ||
|
||
u_statistic = [u1.abs, u2.abs].min | ||
|
||
median_u = (n1 * n2)/2.0 | ||
|
||
ties = total_ranks.values.select { |element| element[:counter] > 1 } | ||
|
||
std_u = if ties.size > 0 | ||
corrected_sigma(ties, n1, n2) | ||
else | ||
Math.sqrt((n1 * n2 * (n1 + n2 + 1))/12.0) | ||
end | ||
|
||
z = (u_statistic - median_u)/std_u | ||
|
||
# Most of the literature is not very specific about which normal distribution to use. | ||
# We ran multiple tests with a Normal(median_u, std_u) and a Normal(0, 1), and we found | ||
# the latter to agree better with the results. | ||
probability = Distribution::StandardNormal.new.cumulative_function(z.abs) | ||
p_value = 1 - probability | ||
p_value *= 2 if tails == :two_tail | ||
|
||
{ probability: probability, | ||
u: u_statistic, | ||
z: z, | ||
p_value: p_value, | ||
alpha: alpha, | ||
null: alpha < p_value, | ||
alternative: p_value <= alpha, | ||
confidence_level: 1 - alpha } | ||
end | ||
|
||
# Formula extracted from http://www.statstutor.ac.uk/resources/uploaded/mannwhitney.pdf | ||
private def corrected_sigma(ties, total_group_one, total_group_two) | ||
n = total_group_one + total_group_two | ||
|
||
rank_sum = ties.reduce(0) do |memo, t| | ||
memo + ((t[:counter] ** 3) - t[:counter])/12.0 | ||
end | ||
|
||
left = (total_group_one * total_group_two)/(n * (n - 1)).to_f | ||
right = (((n ** 3) - n)/12.0) - rank_sum | ||
|
||
Math.sqrt(left * right) | ||
end | ||
|
||
private def ranked_sum_for(total, group) | ||
# sum rankings per group | ||
group.reduce(0) do |memo, element| | ||
rank_of_element = total[element][:rank] / total[element][:counter].to_f | ||
memo + rank_of_element | ||
end | ||
end | ||
end | ||
|
||
# Both tests are the same. To keep the selected name, we just alias the class | ||
# with the implementation. | ||
MannWhitneyU = WilcoxonRankSumTest | ||
end | ||
end |
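The ranking and U statistic steps above can be traced by hand on a tiny data set. The sketch below is standalone and uses a hypothetical helper, `average_ranks`, that assigns tied values the average of the ranks they occupy. The class above gets the same effect by dividing the accumulated `:rank` by `:counter`. The two groups are invented for illustration.

```ruby
# Hypothetical helper: maps each distinct value to its average (mid) rank.
def average_ranks(values)
  positions = Hash.new { |hash, key| hash[key] = [] }
  values.sort.each_with_index { |value, index| positions[value] << (index + 1) }

  positions.each_with_object({}) do |(value, ranks), result|
    result[value] = ranks.reduce(:+) / ranks.size.to_f
  end
end

# Invented data with one tie (the value 3 appears in both groups).
group_one = [1, 3, 5]
group_two = [3, 4, 6]
n1 = group_one.size
n2 = group_two.size

# Sorted pool: 1, 3, 3, 4, 5, 6, so both 3s share rank (2 + 3) / 2 = 2.5.
ranks = average_ranks(group_one + group_two)

r1 = group_one.reduce(0) { |sum, value| sum + ranks[value] } # 1 + 2.5 + 5
r2 = group_two.reduce(0) { |sum, value| sum + ranks[value] } # 2.5 + 4 + 6

# Textbook form of U; the class above computes the negated value and takes abs.
u1 = r1 - (n1 * (n1 + 1) / 2.0)
u2 = r2 - (n2 * (n2 + 1) / 2.0)
u_statistic = [u1, u2].min
```

Here `r1` is 8.5 and `r2` is 12.5, which gives `u1` = 2.5 and `u2` = 6.5. The two always add up to `n1 * n2` (9 in this case), which is a handy sanity check, and `u_statistic` is the smaller of the two.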
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
module Statistics | ||
- VERSION = "1.0.2" | ||
+ VERSION = "2.0.0" | ||
end |