diff --git a/session-inference/figures/hypotestest.jpg b/session-inference/figures/hypotestest.jpg
new file mode 100644
index 0000000..86adf6d
Binary files /dev/null and b/session-inference/figures/hypotestest.jpg differ
diff --git a/session-inference/lectures/inferenceI.html b/session-inference/lectures/inferenceI.html
index d4b68c8..d069c37 100644
--- a/session-inference/lectures/inferenceI.html
+++ b/session-inference/lectures/inferenceI.html
@@ -1450,10 +1450,19 @@
Statistical inference is the process of drawing conclusions about properties of a population based on observations of a random sample from that population.
+A hypothesis test is a type of inference that evaluates whether a hypothesis about a population is supported by the observations of a random sample (i.e. by the available data).
+Typically, the hypotheses that are tested are assumptions about properties of a population, such as a proportion, a mean, a mean difference or a variance.
+You suspect that a die is loaded, i.e. that it shows ‘six’ more often than expected from a fair die. To test this you throw the die 10 times and count the total number of sixes. You get 5 sixes. Is there reason to believe that the die is loaded?
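A minimal simulation sketch of this test (assuming the die is fair under \(H_0\), i.e. the probability of a six is 1/6; the object name Nsix is illustrative):

## Simulate the null distribution: the number of sixes in 10 throws of a fair die
Nsix <- replicate(10000, sum(sample(1:6, size=10, replace=TRUE) == 6))
## Estimate the p-value as the fraction of simulated experiments with at least 5 sixes
mean(Nsix >= 5)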
-Live coding!
4. Null distribution
+If the high-fat diet has no effect, i.e. if \(H_0\) were true, the result would be as if all mice were given the same diet.
+The 24 mice initially came from the same population; depending on how they are randomly assigned to the high-fat and the normal-diet group, the mean weights would differ even if the two groups were treated the same.
+4. Null distribution
+Random reassignment to two groups can be accomplished using permutation.
Assume \(H_0\) is true, i.e. assume all mice are equivalent, and: 1. randomly reassign (permute) the mice into the two groups, 2. compute the difference in mean weights between the groups.
If we repeat steps 1-2 many times we get the sampling distribution of the difference in mean weights when \(H_0\) is true, the so-called null distribution.
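A minimal sketch of these two steps (assuming the 24 weights are stored in a vector weight and the diet labels in a vector group with values "high-fat" and "normal"; all names are illustrative):

## Steps 1-2 as a function: permute the group labels and
## compute the difference in mean weights between the two groups
permdiff <- function(weight, group) {
  g <- sample(group)   ## random reassignment of the 24 mice
  mean(weight[g == "high-fat"]) - mean(weight[g == "normal"])
}
## Repeat many times to obtain the null distribution
nulldist <- replicate(10000, permdiff(weight, group))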
4. Null distribution
5. Compute p-value
What is the probability of getting a mean difference at least as extreme as our observed value, \(d_{obs}\), if \(H_0\) were true?
\(P(\bar X_2 - \bar X_1 \geq d_{obs} | H_0) =\) 0.169
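Continuing the sketch above, the p-value can be estimated as the fraction of permuted mean differences that are at least as large as the observed difference (dobs corresponds to \(d_{obs}\); all names are illustrative):

## Observed difference in mean weights between the two diet groups
dobs <- mean(weight[group == "high-fat"]) - mean(weight[group == "normal"])
## One-sided p-value: fraction of the null distribution at least as large as dobs
mean(nulldist >= dobs)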
6. Conclusion?
[1] "T"
+[1] "H"
Every time you run sample
a new coin toss is simulated.
[1] "H" "T" "H" "H" "T" "T" "T" "H" "T" "T" "H" "H" "T" "T" "H" "T" "H" "H" "H"
+ [1] "H" "T" "H" "H" "H" "T" "H" "H" "T" "H" "H" "T" "T" "T" "H" "T" "H" "T" "H"
[20] "H"
[1] 11
+[1] 12
We can repeat this experiment (toss 20 coins and count the number of heads) several times to estimate the distribution of the number of heads in 20 coin tosses.
@@ -470,11 +470,11 @@
From this we conclude that
-\(P(Y \geq 15) =\) 202/10000 = 0.0202
+\(P(Y \geq 15) =\) 217/10000 = 0.0217
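A minimal sketch of this computation (assuming a fair coin; Nheads is an illustrative name):

## Simulate 10000 series of 20 coin tosses and count the heads in each series
Nheads <- replicate(10000, sum(sample(c("H", "T"), size=20, replace=TRUE) == "H"))
## Estimate P(Y >= 15) as the fraction of series with at least 15 heads
mean(Nheads >= 15)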
Resampling can also be used to compute other properties of a random variable, such as the expected value.
The law of large numbers states that if the same experiment is performed many times, the average of the results will be close to the expected value.
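For example, the expected number of heads in 20 tosses of a fair coin can be estimated by averaging many simulated outcomes (a sketch under the same assumptions as above):

## Estimate E(Y), the expected number of heads in 20 tosses,
## by averaging the outcomes of many simulated experiments
Nheads <- replicate(10000, sum(sample(c("H", "T"), size=20, replace=TRUE) == "H"))
mean(Nheads)   ## close to the expected value 20 * 0.5 = 10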
diff --git a/session-probability/docs/prob_exr1_discrv_solutions.html b/session-probability/docs/prob_exr1_discrv_solutions.html
index dd24b6f..f9e3835 100644
--- a/session-probability/docs/prob_exr1_discrv_solutions.html
+++ b/session-probability/docs/prob_exr1_discrv_solutions.html
@@ -460,7 +460,7 @@
 [1] "c" "c" "c" "T" "c" "c" "c" "T" "T" "c" "c" "T" "T" "T" "T" "c" "c" "T" "T"
-[20] "c"
+ [1] "c" "c" "T" "c" "c" "c" "c" "T" "c" "T" "c" "T" "c" "T" "T" "c" "c" "T" "T"
+[20] "T"
[1] 0.014
+[1] 0.015
Ntreat
- 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
- 1 1 10 47 160 374 740 1263 1646 1758 1553 1129 748 362 139 61
- 17 18
- 5 3
+ 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
+ 2 5 60 169 340 728 1192 1585 1774 1567 1280 706 392 145 39 13
+ 18 19
+ 2 1
[1] 0.021
+[1] 0.02
[1] 210
+[1] 200
[1] 0.00021
+[1] 2e-04
N
0 1 2 3 4 5 6 7 8
-16071 32380 29282 15305 5440 1268 222 31 1
+16192 32060 29226 15619 5330 1339 195 37 2
N
0 1 2 3 4 5 6 7 8
-0.16071 0.32380 0.29282 0.15305 0.05440 0.01268 0.00222 0.00031 0.00001
+0.16192 0.32060 0.29226 0.15619 0.05330 0.01339 0.00195 0.00037 0.00002
[1] 1522
+[1] 1573
[1] 0.015
+[1] 0.016
[1] 0.015
+[1] 0.016
x
0 1 2 3
-34 41 22 3
+35 43 20 2
[1] 0.34
+[1] 0.35
## Solution using 1000 replicates
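## Each replicate draws 3 values with replacement from a vector with three 1s and
## seven 0s (success probability 0.3 per draw) and counts the number of 1s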
x <- replicate(1000, sum(sample(c(0,0,0,0,0,0,0,1,1,1), size=3, replace=TRUE)))
@@ -1135,7 +1137,7 @@ Simulation
x
0 1 2 3
-350 445 182 23
+353 451 172 24
@@ -1147,7 +1149,7 @@ Simulation
x
0 1 2 3
-34287 44208 18824 2681
+34348 43969 19016 2667
@@ -1179,7 +1181,7 @@ Simulation
x
0 1 2 3
-31982 47783 18459 1776
+31865 48035 18355 1745
@@ -1211,7 +1213,7 @@ Simulation
x
0 1 2 3
-34055 44465 18747 2733
+34233 44412 18737 2618
@@ -1322,7 +1324,7 @@
x
1 2 3
-30207 59757 10036
+29885 60064 10051
[1] 0.0 0.3 0.9 1.0
@@ -1332,6 +1334,48 @@
+
Exercise 11 (Rare disease) A rare disease affects 3 in 100000 in a large population. If 10000 people are randomly selected from the population, what is the probability
+
+- that no one in the sample is affected?
+- that at least two in the sample are affected?
+
+
+
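A possible solution sketch using the binomial distribution (assuming n = 10000 people and p = 3/100000; a simulation as in the earlier exercises would work equally well):

n <- 10000
p <- 3/100000
## a) probability that no one in the sample is affected
dbinom(0, size=n, prob=p)
## b) probability that at least two in the sample are affected
1 - pbinom(1, size=n, prob=p)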