diff --git a/session-inference/figures/hypotestest.jpg b/session-inference/figures/hypotestest.jpg
new file mode 100644
index 0000000..86adf6d
Binary files /dev/null and b/session-inference/figures/hypotestest.jpg differ
diff --git a/session-inference/lectures/inferenceI.html b/session-inference/lectures/inferenceI.html
index d4b68c8..d069c37 100644
--- a/session-inference/lectures/inferenceI.html
+++ b/session-inference/lectures/inferenceI.html
@@ -1450,10 +1450,19 @@

Statistical inference, part I

Introduction to hypothesis tests

-

Statistical inference means drawing conclusions about properties of a population based on observations of a random sample from that population.

+
+
+

+

A hypothesis test is a type of inference that evaluates whether a hypothesis about a population is supported by the observations of a random sample (i.e. by the data available).

+
+

Typically, the hypotheses that are tested are assumptions about properties of a population, such as proportion, mean, mean difference, variance etc.

+
+
+

+
@@ -1502,8 +1511,53 @@

To perform a hypothesis test

  1. Define \(H_0\) and \(H_1\)
  2. Select an appropriate significance level, \(\alpha\)
  3. +
+ +
  1. Select appropriate test statistic, \(T\), and compute the observed value, \(t_{obs}\)
  2. +
+ +
  1. Assume that \(H_0\) is true and compute the sampling distribution of \(T\).
  2. +
+ +
  1. Compare the observed value, \(t_{obs}\), with the computed sampling distribution under \(H_0\) and compute a p-value. The p-value is the probability of observing a value at least as extreme as the observed value, if \(H_0\) is true.
  2. Based on the p-value, either reject \(H_0\) or fail to reject it.
@@ -1622,7 +1676,6 @@

Statistical power

Perform a hypothesis test

You suspect that a die is loaded, i.e. that it shows ‘six’ more often than expected of a fair die. To test this, you throw the die 10 times and count the total number of sixes. You get 5 sixes. Is there reason to believe that the die is loaded?
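One possible way to work through the six steps for this example is sketched below. It assumes a one-sided alternative and an exact binomial null distribution; the significance level of 0.05 is an assumed conventional choice, not one given in the lecture.

```r
# Step 1: H0: p = 1/6 (fair die), H1: p > 1/6 (loaded towards 'six')
# Step 2: significance level (assumed here, not specified in the lecture)
alpha <- 0.05

# Step 3: test statistic = number of sixes in n throws, with observed value 5
n <- 10
x_obs <- 5

# Step 4: under H0 the number of sixes follows Binomial(n = 10, p = 1/6)
# Step 5: p-value = P(X >= 5 | H0) = 1 - P(X <= 4 | H0)
p_value <- pbinom(x_obs - 1, size = n, prob = 1/6, lower.tail = FALSE)
p_value

# Step 6: reject H0 if the p-value is below the significance level
p_value < alpha
```

Under these assumptions the p-value is roughly 0.015, which is below 0.05, so the data would speak against a fair die.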

-

Live coding!

  1. Define \(H_0\) and \(H_1\)
  2. Select an appropriate significance level, \(\alpha\)
  3. @@ -2464,6 +2517,33 @@

    Simulation example

    +
+
+

Simulation example

+

4. Null distribution

+

If a high-fat diet has no effect, i.e. if \(H_0\) is true, the result would be as if all mice were given the same diet.

+
+
+

The 24 mice were initially from the same population. Depending on how the mice are randomly assigned to the high-fat and normal groups, the mean weights would differ, even if the two groups were treated the same.

+
+
+
+
+

+
+
+
+

+
+
+
+
+
+
+
+

Simulation example

+

4. Null distribution

+

Random reassignment to two groups can be accomplished using permutation.

Assume \(H_0\) is true, i.e. assume all mice are equivalent and

  1. Randomly reassign 12 of the 24 mice to ‘high-fat’ and the remaining 12 to ‘control’.
  2. @@ -2471,18 +2551,18 @@

    Simulation example

If we repeat steps 1-2 many times, we get the sampling distribution of the difference in mean weights when \(H_0\) is true, the so-called null distribution.

-
+

Simulation example

4. Null distribution

-
+

Simulation example

5. Compute p-value

What is the probability of getting a mean difference at least as extreme as our observed value, \(d_{obs}\), if \(H_0\) is true?

\(P(\bar X_2 - \bar X_1 \geq d_{obs} \mid H_0) =\) 0.169
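The permutation procedure and the p-value computation could be sketched as follows. The weights below are simulated, hypothetical values (not the mouse data from the lecture), and `perm_diff` is a helper name introduced only for this illustration; step 2 is assumed to be computing the difference in group means.

```r
set.seed(1)

# Hypothetical weights in grams for 12 high-fat and 12 control mice
# (simulated for illustration; NOT the data used in the lecture)
high_fat <- rnorm(12, mean = 27, sd = 3)
control  <- rnorm(12, mean = 25, sd = 3)
weights  <- c(high_fat, control)

# Observed difference in mean weight
d_obs <- mean(high_fat) - mean(control)

# One permutation: randomly reassign 12 of the 24 mice to 'high-fat',
# the remaining 12 to 'control', and compute the mean difference
perm_diff <- function(w) {
  idx <- sample(length(w), 12)
  mean(w[idx]) - mean(w[-idx])
}

# Repeat many times to approximate the null distribution
null_dist <- replicate(10000, perm_diff(weights))

# One-sided p-value: proportion of permuted differences >= d_obs
p_value <- mean(null_dist >= d_obs)
p_value
```

A histogram of `null_dist` with a vertical line at `d_obs` (for example `hist(null_dist); abline(v = d_obs)`) shows where the observed value falls relative to the null distribution.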

-
+

Simulation example

6. Conclusion?

Every time you run `sample`, a new coin toss is simulated.

@@ -436,7 +436,7 @@

## 20 independent coin tosses
 (coins <- sample(c("H", "T"), size=20, replace=TRUE))
-
 [1] "H" "T" "H" "H" "T" "T" "T" "H" "T" "T" "H" "H" "T" "T" "H" "T" "H" "H" "H"
+
 [1] "H" "T" "H" "H" "H" "T" "H" "H" "T" "H" "H" "T" "T" "T" "H" "T" "H" "T" "H"
 [20] "H"
@@ -445,7 +445,7 @@

## How many heads?
 sum(coins == "H")
-
[1] 11
+
[1] 12

We can repeat this experiment (toss 20 coins and count the number of heads) several times to estimate the distribution of number of heads in 20 coin tosses.

@@ -470,11 +470,11 @@

sum(Nheads >= 15)
-
[1] 202
+
[1] 217

From this we conclude that

-

\(P(Y \geq 15) =\) 202/10000 = 0.0202

+

\(P(Y \geq 15) =\) 217/10000 = 0.0217
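As a sanity check on the simulated estimate, the exact probability can be computed from the binomial distribution, assuming a fair coin so that \(Y \sim Bin(20, 0.5)\):

```r
# Exact P(Y >= 15) = 1 - P(Y <= 14) for Y ~ Binomial(20, 0.5)
p_exact <- pbinom(14, size = 20, prob = 0.5, lower.tail = FALSE)
p_exact

# The same value obtained by summing the probability mass function
sum(dbinom(15:20, size = 20, prob = 0.5))
```

The exact value is about 0.0207, so simulated estimates such as 0.0202 or 0.0217 differ from it only by random simulation error.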

Resampling can also be used to compute other properties of a random variable, such as the expected value.

The law of large numbers states that if the same experiment is performed many times, the average of the results will be close to the expected value.
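A minimal sketch of this idea, reusing the coin toss experiment: the running average of the number of heads in 20 tosses should settle near the expected value \(20 \cdot 0.5 = 10\). The variable name `running_mean` and the seed are choices made for this illustration.

```r
set.seed(1)

# Repeat the experiment (20 tosses, count heads) 10000 times
Nheads <- replicate(10000, sum(sample(c("H", "T"), size = 20, replace = TRUE) == "H"))

# Average over all repeats, an estimate of the expected value
mean(Nheads)

# Running average after 1, 2, 3, ... repeats
running_mean <- cumsum(Nheads) / seq_along(Nheads)
running_mean[c(10, 100, 1000, 10000)]
```

Plotting `running_mean` against the number of repeats typically shows large fluctuations at first that shrink as more repeats are included.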

diff --git a/session-probability/docs/prob_exr1_discrv_solutions.html b/session-probability/docs/prob_exr1_discrv_solutions.html
index dd24b6f..f9e3835 100644
--- a/session-probability/docs/prob_exr1_discrv_solutions.html
+++ b/session-probability/docs/prob_exr1_discrv_solutions.html
@@ -460,7 +460,7 @@

Simulation

::: {.cell-output .cell-output-stdout}
```
-[1] "H"
+[1] "T"
```
:::
@@ -471,7 +471,7 @@

Simulation

::: {.cell-output .cell-output-stdout}
```
- [1] "T" "H" "T" "H" "H" "T" "H" "T" "T" "T" "H" "H" "H" "H" "H" "T" "T" "T" "H"
+ [1] "H" "T" "H" "H" "T" "T" "T" "T" "T" "H" "H" "T" "T" "H" "T" "T" "T" "H" "T"
 [20] "H"
```
:::
@@ -483,7 +483,7 @@

Simulation

::: {.cell-output .cell-output-stdout}
```
-[1] 11
+[1] 8
```
:::
@@ -548,7 +548,7 @@

Simulation

::: {.cell-output .cell-output-stdout}
```
-[1] 0.061
+[1] 0.06
```
:::
:::
@@ -576,8 +576,10 @@

Simulation

::: {.cell-output .cell-output-stdout}
```
Nheads
-   3    4    5    6    7    8    9   10   11   12   13   14   15   16   17
-   9   53  154  395  768 1202 1597 1740 1550 1220  728  375  148   52    9
+   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17
+   4   12   51  159  370  788 1155 1592 1778 1596 1150  776  363  148   46   11
+  18
+   1
```
:::
:::
@@ -614,7 +616,7 @@

Simulation

::: {.cell-output .cell-output-stdout}
```
-[1] 0
+[1] 4e-04
```
:::
@@ -624,7 +626,7 @@

Simulation

::: {.cell-output .cell-output-stdout}
```
-[1] 0
+[1] 4
```
:::
@@ -640,7 +642,7 @@

Simulation

::: {.cell-output .cell-output-stdout}
```
-[1] 194
+[1] 221
```
:::
@@ -650,7 +652,7 @@

Simulation

::: {.cell-output .cell-output-stdout}
```
-[1] 0.00019
+[1] 0.00022
```
:::
:::
@@ -718,8 +720,8 @@

Simulation

## Randomize 20 independent patients
 (patients <- sample(c("T", "c"), size=20, replace=TRUE))
-
 [1] "c" "c" "c" "T" "c" "c" "c" "T" "T" "c" "c" "T" "T" "T" "T" "c" "c" "T" "T"
-[20] "c"
+
 [1] "c" "c" "T" "c" "c" "c" "c" "T" "c" "T" "c" "T" "c" "T" "T" "c" "c" "T" "T"
+[20] "T"
## How many patients are assigned to treatment group?
 sum(patients == "T")
@@ -739,7 +741,7 @@

Simulation

## Proportion of the 10000 repeats with exactly 15 T
 mean(Ntreat==15)
-
[1] 0.014
+
[1] 0.015
    @@ -748,7 +750,7 @@

    Simulation

    mean(Ntreat<7)
    -
    [1] 0.059
    +
    [1] 0.058
      @@ -764,10 +766,10 @@

      Simulation

      table(Ntreat)
      Ntreat
      -   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
      -   1    1   10   47  160  374  740 1263 1646 1758 1553 1129  748  362  139   61 
      -  17   18 
      -   5    3 
      +   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17
      +   2    5   60  169  340  728 1192 1585 1774 1567 1280  706  392  145   39   13
      +  18   19
      +   2    1
        @@ -778,7 +780,7 @@

        Simulation

        ## probability of 15 T or more
         mean(Ntreat>=15)
        -
        [1] 0.021
        +
        [1] 0.02
          @@ -801,11 +803,11 @@

          Simulation

          })
          sum(Ntreat<=2)
          -
          [1] 210
          +
          [1] 200
          mean(Ntreat<=2)
          -
          [1] 0.00021
          +
          [1] 2e-04
          @@ -852,8 +854,8 @@

          Simulation

          ::: {.cell-output .cell-output-stdout}
          ```
          N
          -    0     1     2     3     4     5     6     7
          -16135 32233 29041 15590  5422  1338   215    26
          +    0     1     2     3     4     5     6     7     8
          +16074 32225 29145 15439  5560  1298   234    22     3
          ```
          :::
          @@ -865,8 +867,8 @@

          Simulation

          ::: {.cell-output .cell-output-stdout}
          ```
          N
          -      0       1       2       3       4       5       6       7
          -0.16135 0.32233 0.29041 0.15590 0.05422 0.01338 0.00215 0.00026
          +      0       1       2       3       4       5       6       7       8
          +0.16074 0.32225 0.29145 0.15439 0.05560 0.01298 0.00234 0.00022 0.00003
          ```
          :::
          @@ -886,7 +888,7 @@

          Simulation

          ::: {.cell}
          ::: {.cell-output .cell-output-stdout}
          ```
          -[1] 1579
          +[1] 1557
          ```
          :::
          @@ -1002,14 +1004,14 @@

          Simulation

          N
               0     1     2     3     4     5     6     7     8 
          -16071 32380 29282 15305  5440  1268   222    31     1 
          +16192 32060 29226 15619  5330  1339   195    37     2
          ## The probability mass function
           table(N)/length(N)
          N
                 0       1       2       3       4       5       6       7       8 
          -0.16071 0.32380 0.29282 0.15305 0.05440 0.01268 0.00222 0.00031 0.00001 
          +0.16192 0.32060 0.29226 0.15619 0.05330 0.01339 0.00195 0.00037 0.00002
          hist(N, breaks=(0:11)-0.5)
          @@ -1021,13 +1023,13 @@

          Simulation

        -
        [1] 1522
        +
        [1] 1573
        -
        [1] 0.015
        +
        [1] 0.016
        -
        [1] 0.015
        +
        [1] 0.016
          @@ -1092,7 +1094,7 @@

          Simulation

          ::: {.cell-output .cell-output-stdout}
          ```
          -[1] 5e-04
          +[1] 0.00063
          ```
          :::
          :::
          @@ -1123,11 +1125,11 @@

          Simulation

          x
            0  1  2  3 
          -34 41 22  3 
          +35 43 20  2
          mean(x==0)
          -
          [1] 0.34
          +
          [1] 0.35
          ## Solution using 1000 replicates
           x <- replicate(1000, sum(sample(c(0,0,0,0,0,0,0,1,1,1), size=3, replace=TRUE)))
          @@ -1135,7 +1137,7 @@ 

          Simulation

          x
             0   1   2   3 
          -350 445 182  23 
          +353 451 172  24
          mean(x==0)
          @@ -1147,7 +1149,7 @@

          Simulation

          x
               0     1     2     3 
          -34287 44208 18824  2681 
          +34348 43969 19016  2667
          mean(x==0)
          @@ -1179,7 +1181,7 @@

          Simulation

          x
               0     1     2     3 
          -31982 47783 18459  1776 
          +31865 48035 18355  1745
          mean(x==0)
          @@ -1211,7 +1213,7 @@

          Simulation

          x
               0     1     2     3 
          -34055 44465 18747  2733 
          +34233 44412 18737  2618
          mean(x==0)
          @@ -1322,7 +1324,7 @@

          x
               1     2     3 
          -30207 59757 10036 
          +29885 60064 10051

          [1] 0.0 0.3 0.9 1.0
          @@ -1332,6 +1334,48 @@

          +

          Exercise 11 (Rare disease) A rare disease affects 3 in 100000 in a large population. If 10000 people are randomly selected from the population, what is the probability

          +
            +
          1. that no one in the sample is affected?
          2. +
          3. that at least two in the sample are affected?
          4. +
          +
          + +
          +
          +
            +
          1. +
          +
          +
          n <- 10000
          +p <- 3/100000
          +ppois(0, n*p)
          +
          +
          [1] 0.74
          +
          +
          +
            +
          1. +
          +
          +
          ppois(1, n*p, lower.tail=FALSE)
          +
          +
          [1] 0.037
          +
          +
          +
          +
          +
          +
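The solution above uses the Poisson approximation with \(\lambda = np = 0.3\). As a hedged cross-check, the same probabilities can be computed from the exact binomial distribution; the names `exact_a`, `approx_a`, `exact_b` and `approx_b` are introduced only for this comparison.

```r
n <- 10000
p <- 3/100000

# (a) P(no one in the sample is affected)
exact_a  <- dbinom(0, size = n, prob = p)   # exact binomial
approx_a <- ppois(0, lambda = n * p)        # Poisson approximation

# (b) P(at least two in the sample are affected)
exact_b  <- pbinom(1, size = n, prob = p, lower.tail = FALSE)
approx_b <- ppois(1, lambda = n * p, lower.tail = FALSE)

round(c(exact_a = exact_a, approx_a = approx_a,
        exact_b = exact_b, approx_b = approx_b), 4)
```

With such a large \(n\) and small \(p\), the two approaches agree to several decimal places, which is why the Poisson approximation is convenient here.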