Merge pull request #249 from NBISweden/evaupdate
Update probability session (move an exercise)
evaf authored Apr 22, 2024
2 parents 13ebc77 + c8b1595 commit 3cea7f3
Showing 28 changed files with 1,102 additions and 864 deletions.
Binary file added session-inference/figures/hypotestest.jpg
90 changes: 85 additions & 5 deletions session-inference/lectures/inferenceI.html

Large diffs are not rendered by default.

70 changes: 64 additions & 6 deletions session-inference/lectures/inferenceI.qmd
@@ -26,14 +26,21 @@ options(digits=2)

## Introduction to hypothesis tests

::: {.r-fit-text}

**Statistical inference** is the process of drawing conclusions about properties of a population based on observations of a random sample from that population.

:::{.columns}
:::{.column width="7%"}
![](../figures/Lampa.jpg)
:::
:::{.column width="92%"}
A **hypothesis test** is a type of inference that evaluates whether a hypothesis about a population is supported by the observations of a random sample (i.e. by the data available).
:::
::::

Typically, the hypotheses that are tested are assumptions about properties of a population, such as a proportion, a mean, a mean difference or a variance.
:::


![](../figures/hypotestest.jpg){width=50% fig-align="center"}

## The null and alternative hypothesis

@@ -92,8 +99,23 @@ df1 |> ggplot(aes(x=group, y=x, color=group)) + geom_boxplot() + theme_bw() + xl
::: {.smaller style="font-size: 35px" .incremental}
1. Define $H_0$ and $H_1$
2. Select an appropriate significance level, $\alpha$

::: {.notes}
The significance level is the acceptable risk of a false alarm, i.e. of saying *"I have a hit"* or *"I found a difference"* when the null hypothesis (*"there is no difference"*) is true.
:::

3. Select an appropriate test statistic, $T$, and compute the observed value, $t_{obs}$

::: {.notes}
For example the difference in means, the proportion of successes, the correlation coefficient etc.
:::

4. Assume that $H_0$ is true and compute the sampling distribution of $T$.

::: {.notes}
We will get back to the null distribution in the next slide
:::

5. Compare the observed value, $t_{obs}$, with the computed sampling distribution under $H_0$ and compute a p-value. The p-value is the probability of observing a value at least as extreme as the observed value, if $H_0$ is true.
6. Based on the p-value, either reject or do not reject $H_0$.
:::
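
The steps above can be sketched in R. This is a minimal illustration rather than the lecture's own code: it borrows the coin-toss setting from the probability session (20 tosses), and the observed count `t_obs`, the level `alpha` and the number of simulations are assumed values chosen only for the example.

```r
set.seed(42)

# Steps 1-2: H0: the coin is fair (p = 0.5); H1: heads is more likely (p > 0.5)
alpha <- 0.05          # assumed significance level
n_tosses <- 20

# Step 3: test statistic T = number of heads; hypothetical observed value
t_obs <- 15

# Step 4: approximate the sampling distribution of T under H0 by simulation
t_null <- rbinom(10000, size = n_tosses, prob = 0.5)

# Step 5: p-value = share of simulated values at least as extreme as t_obs
p_value <- mean(t_null >= t_obs)

# Step 6: compare the p-value with alpha
reject_H0 <- p_value < alpha
print(p_value)
print(reject_H0)
```

Simulating the null distribution keeps the sketch close to step 4. For this particular test the exact tail probability is also available from `pbinom`.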
@@ -218,8 +240,6 @@ ggplot(df, aes(x=x, y=f, color=h, fill=h)) + geom_line() + geom_area(data=df %>

You suspect that a dice is loaded, i.e. showing 'six' more often than expected of a fair dice. To test this you throw the dice 10 times and count the total number of sixes. You got 5 sixes. Is there reason to believe that the dice is loaded?
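
One way to work through this example is an exact binomial calculation. The sketch below assumes that, under $H_0$ (a fair dice), the number of sixes in 10 throws follows a Binomial(10, 1/6) distribution. The variable names and the level `alpha` are chosen here for illustration.

```r
n <- 10          # number of throws
p0 <- 1 / 6      # probability of a six under H0 (fair dice)
t_obs <- 5       # observed number of sixes
alpha <- 0.05    # assumed significance level

# One-sided p-value: P(T >= t_obs) = P(T > t_obs - 1) under H0
p_value <- pbinom(t_obs - 1, size = n, prob = p0, lower.tail = FALSE)

# The same probability as an explicit sum over the upper tail
p_check <- sum(dbinom(t_obs:n, size = n, prob = p0))

print(p_value)
print(p_value < alpha)
```

The p-value is about 0.015, so at a 5% significance level the observed 5 sixes would lead to rejecting $H_0$.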

*Live coding!*

1. Define $H_0$ and $H_1$
2. Select an appropriate significance level, $\alpha$
3. Select an appropriate test statistic, $T$, and compute $t_{obs}$
@@ -658,7 +678,6 @@ If high-fat diet has no effect, i.e. if $H_0$ was true, the result would be as i
::: {.column width="50%"}
The 24 mice were initially from the same population. Depending on how the mice are randomly assigned to the high-fat and normal groups, the mean weights would differ, even if the two groups were treated the same.

Random reassignment to two groups can be accomplished using permutation.
:::
::: {.column width="50%"}
```{r results="asis"}
@@ -670,6 +689,45 @@ for (i in o) {
:::
::::

## Simulation example

**4. Null distribution**

If high-fat diet has no effect, i.e. if $H_0$ was true, the result would be as if all mice were given the same diet.

::: {.columns}
::: {.column width="50%"}
The 24 mice were initially from the same population. Depending on how the mice are randomly assigned to the high-fat and normal groups, the mean weights would differ, even if the two groups were treated the same.
:::
::: {.column width="50%"}
:::{.columns}
:::{.column width="40%"}
```{r results="asis"}
sc <- 2
o2 <- sample(o)
for (i in o2[1:12]) {
cat(sprintf('![](../figures/Mus.jpg){width=%i data-id="%s"}', round(c(xHF, xN)[i]*sc), name[i]))
}
```
:::
:::{.column width="40%"}
```{r results="asis"}
for (i in o2[13:24]) {
cat(sprintf('![](../figures/Mus.jpg){width=%i data-id="%s"}', round(c(xHF, xN)[i]*sc), name[i]))
}
```
:::
::::

:::
::::

## Simulation example

**4. Null distribution**

Random reassignment to two groups can be accomplished using permutation.
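
A permutation of this kind can be sketched as follows. The weights below are simulated placeholders, not the course data, and the helper `perm_diff` is a hypothetical name introduced for this example. The object names `xHF` and `xN` follow the chunks above.

```r
set.seed(1)

# Hypothetical weights (g) for 12 high-fat and 12 normal-diet mice
xHF <- rnorm(12, mean = 30, sd = 4)
xN  <- rnorm(12, mean = 25, sd = 4)
x <- c(xHF, xN)

# Observed difference in mean weight
d_obs <- mean(xHF) - mean(xN)

# One random reassignment: pick n1 mice as 'high-fat', the rest as 'normal',
# and return the difference in group means
perm_diff <- function(x, n1) {
  idx <- sample(length(x), n1)
  mean(x[idx]) - mean(x[-idx])
}

# Null distribution from many random reassignments
d_null <- replicate(10000, perm_diff(x, 12))

# One-sided p-value: share of permuted differences at least as large as d_obs
p_value <- mean(d_null >= d_obs)
print(p_value)
```

Because every reassignment treats all 24 mice as interchangeable, the permuted differences are centered near zero, which is what $H_0$ implies.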

Assume $H_0$ is true, i.e. assume all mice are equivalent and

1. Randomly reassign 12 of the 24 mice to 'high-fat' and the remaining 12 to 'control'.
Binary file modified session-probability/Rfigures/prob_CFUc-1.png
Binary file modified session-probability/Rfigures/prob_dicec-1.png
Binary file modified session-probability/Rfigures/prob_fig-leftskewed-1.png
Binary file modified session-probability/Rfigures/prob_fig-meanskew-1.png
Binary file modified session-probability/Rfigures/prob_fig-pdfnewborn-1.png
Binary file modified session-probability/Rfigures/prob_fig-wtbabiesdens3-1.png
Binary file modified session-probability/Rfigures/prob_histNheads-1.png
Binary file modified session-probability/Rfigures/prob_solcards-1.png
Binary file modified session-probability/Rfigures/prob_solcointossc-1.png
Binary file modified session-probability/Rfigures/prob_solrandome-1.png
Binary file modified session-probability/docs/Rfigures/prob_CFUc-1.png
Binary file modified session-probability/docs/Rfigures/prob_fig-leftskewed-1.png
Binary file modified session-probability/docs/Rfigures/prob_fig-meanskew-1.png
Binary file modified session-probability/docs/Rfigures/prob_fig-pdfnewborn-1.png
Binary file modified session-probability/docs/Rfigures/prob_fig-wtbabiesdens3-1.png
Binary file modified session-probability/docs/Rfigures/prob_histNheads-1.png
Binary file modified session-probability/docs/Rfigures/prob_solrandome-1.png
10 changes: 5 additions & 5 deletions session-probability/docs/prob_02discrv.html
@@ -426,7 +426,7 @@ <h2 data-number="2.2" class="anchored" data-anchor-id="simulate-distributions"><
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="do">## Another coin toss</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="fu">sample</span>(<span class="fu">c</span>(<span class="st">"H"</span>, <span class="st">"T"</span>), <span class="at">size=</span><span class="dv">1</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "T"</code></pre>
<pre><code>[1] "H"</code></pre>
</div>
</div>
<p>Every time you run <code>sample</code> a new coin toss is simulated.</p>
@@ -436,7 +436,7 @@ <h2 data-number="2.2" class="anchored" data-anchor-id="simulate-distributions"><
<div class="sourceCode cell-code" id="cb5"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="do">## 20 independent coin tosses</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>(coins <span class="ot">&lt;-</span> <span class="fu">sample</span>(<span class="fu">c</span>(<span class="st">"H"</span>, <span class="st">"T"</span>), <span class="at">size=</span><span class="dv">20</span>, <span class="at">replace=</span><span class="cn">TRUE</span>))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] "H" "T" "H" "H" "T" "T" "T" "H" "T" "T" "H" "H" "T" "T" "H" "T" "H" "H" "H"
<pre><code> [1] "H" "T" "H" "H" "H" "T" "H" "H" "T" "H" "H" "T" "T" "T" "H" "T" "H" "T" "H"
[20] "H"</code></pre>
</div>
</div>
@@ -445,7 +445,7 @@ <h2 data-number="2.2" class="anchored" data-anchor-id="simulate-distributions"><
<div class="sourceCode cell-code" id="cb7"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="do">## How many heads?</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="fu">sum</span>(coins <span class="sc">==</span> <span class="st">"H"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 11</code></pre>
<pre><code>[1] 12</code></pre>
</div>
</div>
<p>We can repeat this experiment (toss 20 coins and count the number of heads) several times to estimate the distribution of number of heads in 20 coin tosses.</p>
@@ -470,11 +470,11 @@ <h2 data-number="2.2" class="anchored" data-anchor-id="simulate-distributions"><
<div class="cell">
<div class="sourceCode cell-code" id="cb11"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="fu">sum</span>(Nheads <span class="sc">&gt;=</span> <span class="dv">15</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 202</code></pre>
<pre><code>[1] 217</code></pre>
</div>
</div>
<p>From this we conclude that</p>
<p><span class="math inline">\(P(Y \geq 15) =\)</span> 202/10000 = 0.0202</p>
<p><span class="math inline">\(P(Y \geq 15) =\)</span> 217/10000 = 0.0217</p>
</div>
<p>Resampling can also be used to compute other properties of a random variable, such as the expected value.</p>
<p>The <strong>law of large numbers</strong> states that if the same experiment is performed many times the average of the result will be close to the expected value.</p>