-
Notifications
You must be signed in to change notification settings - Fork 8
/
Copy path12-intro-to-random-numbers.Rmd
330 lines (239 loc) · 9.01 KB
/
12-intro-to-random-numbers.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
---
title: "Random Numbers and Simulations"
subtitle: "Stat 133"
author: "Gaston Sanchez"
output: github_document
fontsize: 11pt
urlcolor: blue
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, error = TRUE, fig.path = '12-images/')
```
> ### Learning Objectives
>
> - How to use R to simulate chance processes
> - Getting to know the function `sample()`
> - Simulate flipping a coin
> - Visualize relative frequencies
------
## Introduction
Random numbers have many applications in science and computer programming,
especially when there are significant uncertainties in a phenomenon of interest.
In this tutorial we'll look at a basic problem that involves working
with random numbers and creating simulations.
More specifically, let's see how to use R to simulate basic chance processes
like tossing a coin.
## Let's flip a coin
Chance processes, also referred to as chance experiments, have to do with
actions in which the resulting outcome turns out to be different in each
occurrence.
Typical examples of basic chance processes are tossing one or more coins,
rolling one or more dice, selecting one or more cards from a deck of cards,
and in general, things that can be framed in terms of drawing tickets out of
a box (or any other type of container: bag, urn, etc.).
You can use your computer, and R in particular, to simulate chances processes.
In order to do that, the first step consists of learning how to create a
virtual coin, or die, or box-with-tickets.
### Creating a coin
The simplest way to create a coin with two sides, `"heads"` and `"tails"`, is
with an R character vector via the _combine_ function `c()`
```{r}
# a (virtual) coin object
coin <- c("heads", "tails")
```
You can also create a _numeric_ coin that shows `1` and `0` instead of
`"heads"` and `"tails"`:
```{r}
num_coin <- c(0, 1)
```
Likewise, you can also create a _logical_ coin that shows `TRUE` and `FALSE`
instead of `"heads"` and `"tails"`:
```{r}
log_coin <- c(TRUE, FALSE)
```
## Tossing a coin
Once you have an object that represents the _coin_, the next step involves
learning how to simulate tossing the coin. One way to simulate the action of
tossing a coin in R is with the function `sample()` which lets you draw
random samples, with or without replacement, from an input vector.
To toss the coin use `sample()` like this:
```{r}
coin <- c('heads', 'tails')
# toss the coin
sample(coin, size = 1)
```
with the argument `size = `, specifying that we want to take a sample of size 1
from the input vector `coin`.
### Function `sample.int()`
Another function related to `sample()` is `sample.int()` which simulates
drawing random integers. The main argument is `n`, which represents the maximum
integer to sample from: `1, 2, 3, ..., n`
```{r}
sample.int(10)
```
### Random Samples
By default, `sample()` draws each element in `coin` with the same probability.
In other words, each element is assigned the same probability of being chosen.
Another default behavior of `sample()` is to take a sample of the specified
`size` __without replacement__. If `size = 1`, it does not really matter whether
the sample is done with or without replacement.
To draw two elements WITHOUT replacement, use `sample()` like this:
```{r}
# draw 2 elements without replacement
sample(coin, size = 2)
```
What if we try to toss the coin three or four times?
```{r}
# trying to toss coin 3 times
sample(coin, size = 3)
```
Notice that R produced an error message. This is because the default behavior
of `sample()` cannot draw more elements that the length of the input vector.
To be able to draw more elements, we need to sample WITH replacement, which is
done by specifying the argument `replace = TRUE`, like this:
```{r}
# draw 4 elements with replacement
sample(coin, size = 4, replace = TRUE)
```
## The Random Seed
The way `sample()` works is by taking a random sample from the input vector.
This means that every time you invoke `sample()` you will likely get a different
output.
In order to make the examples replicable (so you can get the same output as me),
you need to specify what is called a __random seed__. This is done with the
function `set.seed()`. By setting a _seed_, every time you use one of the random
generator functions, like `sample()`, you will get the same values.
```{r}
# set random seed
set.seed(1257)
# toss a coin with replacement
sample(coin, size = 4, replace = TRUE)
```
All computations of random numbers are based on deterministic algorithms, so
the sequence of numbers is not truly random. However, the sequence of numbers
appears to lack any systematic pattern, and we can therefore regard the
numbers as random.
Every time you use one of the random generator functions in R, the call
produces different numbers. For replication and debugging purposes, it is
useful to get the same sequence of random numebrs every time we run the script.
This functionality is obtained by setting a __seed__ before we start generating
the numebrs. The seed is an integer and set by the function `set.seed()`
```{r}
set.seed(123)
runif(4)
```
If we set the seed to `123` again, the sequence of uniform random numbers is
regenerated:
```{r}
set.seed(123)
runif(4)
```
If we don't specify a seed, the random generator functions set a seed based
on the current time. That is, the seed will be different each time we run the
script and consequently the sequence of random numbers will also be different.
## Sampling with different probabilities
Last but not least, `sample()` comes with the argument `prob` which allows you
to provide specific probabilities for each element in the input vector.
By default, `prob = NULL`, which means that every element has the same
probability of being drawn. In the example of tossing a coin, the command
`sample(coin)` is equivalent to `sample(coin, prob = c(0.5, 0.5))`. In the
latter case we explicitly specify a probability of 50% chance of heads, and
50% chance of tails:
```{r echo = FALSE}
# tossing a fair coin
coin <- c("heads", "tails")
sample(coin)
sample(coin, prob = c(0.5, 0.5))
```
However, you can provide different probabilities for each of the elements in
the input vector. For instance, to simulate a __loaded__ coin with chance of
heads 20%, and chance of tails 80%, set `prob = c(0.2, 0.8)` like so:
```{r}
# tossing a loaded coin (20% heads, 80% tails)
sample(coin, size = 5, replace = TRUE, prob = c(0.2, 0.8))
```
-----
## Simulating tossing a coin
Now that we have all the elements to toss a coin with R, let's simulate flipping
a coin 100 times, and use the function `table()` to count the resulting number
of `"heads"` and `"tails"`:
```{r}
# number of flips
num_flips <- 100
# flips simulation
coin <- c('heads', 'tails')
flips <- sample(coin, size = num_flips, replace = TRUE)
# number of heads and tails
freqs <- table(flips)
freqs
```
In my case, I got `r freqs[1]` heads and `r freqs[2]` tails. Your results will
probably be different than mine. Some of you will get more `"heads"`, some of
you will get more `"tails"`, and some will get exactly 50 `"heads"` and 50
`"tails"`.
Run another series of 100 flips, and find the frequency of `"heads"` and `"tails"`:
```{r}
# one more 100 flips
flips <- sample(coin, size = num_flips, replace = TRUE)
freqs <- table(flips)
freqs
```
## Tossing function
Let's make things a little bit more complex but also more interesting.
Instead of calling `sample()` every time we want to toss a coin, we can
write a `toss()` function:
```{r}
#' @title coin toss function
#' @description simulates tossing a coin a given number of times
#' @param x coin object (a vector)
#' @param times number of tosses
#' @return vector of tosses
toss <- function(x, times = 1) {
sample(x, size = times, replace = TRUE)
}
# basic call
toss(coin)
# toss 5 times
toss(coin, 5)
```
We can make the function more versatile by adding a `prob` argument that let
us specify different probabilities for `heads` and `tails`
```{r}
#' @title coin toss function
#' @description simulates tossing a coin a given number of times
#' @param x coin object (a vector)
#' @param times number of tosses
#' @param prob vector of probabilities for each side of the coin
#' @return vector of tosses
toss <- function(x, times = 1, prob = NULL) {
sample(x, size = times, replace = TRUE, prob = prob)
}
# toss a loaded coin 10 times
toss(coin, times = 10, prob = c(0.8, 0.2))
```
## Counting Frequencies
The next step is to toss a coin several times, and count the frequency of
`heads` and `tails`
```{r}
# count frequencies
tosses <- toss(coin, times = 100)
table(tosses)
```
We can also count the relative frequencies:
```{r}
# relative freqs (proportions)
table(tosses) / length(tosses)
```
To make things more interesting, let's consider how the frequency of `heads`
evolves over a series of `n` tosses.
```{r}
n <- 500
tosses <- toss(coin, times = n)
heads_freq <- cumsum(tosses == 'heads') / 1:n
```
In this case, we can make a plot of the relative frequencies:
```{r head_freqs_plot}
plot(heads_freq, type = 'l', lwd = 2, col = 'tomato', las = 1,
ylim = c(0, 1))
abline(h = 0.5, col = 'gray50')
```