---
title: 'P&S-2022: Lab assignment 2'
author: "Andrii Yaroshevych, Arsenii Kazymyr, Vasyl Burak"
output:
html_document:
df_print: paged
---
## General comments and instructions
- Complete solution will give you $\bf 4$ points (out of 100 total). Submission deadline is **23:59 of 06 November 2022**\
- The report must be prepared as an *R notebook*; you must submit to **cms** both the source *R notebook* **and** the generated html file\
- At the beginning of the notebook, provide a work-breakdown structure estimating efforts of each team member\
- For each task, include
- problem formulation and discussion (what is a reasonable answer to discuss);\
- the corresponding $\mathbf{R}$ code with comments (usually it is just a couple of lines long);\
- the statistics obtained (like sample mean or anything else you use to complete the task) as well as histograms etc. to illustrate your findings;
- justification of your solution (e.g. refer to the corresponding theorems from probability theory);\
- conclusions (e.g. how reliable your answer is, does it agree with common sense expectations etc.)
- The **team id number** referred to in tasks is the **two-digit** ordinal number of your team on the list. Include the line **set.seed(team id number)** at the beginning of your code to make your calculations reproducible. Also observe that the answers **do** depend on this number!\
- Take into account that not complying with these instructions may result in point deduction regardless of whether your implementation is correct.
### Task 1
#### In this task, we discuss the \([7,4]\) Hamming code and investigate its reliability. That coding system can correct single errors in the transmission of \(4\)-bit messages and proceeds as follows:
* given a message \(\mathbf{m} = (a_1 a_2 a_3 a_4)\), we first encode it to a \(7\)-bit _codeword_ \(\mathbf{c} = \mathbf{m}G = (x_1 x_2 x_3 x_4 x_5 x_6 x_7)\), where \(G\) is a \(4\times 7\) _generator_ matrix
* the codeword \(\mathbf{c}\) is transmitted, and \(\mathbf{r}\) is the received message
* \(\mathbf{r}\) is checked for errors by calculating the _syndrome vector_ \(\mathbf{z} := \mathbf{r} H\), for a \(7 \times 3\) _parity-check_ matrix \(H\)
* if a single error has occurred in \(\mathbf{r}\), then the binary \(\mathbf{z} = (z_1 z_2 z_3)\) identifies the wrong bit no. \(z_1 + 2 z_2 + 4z_3\); thus \( (0 0 0)\) shows there was no error (or more than one), while \((1 1 0 )\) means the third bit (or more than one) got corrupted
* if the error was identified, then we flip the corresponding bit in \(\mathbf{r}\) to get the corrected \(\mathbf{r}^* = (r_1 r_2 r_3 r_4 r_5 r_6 r_7)\);
* the decoded message is then \(\mathbf{m}^*:= (r_3r_5r_6r_7)\).
#### The __generator__ matrix \(G\) and the __parity-check__ matrix \(H\) are given by
\[
G :=
\begin{pmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 \\
\end{pmatrix},
\qquad
H^\top := \begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{pmatrix}
\]
#### Assume that each bit in the transmission \(\mathbf{c} \mapsto \mathbf{r}\) gets corrupted independently of the others with probability \(p = \mathtt{id}/100\), where \(\mathtt{id}\) is your team number. Your task is the following one.
1. Simulate the encoding-transmission-decoding process \(N\) times and find the estimate \(\hat p\) of the probability \(p^*\) of correct transmission of a single message \(\mathbf{m}\). Comment why, for large \(N\), \(\hat p\) is expected to be close to \(p^*\).
2. By estimating the standard deviation of the corresponding indicator of success by the standard error of your sample and using the CLT, predict the *confidence* interval \((p^*-\varepsilon, p^* + \varepsilon)\), in which the estimate \(\hat p\) falls with probability at least \(0.95\).
3. What choice of \(N\) guarantees that \(\varepsilon \le 0.03\)?
4. Draw the histogram of the number \(k = 0,1,2,3,4\) of errors while transmitting a \(4\)-digit binary message. Do you think it is one of the known distributions?
#### You can (but do not have to) use the chunks we prepared for you
#### First, we set the **id** of the team and define the probability \(p\) and the generator and parity-check matrices \(G\) and \(H\)
```{r}
id <- 7
set.seed(id)
p <- id/100
G <- matrix(c(1, 1, 1, 0, 0, 0, 0,
1, 0, 0, 1, 1, 0, 0,
0, 1, 0, 1, 0, 1, 0,
1, 1, 0, 1, 0, 0, 1), nrow = 4, byrow = TRUE)
H <- t(matrix(c(1, 0, 1, 0, 1, 0, 1,
0, 1, 1, 0, 0, 1, 1,
0, 0, 0, 1, 1, 1, 1), nrow = 3, byrow = TRUE))
```
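As a quick sanity check of the scheme described above (a minimal sketch; the message and the corrupted bit below are chosen arbitrarily), we encode one 4-bit message, flip a single bit, and verify that the syndrome points to it:
```{r}
m <- c(1, 0, 1, 1)                       # an arbitrary 4-bit message
codeword <- (m %*% G) %% 2               # encode: 7-bit codeword c = mG (mod 2)
r <- codeword
r[3] <- (r[3] + 1) %% 2                  # corrupt bit no. 3 during "transmission"
z <- (r %*% H) %% 2                      # syndrome z = rH
wrong_bit <- sum(z * c(1, 2, 4))         # wrong bit no. = z1 + 2*z2 + 4*z3
r[wrong_bit] <- (r[wrong_bit] + 1) %% 2  # flip the identified bit back
wrong_bit
all(r[c(3, 5, 6, 7)] == m)               # decoded message (r3 r5 r6 r7) matches the original
```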
### 1.1 Simulate the encoding-transmission-decoding process \(N\) times and find the estimate \(\hat p\) of the probability \(p^*\) of correct transmission of a single message \(\mathbf{m}\). Comment why, for large \(N\), \(\hat p\) is expected to be close to \(p^*\).
#### First, we define the message generator function
```{r}
message_generator <- function(N) {
matrix(sample(c(0,1), 4*N, replace = TRUE), nrow = N)
}
```
#### Now we want to generate an error mask for the transmissions. The mask is a matrix of 0s and 1s, where a 1 marks a corrupted bit; each bit is corrupted independently with probability \(p\).
We also define a function that applies the mask to the codewords and returns the corrupted (received) messages.
```{r}
erros_generator <- function (N, p) {
  # N x 7 matrix of independent Bernoulli(p) error indicators
  matrix(sample(c(0,1), 7*N, replace = TRUE, prob = c(1-p, p)), nrow = N)
}
apply_errors <- function (codewords, errors) {
  (codewords + errors) %% 2   # flip exactly the bits marked by the error mask
}
```
The next steps are detecting errors in the received messages, correcting them, and decoding the corrected messages; after that, we can compute all the quantities of interest.
```{r}
correct_messages <- function (recieved_messages, sample_size, H) {
  syndromes <- (recieved_messages %*% H) %% 2      # syndrome z = rH for every received word
  corrections_vector <- syndromes %*% c(1, 2, 4)   # wrong bit no. z1 + 2*z2 + 4*z3 (0 = no error detected)
  corrected_messages <- recieved_messages
  for (i in 1:sample_size) {
    invalid_bit <- corrections_vector[i]
    if (invalid_bit != 0) {
      corrected_messages[i, invalid_bit] <- (corrected_messages[i, invalid_bit] + 1) %% 2   # flip the suspected bit
    }
  }
  return(corrected_messages)
}
decode_messages <- function (corrected_messages) {
  return(corrected_messages[, c(3, 5, 6, 7)])      # the decoded message is (r3 r5 r6 r7)
}
calculate_error_rate <- function (corrected_messages, messages, sample_size) {
  number_of_errors <- 0
  decoded_messages <- decode_messages(corrected_messages)
  for (i in 1:sample_size) {
    if (sum(decoded_messages[i, ] != messages[i, ]) != 0) {   # decoded message differs from the original
      number_of_errors <- number_of_errors + 1
    }
  }
  return(number_of_errors/sample_size)             # fraction of incorrectly transmitted messages
}
```
For further calculations, we need the sampling distribution of the error rate. To obtain it, we run `iterations` independent simulations, each of size `sample_size`. By the (weak) Law of Large Numbers, the proportion of correctly decoded messages in $N$ independent transmissions converges to $p^*$ as $N \to \infty$; hence, for large $N$, the estimate $\hat p = 1 - (\text{error rate})$ is expected to be close to $p^*$.
```{r}
generate_sampling_distribution <- function (iterations, sample_size, p) {
  sampling_distribution <- c()
  for (i in 1:iterations) {
    messages <- message_generator(sample_size)
    codewords <- (messages %*% G) %% 2             # encode: c = mG (mod 2)
    errors <- erros_generator(sample_size, p)
    recieved_messages <- apply_errors(codewords, errors)
    corrected_messages <- correct_messages(recieved_messages, sample_size, H)
    sampling_distribution[i] <- calculate_error_rate(corrected_messages, messages, sample_size)
  }
  return(sampling_distribution)
}
sampling_distribution <- generate_sampling_distribution(1000, 100, p)
```
### 1.2. By estimating the standard deviation of the corresponding indicator of success by the standard error of your sample and using the CLT, predict the *confidence* interval $(p^*-\varepsilon, p^* + \varepsilon)$, in which the estimate $\hat p$ falls with probability at least $0.95$.
To estimate the standard deviation of the corresponding indicator of success, we use the standard error of our sample. The standard error of the mean is $\sqrt{\frac{\sigma^2}{n}}$, where $\sigma^2$ is the (sample) variance of the sampling distribution and $n$ is its size; the sample variance is $\frac{1}{n-1}\sum_{i=1}^n(x_i - \bar{x})^2$, where $x_i$ is the $i$-th element of the sampling distribution and $\bar{x}$ is its mean (this is what `sd()` computes).
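In other words (a restatement of the CLT-based interval that the chunk below computes), with $z_{0.975} = \Phi^{-1}(0.975) \approx 1.96$ the interval is
$$
\left(\bar{x} - z_{0.975}\,\frac{s}{\sqrt{n}},\ \ \bar{x} + z_{0.975}\,\frac{s}{\sqrt{n}}\right),
$$
where $\bar{x}$ and $s$ are the mean and the standard deviation of the sampling distribution and $n$ is its length.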
```{r}
calculate_confidence_interval <- function (sampling_distribution, confidence) {
mean <- mean(sampling_distribution)
standard_error <- sd(sampling_distribution)/sqrt(length(sampling_distribution))
z <- qnorm((1 + confidence)/2)
epsilon <- z*standard_error
return(c(mean - epsilon, mean + epsilon))
}
calculate_confidence_interval(sampling_distribution, 0.95)
```
### 1.3 What choice of \(N\) guarantees that \(\varepsilon \le 0.03\)?
To find the value of $N$ that guarantees $\varepsilon \le 0.03$, note that by the CLT the half-width of the $95\%$ interval is $\varepsilon = z_{0.975}\sqrt{\frac{p^*(1-p^*)}{N}}$, where $z_{0.975} = \Phi^{-1}(0.975) \approx 1.96$. Requiring $\varepsilon \le 0.03$ gives
$$
N \ge \frac{z_{0.975}^2\, p^* (1 - p^*)}{0.03^2}
$$
Here $p^*$ is the probability of correct transmission of a single message. The mean of our sampling distribution estimates the error rate $1 - p^*$, and since $p^*(1-p^*) = (1-p^*)\,p^*$, we can plug it into the formula directly:
```{r}
evaluate_N <- function (p, epsilon) {
  z <- qnorm(0.975)   # 97.5% normal quantile, about 1.96
  return(ceiling(z^2 * p * (1 - p) / epsilon^2))
}
evaluate_N(mean(sampling_distribution), 0.03)
```
### 1.4 Draw the histogram of the number \(k = 0,1,2,3,4\) of errors while transmitting a \(4\)-digit binary message. Do you think it is one of the known distributions?
```{r}
generate_histogram <- function (sample_size, p) {
  messages <- message_generator(sample_size)
  codewords <- (messages %*% G) %% 2
  errors <- erros_generator(sample_size, p)
  recieved_messages <- apply_errors(codewords, errors)
  corrected_messages <- correct_messages(recieved_messages, sample_size, H)
  decoded_messages <- decode_messages(corrected_messages)
  number_of_errors <- rowSums(decoded_messages != messages)   # wrong bits per decoded message
  hist(number_of_errors, breaks = seq(-0.5, 4.5, by = 1), xlim = c(-0.5, 4.5),
       main = "Number of errors", xlab = "Number of errors", ylab = "Frequency")
}
generate_histogram(evaluate_N(mean(sampling_distribution), 0.03), p)
```
#### Conclusion
To sum up, we implemented the $[7,4]$ Hamming code and obtained the sampling distribution of the error rate. Using the CLT, we estimated the standard error of the estimate, predicted the corresponding confidence interval, and found the value of $N$ that guarantees $\varepsilon \le 0.03$. Finally, we drew the histogram of the number of errors while transmitting a 4-digit binary message. The histogram does not match any of the standard distributions exactly, but its shape is close to that of a Poisson distribution.
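As a rough check of this observation (a minimal sketch reusing the helper functions defined above; the sample size of $10^4$ is an arbitrary choice), we can compare the empirical frequencies of $k$ with the Poisson p.m.f. of the same mean:
```{r}
sample_size <- 1e4
messages <- message_generator(sample_size)
codewords <- (messages %*% G) %% 2
recieved <- apply_errors(codewords, erros_generator(sample_size, p))
decoded <- decode_messages(correct_messages(recieved, sample_size, H))
k <- rowSums(decoded != messages)                               # number of wrong bits per decoded message
empirical <- as.numeric(table(factor(k, levels = 0:4))) / sample_size
poisson <- dpois(0:4, lambda = mean(k))
comparison <- rbind(empirical, poisson)
colnames(comparison) <- 0:4
round(comparison, 4)
```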
### Task 2.
#### In this task, we discuss a real-life process that is well modelled by a Poisson distribution. As you remember, a Poisson random variable describes occurrences of rare events, i.e., counts the number of successes in a large number of independent random experiments. One of the typical examples is the **radioactive decay** process.
#### Consider a sample of radioactive element of mass $m$, which has a big *half-life period* $T$; it is vitally important to know the probability that during a one-second period, the number of nuclei decays will not exceed some critical level $k$. This probability can easily be estimated using the fact that, given the *activity* ${\lambda}$ of the element (i.e., the probability that exactly one nucleus decays in one second) and the number $N$ of atoms in the sample, the random number of decays within a second is well modelled by Poisson distribution with parameter $\mu:=N\lambda$. Next, for the sample of mass $m$, the number of atoms is $N = \frac{m}{M} N_A$, where $N_A = 6 \times 10^{23}$ is the Avogadro constant, and $M$ is the molar (atomic) mass of the element. The activity of the element, $\lambda$, is $\log(2)/T$, where $T$ is measured in seconds.
#### Assume that a medical laboratory receives $n$ samples of radioactive element ${{}^{137}}\mathtt{Cs}$ (used in radiotherapy) with half-life period $T = 30.1$ years and mass $m = \mathtt{team\, id \,number} \times 10^{-6}$ g each. Denote by $X_1,X_2,\dots,X_n$ the **i.i.d. r.v.**'s counting the number of decays in sample $i$ in one second.
1. Specify the parameter of the Poisson distribution of $X_i$ (you'll need the atomic mass of *Cesium-137*)\
2. Show that the distribution of the sample means of $X_1,\dots,X_n$ gets very close to a normal one as $n$ becomes large and identify that normal distribution. To this end,
- simulate the realization $x_1,x_2,\dots,x_n$ of the $X_i$ and calculate the sample mean $s=\overline{\mathbf{x}}$;
- repeat this $K$ times to get the sample $\mathbf{s}=(s_1,\dots,s_K)$ of means and form the empirical cumulative distribution function $\hat F_{\mathbf{s}}$ of $\mathbf{s}$;
- identify $\mu$ and $\sigma^2$ such that the **c.d.f.** $F$ of $\mathscr{N}(\mu,\sigma^2)$ is close to the **e.c.d.f.** $\hat F_{\mathbf{s}}$ and plot both **c.d.f.**'s on one graph to visualize their proximity (use the proper scales!);
- calculate the maximal difference between the two **c.d.f.**'s;\
- consider cases $n = 5$, $n = 10$, $n=50$ and comment on the results.\
3. Calculate the largest possible value of $n$, for which the total number of decays in one second is less than $8 \times 10^8$ with probability at least $0.95$. To this end,
- obtain the theoretical bound on $n$ using Markov inequality, Chernoff bound and Central Limit Theorem, and compare the results;\
- simulate the realization $x_1,x_2,\dots,x_n$ of the $X_i$ and calculate the sum $s=x_1 + \cdots +x_n$;
- repeat this $K$ times to get the sample $\mathbf{s}=(s_1,\dots,s_K)$ of sums;
- calculate the number of elements of the sample which are less than critical value ($8 \times 10^8$) and calculate the empirical probability; comment whether it is close to the desired level $0.95$
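For our team id ($\mathtt{id} = 7$, so $m = 7\times 10^{-6}$ g and $M \approx 137$ g/mol), the parameter of the Poisson distribution of each $X_i$ works out to roughly
$$
N = \frac{7\times 10^{-6}}{137}\cdot 6\times 10^{23} \approx 3.07\times 10^{16}, \qquad
\lambda = \frac{\ln 2}{30.1\cdot 365\cdot 24\cdot 3600\ \mathrm{s}} \approx 7.3\times 10^{-10}\ \mathrm{s}^{-1}, \qquad
\mu = N\lambda \approx 2.2\times 10^{7},
$$
which the chunk below computes exactly.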
```{r}
lambda <- log(2)/(30.1*365*24*3600)   # decay probability per second: log(2)/T, with T in seconds
N <- 7/137*6*(10^17)                  # number of atoms: (m/M)*N_A = (7e-6/137)*6e23
mu <- N * lambda                      # Poisson parameter of each X_i
mu
K <- 1e3
n <- 5
sample_means <- colMeans(matrix(rpois(n*K, lambda = mu), nrow=n))
```
#### Next, calculate the parameters of the standard normal approximation
```{r}
sigma <- (mu/n)^0.5   # sd of the mean of n Poisson(mu) variables; the mean of the approximating normal is mu
```
#### We can now plot ecdf and cdf
```{r}
xlims <- c(mu-3*sigma,mu+3*sigma)
Fs <- ecdf(sample_means)
plot(Fs,
xlim = xlims,
ylim = c(0,1),
col = "blue",
lwd = 2,
main = "Comparison of ecdf and cdf")
curve(pnorm(x, mean = mu, sd = sigma), col = "red", lwd = 2, add = TRUE)
```
#### Maximal difference between the two c.d.f.'s
```{r}
xs <- seq(xlims[1], xlims[2], length.out = 1000)   # compare on a fine grid rather than at the two endpoints only
max(abs(Fs(xs) - pnorm(xs, mean = mu, sd = sigma)))
```
#### Cases when n=5, n = 10, n = 50
```{r}
plot_n <- function(n){
  K <- 1e3
  sample_means <- colMeans(matrix(rpois(n*K, lambda = mu), nrow=n))
  sigma <- (mu/n)^0.5
  xlims <- c(mu-3*sigma, mu+3*sigma)
  Fs <- ecdf(sample_means)
  plot(Fs,
       xlim = xlims,
       ylim = c(0,1),
       col = "blue",
       lwd = 2,
       main = paste("Comparison of ecdf and cdf, n =", n))
  curve(pnorm(x, mean = mu, sd = sigma), col = "red", lwd = 2, add = TRUE)
  xs <- seq(xlims[1], xlims[2], length.out = 1000)
  max_difference <- max(abs(Fs(xs) - pnorm(xs, mean = mu, sd = sigma)))
  print(max_difference)
}
plot_n(10)
plot_n(50)
```
#### Theoretical bounds for N
Markov bound:
The total number of decays in one second across all $n$ samples is $S_n = X_1 + \cdots + X_n$ with $\mathsf{E}S_n = n\mu$. By Markov's inequality, $P(S_n \ge 8\times 10^{8}) \le \frac{n\mu}{8\times 10^{8}}$, so keeping this bound at most $0.05$ guarantees the desired probability; this gives $n \le \frac{0.05\cdot 8\times 10^{8}}{\mu}$:
```{r}
lambda <- log(2)/(30.1*365*24*3600)   # decay probability per second
N <- 7/137*6*(10^17)                  # number of atoms in one sample
mu <- N * lambda                      # expected number of decays per sample per second
bound <- 8*10^8                       # critical total number of decays
prob <- 0.95
N_max <- (1-prob)*bound/mu            # Markov-based (very conservative) bound on n
print(N_max)
```
Chernoff bound:
For $s > 0$, $P(S_n \ge a) \le e^{-sa}\,\mathsf{E}\,e^{sS_n} = \exp\!\left(n\mu(e^{s}-1) - sa\right)$. Minimizing over $s$ (the optimum $s = \log\frac{a}{n\mu}$ is positive as long as $n\mu < a$) gives the bound $\exp\!\left(a - n\mu - a\log\frac{a}{n\mu}\right)$. We take the largest $n$ for which this bound does not exceed $0.05$:
```{r}
chernoff <- function(n) exp(bound - n*mu - bound*log(bound/(n*mu)))   # optimized Chernoff bound for P(S_n >= bound)
N_max <- 1
while ((N_max + 1)*mu < bound && chernoff(N_max + 1) <= 1 - prob) N_max <- N_max + 1
print(N_max)
```
CLT:
By the Central Limit Theorem, $S_n$ is approximately $\mathscr{N}(n\mu,\, n\mu)$, so $P(S_n < 8\times 10^{8}) \approx \Phi\!\left(\frac{8\times 10^{8} - n\mu}{\sqrt{n\mu}}\right)$. We take the largest $n$ for which this probability is still at least $0.95$:
```{r}
N_max <- 1
while (pnorm((bound - (N_max + 1)*mu)/sqrt((N_max + 1)*mu)) >= prob) {
  N_max <- N_max + 1
}
print(N_max)
```
Empirical check of $n$:
We simulate $K$ times the total number of decays in $N_{\max}$ samples during one second and estimate the probability that it stays below the critical level:
```{r}
K <- 1e3
sums <- colSums(matrix(rpois(K*N_max, lambda = mu), nrow = N_max))   # K realizations of the total number of decays
mean(sums < bound)   # empirical probability; should be close to (at least) 0.95
```
### Task 3.
#### In this task, we use the Central Limit Theorem approximation for continuous random variables.
#### One of the devices to measure radioactivity level at a given location is the Geiger counter. When the radioactive level is almost constant, the time between two consecutive clicks of the Geiger counter is an exponentially distributed random variable with parameter $\nu_1 = \mathtt{team\,id\,number} + 10$. Denote by $X_k$ the random time between the $(k-1)^{\mathrm{st}}$ and $k^{\mathrm{th}}$ click of the counter.
1. Show that the distribution of the sample means of $X_1, X_2,\dots,X_n$ gets very close to a normal one (which one?) as $n$ becomes large. To this end,
- simulate the realizations $x_1,x_2,\dots,x_n$ of the **r.v.** $X_i$ and calculate the sample mean $s=\overline{\mathbf{x}}$;\
- repeat this $K$ times to get the sample $\mathbf{s}=(s_1,\dots,s_K)$ of means and then the *empirical cumulative distribution* function $F_{\mathbf{s}}$ of $\mathbf{s}$;\
- identify $\mu$ and $\sigma^2$ such that the **c.d.f.** of $\mathscr{N}(\mu,\sigma^2)$ is close to the **e.c.d.f.** $F_{\mathbf{s}}$ and plot both **c.d.f.**'s on one graph to visualize their proximity (the parameters are identified in the note right after this list);\
- calculate the maximal difference between the two **c.d.f.**'s;\
- consider cases $n = 5$, $n = 10$, $n=50$ and comment on the results.
2. The place can be considered safe when the number of clicks in one minute does not exceed $100$. It is known that the parameter $\nu$ of the resulting exponential distribution is proportional to the number $N$ of the radioactive samples, i.e., $\nu = \nu_1*N$, where $\nu_1$ is the parameter for one sample. Determine the maximal number of radioactive samples that can be stored in that place so that, with probability $0.95$, the place is identified as safe. To do this,
- express the event of interest in terms of the **r.v.** $S:= X_1 + \cdots + X_{100}$;\
- obtain the theoretical bounds on $N$ using the Markov inequality, Chernoff bound and Central Limit Theorem and compare the results;\
- with the predicted $N$ and thus $\nu$, simulate the realization $x_1,x_2,\dots,x_{100}$ of the $X_i$ and of the sum $S = X_1 + \cdots + X_{100}$;\
- repeat this $K$ times to get the sample $\mathbf{s}=(s_1,\dots,s_K)$ of total times until the $100^{\mathrm{th}}$ click;\
- estimate the probability that the location is identified as safe and compare to the desired level $0.95$
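Before simulating, note which normal distribution we expect: each $X_i$ is exponential with rate $\nu_1$, so $\mathsf{E}X_i = 1/\nu_1$ and $\mathsf{Var}\,X_i = 1/\nu_1^2$, and by the CLT the sample mean $\overline{\mathbf{x}}$ of $n$ such variables is approximately
$$
\mathscr{N}\!\left(\frac{1}{\nu_1},\ \frac{1}{n\,\nu_1^2}\right).
$$
These are the $\mu$ and $\sigma^2$ used in the chunks below.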
#### First, generate samples and sample means:
```{r}
nu1 <- 17   # team id number (7) + 10
K <- 1e3
n <- 5
sample_means <- colMeans(matrix(rexp(n*K, rate = nu1), nrow=n))
```
#### Next, calculate the parameters of the standard normal approximation
```{r}
mu <- 1 / nu1          # mean of an Exp(nu1) variable
sigma <- mu / sqrt(n)  # sd of the mean of n such variables
```
#### We can now plot ecdf and cdf
```{r}
xlims <- c(mu-3*sigma,mu+3*sigma)
Fs <- ecdf(sample_means)
plot(Fs,
xlim = xlims,
col = "blue",
lwd = 2,
main = "Comparison of ecdf and cdf")
curve(pnorm(x, mean = mu, sd = sigma), col = "red", lwd = 2, add = TRUE)
```
#### Maximal difference between the two c.d.f.'s
```{r}
xs <- seq(xlims[1], xlims[2], length.out = 1000)   # compare on a fine grid rather than at the two endpoints only
max(abs(Fs(xs) - pnorm(xs, mean = mu, sd = sigma)))
```
#### Cases when n=5, n = 10, n = 50
```{r}
plot_n <- function(n){
  nu1 <- 17              # team id number (7) + 10
  K <- 1e3
  sample_means <- colMeans(matrix(rexp(n*K, rate = nu1), nrow=n))
  mu <- 1 / nu1          # mean of an Exp(nu1) variable
  sigma <- mu / sqrt(n)  # sd of the sample mean
  xlims <- c(mu-3*sigma, mu+3*sigma)
  Fs <- ecdf(sample_means)
  plot(Fs,
       xlim = xlims,
       col = "blue",
       lwd = 2,
       main = paste("Comparison of ecdf and cdf, n =", n))
  curve(pnorm(x, mean = mu, sd = sigma), col = "red", lwd = 2, add = TRUE)
  xs <- seq(xlims[1], xlims[2], length.out = 1000)
  max_difference <- max(abs(Fs(xs) - pnorm(xs, mean = mu, sd = sigma)))
  print(max_difference)
}
plot_n(10)
plot_n(50)
```
#### Task 3.2
#### Express the event of interest in terms of $S:= X_1 + \cdots + X_{100}$
The place is identified as safe when at most $100$ clicks occur within one minute, i.e. when the time until the $100^{\mathrm{th}}$ click is at least one minute. Interpreting $\nu_1$ as the click rate per minute (consistent with the formulas below), the event of interest is $\{S \ge 1\}$, where $S = X_1 + \cdots + X_{100}$ is a sum of $100$ i.i.d. $\mathrm{Exp}(\nu)$ variables with $\nu = N\nu_1$, so $\mathsf{E}S = 100/\nu$ and $\mathsf{Var}\,S = 100/\nu^2$. We look for the largest $N$ with $P(S \ge 1) \ge 0.95$.
```{r}
# One realization of S for a single sample (N = 1, so the rate is nu1)
S <- sum(rexp(100, rate = nu1))
print(S)
```
#### Theoretical bounds for N
Markov bound:
Markov's inequality gives $P(S \ge 1) \le \mathsf{E}(S) = 100\,\mathsf{E}(X_i) = \frac{100}{N\nu_1}$. The requirement $P(S \ge 1) \ge 0.95$ can therefore hold only if $\frac{100}{N\nu_1} \ge 0.95$, i.e.
$$N \le \frac{100}{0.95\,\nu_1}$$
```{r}
n <- 100
prob <- 0.95
N <- n/(nu1*prob)   # Markov-based upper bound on the number of samples
print(N)
```
Chebyshev's bound:
By Chebyshev's inequality, $P\left(|S - \mathsf{E}S| \ge k\,\sigma_S\right) \le \frac{1}{k^2}$, where $\sigma_S = \frac{10}{N\nu_1}$. Taking $k = \sqrt{20}$ (so that $1/k^2 = 0.05$), we get $P\left(S > \mathsf{E}S - k\sigma_S\right) \ge 0.95$, so it suffices that $\mathsf{E}S - k\sigma_S \ge 1$, i.e. $N \le \frac{100 - \sqrt{20}\cdot 10}{\nu_1}$:
```{r}
k <- sqrt(1/(1 - prob))      # 1/k^2 = 0.05
N <- (n - k*sqrt(n))/nu1     # largest N with E(S) - k*sd(S) >= 1
print(N)
```
CLT:
By the CLT, $S$ is approximately $\mathscr{N}\left(\frac{100}{\nu}, \frac{100}{\nu^2}\right)$ with $\nu = N\nu_1$, so $P(S < 1) \approx \Phi\left(\frac{1 - 100/\nu}{10/\nu}\right) = \Phi\left(\frac{\nu_1 N - 100}{10}\right)$. We find the largest $N$ for which $\Phi\left(\frac{\nu_1 N - 100}{10}\right) \le 0.05$ holds:
```{r}
N <- 0
while (pnorm((nu1*N-n)/sqrt(n)) <= 0.05){
N <- N + 1
}
N <- N - 1
print(N)
```
#### Simulation with N and new nu
```{r}
nu <- N * nu1              # total click rate with N samples
x <- rexp(n, rate = nu)    # one realization of the 100 inter-click times
S <- sum(x)
S
```
#### Repeat experiment K times
```{r}
k <- 1e4
sum_k_times <- colSums(matrix(rexp(k*100, rate = nu), nrow = 100))   # k realizations of S = X_1 + ... + X_100
```
#### Estimate the probability
```{r}
mean(sum_k_times >= 1)   # empirical P(S >= 1); should be at least about 0.95
```
### General summary and conclusions
In this lab we studied three settings. In Task 1 we simulated the $[7,4]$ Hamming code, estimated the probability of incorrect transmission of a 4-bit message, constructed a CLT-based confidence interval for it, found a sample size $N$ that guarantees $\varepsilon \le 0.03$, and plotted the distribution of the number of errors per message. In Task 2 we modelled radioactive decays of ${}^{137}\mathrm{Cs}$ samples by a Poisson distribution, verified that the sample means become approximately normal as $n$ grows, and bounded the admissible number of samples using the Markov inequality, the Chernoff bound and the CLT, confirming the result by simulation. In Task 3 we carried out the analogous analysis for the exponentially distributed inter-click times of a Geiger counter and estimated the largest number of radioactive samples for which the location is identified as safe with probability at least $0.95$, again checking the answer empirically.