---
title: "JHS statistical inference final project Part 2"
author: "Yuncheng Yang"
date: "13 February 2019"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Part 2 Basic Inferential Data Analysis
In this part we apply basic exploratory data analysis and statistical inference to the ToothGrowth data set.
### Loading the data and getting a basic overview
```{r, echo=TRUE}
data("ToothGrowth")
str(ToothGrowth)
```
The data record tooth length (`len`) for two supplement types, OJ (orange juice) and VC (ascorbic acid), each given at doses of 0.5, 1.0, and 2.0.
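
As a quick check on that layout, the group sizes and mean lengths can be tabulated per supplement and dose. This is a minimal sketch in base R; the names `group_n` and `group_means` are introduced here for illustration only.

```{r, echo=TRUE}
data("ToothGrowth")
# Number of observations in each supplement/dose combination
group_n <- table(ToothGrowth$supp, ToothGrowth$dose)
group_n
# Mean tooth length in each supplement/dose combination
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group_means
```
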
Now we use plots to explore the data in more detail.
```{r, echo=TRUE}
library(ggplot2)
# Scatter plot of tooth length against dose, coloured by supplement
g3 <- ggplot(data = ToothGrowth, aes(x = dose, y = len, color = supp)) + geom_point()
g3
```
We can see that tooth length tends to increase with dose for both supplements.
Now we explore the trends.
```{r, echo=TRUE}
g3+geom_smooth(method = "lm")
```
Tooth length appears to depend more strongly on dose for VC than for OJ (the VC line is steeper). We now use statistical inference to examine the differences.
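
The visual impression from the fitted lines can be put into numbers by fitting a separate linear model per supplement and comparing the slopes. This is a minimal sketch, assuming a straight-line relationship between `len` and `dose` is adequate for a rough comparison; `fit_oj`, `fit_vc` and `slopes` are names made up for this example.

```{r, echo=TRUE}
data("ToothGrowth")
# One simple linear regression of length on dose per supplement
fit_oj <- lm(len ~ dose, data = subset(ToothGrowth, supp == "OJ"))
fit_vc <- lm(len ~ dose, data = subset(ToothGrowth, supp == "VC"))
# Slope = estimated change in tooth length per unit increase in dose
slopes <- c(OJ = unname(coef(fit_oj)["dose"]), VC = unname(coef(fit_vc)["dose"]))
slopes
```

A larger slope for one supplement corresponds to the steeper line in the plot; it does not by itself show that the two slopes differ significantly.
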
### Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose
To test the impression that tooth length depends more strongly on dose for VC, we run a series of t-tests.
#### Comparison between supplements
```{r, echo=TRUE}
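# Tooth lengths for each supplement, extracted as numeric vectors
lenoj <- subset(ToothGrowth, supp == "OJ")$len
lenvc <- subset(ToothGrowth, supp == "VC")$len
# Welch two-sample t-test (unpaired, unequal variances)
t.test(lenoj, lenvc)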
```
The p-value is 0.06 > 0.05, so at the 5% level we cannot conclude that the two supplements differ in their overall effect on tooth growth.
#### Comparison between doses
Because of the upward trend we observed, we compare the 2.0 dose with the 0.5 dose.
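
The same comparison can also be written with the formula interface, and the p-value and confidence interval can be pulled out of the returned object rather than read off the printout. This is a sketch, assuming the default Welch (unequal-variance) two-sample test is appropriate; `supp_test` is a hypothetical name.

```{r, echo=TRUE}
data("ToothGrowth")
# Welch two-sample t-test of length by supplement (OJ vs VC)
supp_test <- t.test(len ~ supp, data = ToothGrowth)
# p-value and 95% confidence interval for the difference in means
supp_test$p.value
supp_test$conf.int
```

A 95% interval that contains 0 tells the same story as a p-value above 0.05.
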
```{r, echo=TRUE}
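len0.5 <- subset(ToothGrowth, dose == 0.5, select = "len")
len2.0 <- subset(ToothGrowth, dose == 2.0, select = "len")
# Each dose was given to different animals, so the samples are independent: use an unpaired test
t.test(len2.0$len, len0.5$len, paired = FALSE)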
```
The p-value is small, so the 2.0 dose is significantly more effective than the 0.5 dose.
#### Comparison between doses within each supplement group
Next we analyse the effect of dose within each supplement.
We test whether there is a significant difference between the higher dose (2.0) and the lower dose (0.5).
```{r, echo=TRUE}
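# Statistical test for the OJ supplement (column 1 is len)
len2.0oj <- ToothGrowth[ToothGrowth$dose == 2.0 & ToothGrowth$supp == "OJ", 1]
len0.5oj <- ToothGrowth[ToothGrowth$dose == 0.5 & ToothGrowth$supp == "OJ", 1]
# The two dose groups are different animals, so the test is unpaired
t.test(len2.0oj, len0.5oj, paired = FALSE)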
```
The difference in means is significant (small p-value), so a higher dose of OJ improves tooth growth.
Now we apply the same procedure to the VC group.
```{r, echo=TRUE}
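# Statistical test for the VC supplement (column 1 is len)
len2.0vc <- ToothGrowth[ToothGrowth$dose == 2.0 & ToothGrowth$supp == "VC", 1]
len0.5vc <- ToothGrowth[ToothGrowth$dose == 0.5 & ToothGrowth$supp == "VC", 1]
# The two dose groups are different animals, so the test is unpaired
t.test(len2.0vc, len0.5vc, paired = FALSE)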
```
The difference in mean tooth length is 18.16 and the p-value is very small, so we conclude that a higher dose of VC improves tooth growth.
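
To compare the size of the dose effect across the two supplements directly, the high-minus-low difference in mean length and its confidence interval can be computed side by side. This is a sketch only: `dose_effect` and `effects` are names invented for this example, and the helper uses an unpaired Welch test on the assumption that the dose groups are independent samples.

```{r, echo=TRUE}
data("ToothGrowth")
# Difference in mean length (dose 2.0 minus dose 0.5) with a 95% CI, for one supplement
dose_effect <- function(supplement) {
  high <- ToothGrowth$len[ToothGrowth$supp == supplement & ToothGrowth$dose == 2.0]
  low  <- ToothGrowth$len[ToothGrowth$supp == supplement & ToothGrowth$dose == 0.5]
  res  <- t.test(high, low)  # unpaired Welch test
  c(diff = mean(high) - mean(low), lower = res$conf.int[1], upper = res$conf.int[2])
}
# One column per supplement; rows are diff, lower, upper
effects <- sapply(c("OJ", "VC"), dose_effect)
effects
```

If the two intervals overlap, the apparent gap between the two dose effects is not by itself established by these tests.
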
## Conclusion
For both supplements, tooth growth increased with dose, and there was no significant overall difference between the two supplements. However, the difference between the higher and lower dose appeared to be larger for VC than for OJ.