forked from ChiLiubio/microeco_tutorial
-
Notifications
You must be signed in to change notification settings - Fork 0
/
04-Diversity-based_class.Rmd
292 lines (215 loc) · 10.3 KB
/
04-Diversity-based_class.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
# Diversity-based class
Diversity is one of the core topics in community ecology.
It refers to alpha diversity, beta diversity and gamma diversity.
## trans_alpha class
Alpha diversity can be transformed and visualized using trans_alpha class.
Creating the object of trans_alpha class can invoke the alpha_diversity data stored in the microtable object.
### Example
Creating trans_alpha object have two return data.frame with prefix 'data_': `data_alpha` and `data_stat`.
The data_alpha is used for the following differential test and visualization.
```{r, echo = TRUE, eval = FALSE}
t1 <- trans_alpha$new(dataset = dataset, group = "Group")
# return t1$data_stat
t1$data_stat[1:5, ]
```
```{r, echo = FALSE}
t1 <- trans_alpha$new(dataset = dataset, group = "Group")
pander::pander(t1$data_stat[1:5, ])
```
Then, we test the differences among groups using Kruskal-Wallis Rank Sum Test (overall test when groups > 2), Wilcoxon Rank Sum Tests (for paired groups),
Dunn's Kruskal-Wallis Multiple Comparisons (for paired groups when groups > 2) and anova with multiple comparisons.
```{r, echo = TRUE, eval = FALSE}
t1$cal_diff(method = "KW")
# return t1$res_diff
t1$res_diff[1:5, ]
```
```{r, echo = FALSE}
suppressWarnings(t1$cal_diff(method = "KW"))
pander::pander(t1$res_diff[1:5, c(1:2, 4:7)])
```
```{r, echo = TRUE, eval = FALSE}
t1$cal_diff(method = "KW_dunn")
# return t1$res_diff
t1$res_diff[1:5, ]
```
```{r, echo = FALSE}
t1$cal_diff(method = "KW_dunn")
pander::pander(t1$res_diff[1:5, c(1, 3:8)])
```
```{r, echo = TRUE, eval = FALSE}
# more options
t1$cal_diff(method = "wilcox")
t1$cal_diff(method = "t.test")
```
Then, let's try to use anova.
```{r, echo = TRUE, eval = FALSE}
t1$cal_diff(method = "anova")
# return t1$res_diff
t1$res_diff
```
```{r, echo = FALSE}
t1$cal_diff(method = "anova")
pander::pander(t1$res_diff)
```
The multi-factor analysis of variance is also supported with the `formula` parameter, such as two-way anova.
```{r, echo = TRUE, eval = FALSE}
t1 <- trans_alpha$new(dataset = dataset, group = "Group")
t1$cal_diff(method = "anova", formula = "Group+Type")
head(t1$res_diff)
# see the help document for the usage of formula
```
The plot_alpha function add the significance label by searching the results in **object$res_diff** instead of calculating the significance again.
Now, let us plot the mean and se of alpha diversity for each group, and add the anova result.
```{r, echo = TRUE, eval = FALSE}
t1$cal_diff(method = "anova")
t1$plot_alpha(measure = "Chao1")
t1$plot_alpha(measure = "Chao1", order_x_mean = TRUE, add_sig_text_size = 6)
```
```{r, out.width = "600px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_alpha_letter.png")
```
```{r, echo = TRUE, eval = FALSE}
t1$cal_diff(method = "wilcox")
t1$plot_alpha(measure = "Chao1", shape = "Group")
```
```{r, out.width = "600px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_alpha_wilcox.png")
```
Let's try to remove the ns in the label by operating the object$res_diff file.
```{r, echo = TRUE, eval = FALSE}
t1$res_diff %<>% base::subset(Significance != "ns")
t1$plot_alpha(measure = "Chao1", xtext_size = 15)
```
```{r, out.width = "600px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_alpha_wilcox_nons.png")
```
From the v0.12.0, the trans_alpha class supports the differential test of groups within each group by using the by_group parameter.
```{r, echo = TRUE, eval = FALSE}
t1 <- trans_alpha$new(dataset = dataset, group = "Type", by_group = "Group")
t1$cal_diff(method = "wilcox")
t1$plot_alpha(measure = "Shannon")
```
```{r, out.width = "600px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_alpha_wilcox_bygroup.png")
```
Scheirer Ray Hare test is a nonparametric test that is suitable for a two-way factorial experiment.
```{r, echo = TRUE, eval = FALSE}
# require rcompanion package to be installed
t1$cal_diff(method = "scheirerRayHare", formula = "Group+Type")
```
### Key points
+ trans_alpha$new: creating trans_alpha object can invoke alpha_diversity in microtable for transformation
+ cal_diff: formula parameter can be used for multi-factor analysis of variance
+ plot_alpha: the significance label comes from the object$res_diff
## trans_beta class
The trans_beta class is developed for the beta diversity analysis, i.e. the dissimilarities among samples.
Beta diversity can be defined at different forms[@Tuomisto_diversity_2010] and can be explored with different ways[@Anderson_Navigating_2011].
We encapsulate some commonly-used approaches in microbial ecology[@Ramette_Multivariate_2007].
Note that the part of beta diversity related with environmental factors are placed into the trans_env class.
The distance matrix in beta_diversity list of microtable object will be invoked for transformation and ploting using trans_beta class when needed.
The analysis referred to the beta diversity in this class mainly include ordination, group distance, clustering and manova.
### Example
We first show the ordination using PCoA.
```{r, echo = TRUE, eval = TRUE}
# we first create an trans_beta object
# measure parameter can invoke the distance matrix in dataset$beta_diversity
t1 <- trans_beta$new(dataset = dataset, group = "Group", measure = "bray")
```
```{r, echo = TRUE, eval = FALSE}
# use PCoA as an example, PCA or NMDS is also available
t1$cal_ordination(ordination = "PCoA")
# t1$res_ordination is the ordination result list
class(t1$res_ordination)
# plot the PCoA result with confidence ellipse
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = c("point", "ellipse"))
```
```{r, out.width = "650px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_ordination.png")
```
Try other interesting options in the plotting.
```{r, echo = TRUE, eval = FALSE}
t1$plot_ordination(plot_color = "Type", plot_type = "point")
t1$plot_ordination(plot_color = "Group", point_size = 5, point_alpha = .2, plot_type = c("point", "ellipse"), ellipse_chull_fill = FALSE)
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = c("point", "centroid"))
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = c("point", "ellipse", "centroid"))
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = c("point", "chull"))
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = c("point", "chull", "centroid"))
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = c("chull", "centroid"))
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = c("point", "chull", "centroid"), add_sample_label = "SampleID")
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = "centroid")
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = "centroid", centroid_segment_alpha = 0.9, centroid_segment_size = 1, centroid_segment_linetype = 1)
t1$plot_ordination(plot_type = c("point", "centroid"), plot_color = "Type", centroid_segment_linetype = 1)
t1$plot_ordination(plot_color = "Saline", point_size = 5, point_alpha = .2, plot_type = c("point", "chull"), ellipse_chull_fill = FALSE, ellipse_chull_alpha = 0.1)
t1$plot_ordination(plot_color = "Group") + theme(panel.grid = element_blank()) + geom_vline(xintercept = 0, linetype = 2) + geom_hline(yintercept = 0, linetype = 2)
```
Then we plot and compare the group distances.
```{r, echo = TRUE, eval = FALSE}
# calculate and plot sample distances within groups
t1$cal_group_distance(within_group = TRUE)
# return t1$res_group_distance
# perform Wilcoxon Rank Sum and Signed Rank Tests
t1$cal_group_distance_diff(method = "wilcox")
t1$plot_group_distance(boxplot_add = "mean")
```
```{r, out.width = "500px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_group_distance_within.png")
```
```{r, echo = TRUE, eval = FALSE}
# calculate and plot sample distances between groups
t1$cal_group_distance(within_group = FALSE)
t1$cal_group_distance_diff(method = "wilcox")
t1$plot_group_distance(boxplot_add = "mean")
```
```{r, out.width = "500px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_group_distance_between.png")
```
Clustering plot is also a frequently used method.
```{r, echo = TRUE, eval = FALSE}
# use replace_name to set the label name, group parameter used to set the color
t1$plot_clustering(group = "Group", replace_name = c("Saline", "Type"))
```
```{r, out.width = "550px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_clustering.png")
```
PerMANOVA[@Anderson_Austral_2001] is often used in the differential test of distances among groups.
```{r, echo = TRUE, eval = FALSE}
# manova for all groups when manova_all = TRUE
t1$cal_manova(manova_all = TRUE)
t1$res_manova
```
```{r, echo = FALSE}
t1$cal_manova(manova_all = TRUE)
pander::pander(t1$res_manova)
```
The parameter manova_all = FALSE can be used to calculate significance for each paired group.
```{r, echo = TRUE, eval = FALSE}
# manova for each paired groups
t1$cal_manova(manova_all = FALSE)
t1$res_manova
```
```{r, echo = FALSE}
t1$cal_manova(manova_all = FALSE)
pander::pander(t1$res_manova)
```
The parameter manova_set has higher priority than manova_all. If manova_set is provided, manova_all parameter will be disabled.
```{r, echo = TRUE, eval = FALSE}
# manova for specified group set: such as "Group + Type"
t1$cal_manova(manova_set = "Group + Type")
t1$res_manova
```
```{r, echo = FALSE}
t1$cal_manova(manova_set = "Group + Type")
pander::pander(t1$res_manova)
```
PERMDISP[@Anderson_Navigating_2011] is also implemented to check multivariate homogeneity of groups dispersions (variances).
```{r, echo = TRUE}
# for the whole comparison and for each paired groups
t1$cal_betadisper()
t1$res_betadisper
```
For the explanation of statistical methods in microbial ecology, please read the references [@Ramette_Multivariate_2007; @Buttigieg_guide_2014].
### Key points
+ trans_beta$new: creating trans_beta object with measure parameter can invoke beta_diversity in microtable object for transformation
+ cal_ordination(): PCoA, PCA and NMDS approaches are all available
+ cal_manova(): cal_manova function can be used for paired comparisons, overall test and multi-factor test
+ plot_group_distance(): manipulating object$res_group_distance_diff can control what statistical results are presented in the plot.