04-Diversity-based_class.Rmd

# Diversity-based class

Diversity is one of the core topics in community ecology.
It refers to alpha diversity, beta diversity and gamma diversity.


## trans_alpha class

　Alpha diversity can be transformed and visualized using trans_alpha class.
Creating the object of trans_alpha class can invoke the alpha_diversity data stored in the microtable object.


### Example

Creating trans_alpha object have two return data.frame with prefix 'data_': `data_alpha` and `data_stat`.
The data_alpha is used for the following differential test and visualization.


```{r, echo = TRUE, eval = FALSE}
t1 <- trans_alpha$new(dataset = dataset, group = "Group")
# return t1$data_stat
t1$data_stat[1:5, ]
```

```{r, echo = FALSE}
t1 <- trans_alpha$new(dataset = dataset, group = "Group")
pander::pander(t1$data_stat[1:5, ])
```

Then, we test the differences among groups using Kruskal-Wallis Rank Sum Test (overall test when groups > 2), Wilcoxon Rank Sum Tests (for paired groups),
Dunn's Kruskal-Wallis Multiple Comparisons (for paired groups when groups > 2) and anova with multiple comparisons.

```{r, echo = TRUE, eval = FALSE}
t1$cal_diff(method = "KW")
# return t1$res_diff
t1$res_diff[1:5, ]
```

```{r, echo = FALSE}
suppressWarnings(t1$cal_diff(method = "KW"))
pander::pander(t1$res_diff[1:5, c(1:2, 4:7)])
```

```{r, echo = TRUE, eval = FALSE}
t1$cal_diff(method = "KW_dunn")
# return t1$res_diff
t1$res_diff[1:5, ]
```

```{r, echo = FALSE}
t1$cal_diff(method = "KW_dunn")
pander::pander(t1$res_diff[1:5, c(1, 3:8)])
```

```{r, echo = TRUE, eval = FALSE}
# more options
t1$cal_diff(method = "wilcox")
t1$cal_diff(method = "t.test")
```

Then, let's try to use anova.
```{r, echo = TRUE, eval = FALSE}
t1$cal_diff(method = "anova")
# return t1$res_diff
t1$res_diff
```

```{r, echo = FALSE}
t1$cal_diff(method = "anova")
pander::pander(t1$res_diff)
```

The multi-factor analysis of variance is also supported with the `formula` parameter, such as two-way anova. 

```{r, echo = TRUE, eval = FALSE}
t1 <- trans_alpha$new(dataset = dataset, group = "Group")
t1$cal_diff(method = "anova", formula = "Group+Type")
head(t1$res_diff)
# see the help document for the usage of formula
```


The plot_alpha function add the significance label by searching the results in **object$res_diff** instead of calculating the significance again.
Now, let us plot the mean and se of alpha diversity for each group, and add the anova result.

```{r, echo = TRUE, eval = FALSE}
t1$cal_diff(method = "anova")
t1$plot_alpha(measure = "Chao1")
t1$plot_alpha(measure = "Chao1", order_x_mean = TRUE, add_sig_text_size = 6)
```

```{r, out.width = "600px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_alpha_letter.png")
```

```{r, echo = TRUE, eval = FALSE}
t1$cal_diff(method = "wilcox")
t1$plot_alpha(measure = "Chao1", shape = "Group")
```

```{r, out.width = "600px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_alpha_wilcox.png")
```

Let's try to remove the ns in the label by operating the object$res_diff file.

```{r, echo = TRUE, eval = FALSE}
t1$res_diff %<>% base::subset(Significance != "ns")
t1$plot_alpha(measure = "Chao1", xtext_size = 15)
```

```{r, out.width = "600px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_alpha_wilcox_nons.png")
```

From the v0.12.0, the trans_alpha class supports the differential test of groups within each group by using the by_group parameter.

```{r, echo = TRUE, eval = FALSE}
t1 <- trans_alpha$new(dataset = dataset, group = "Type", by_group = "Group")
t1$cal_diff(method = "wilcox")
t1$plot_alpha(measure = "Shannon")
```

```{r, out.width = "600px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_alpha_wilcox_bygroup.png")
```


Scheirer Ray Hare test is a nonparametric test that is suitable for a two-way factorial experiment.

```{r, echo = TRUE, eval = FALSE}
# require rcompanion package to be installed
t1$cal_diff(method = "scheirerRayHare", formula = "Group+Type")
```

### Key points

  + trans_alpha$new: creating trans_alpha object can invoke alpha_diversity in microtable for transformation
  + cal_diff: formula parameter can be used for multi-factor analysis of variance
  + plot_alpha: the significance label comes from the object$res_diff


## trans_beta class

　The trans_beta class is developed for the beta diversity analysis, i.e. the dissimilarities among samples.
Beta diversity can be defined at different forms[@Tuomisto_diversity_2010] and can be explored with different ways[@Anderson_Navigating_2011].
We encapsulate some commonly-used approaches in microbial ecology[@Ramette_Multivariate_2007].
Note that the part of beta diversity related with environmental factors are placed into the trans_env class.
The distance matrix in beta_diversity list of microtable object will be invoked for transformation and ploting using trans_beta class when needed.
The analysis referred to the beta diversity in this class mainly include ordination, group distance, clustering and manova.


### Example

We first show the ordination using PCoA.

```{r, echo = TRUE, eval = TRUE}
# we first create an trans_beta object
# measure parameter can invoke the distance matrix in dataset$beta_diversity
t1 <- trans_beta$new(dataset = dataset, group = "Group", measure = "bray")
```

```{r, echo = TRUE, eval = FALSE}
# use PCoA as an example, PCA or NMDS is also available
t1$cal_ordination(ordination = "PCoA")
# t1$res_ordination is the ordination result list
class(t1$res_ordination)
# plot the PCoA result with confidence ellipse
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = c("point", "ellipse"))
```

```{r, out.width = "650px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_ordination.png")
```

Try other interesting options in the plotting.

```{r, echo = TRUE, eval = FALSE}
t1$plot_ordination(plot_color = "Type", plot_type = "point")
t1$plot_ordination(plot_color = "Group", point_size = 5, point_alpha = .2, plot_type = c("point", "ellipse"), ellipse_chull_fill = FALSE)
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = c("point", "centroid"))
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = c("point", "ellipse", "centroid"))
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = c("point", "chull"))
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = c("point", "chull", "centroid"))
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = c("chull", "centroid"))
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = c("point", "chull", "centroid"), add_sample_label = "SampleID")
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = "centroid")
t1$plot_ordination(plot_color = "Group", plot_shape = "Group", plot_type = "centroid", centroid_segment_alpha = 0.9, centroid_segment_size = 1, centroid_segment_linetype = 1)
t1$plot_ordination(plot_type = c("point", "centroid"), plot_color = "Type", centroid_segment_linetype = 1)
t1$plot_ordination(plot_color = "Saline", point_size = 5, point_alpha = .2, plot_type = c("point", "chull"), ellipse_chull_fill = FALSE, ellipse_chull_alpha = 0.1)
t1$plot_ordination(plot_color = "Group") + theme(panel.grid = element_blank()) + geom_vline(xintercept = 0, linetype = 2) + geom_hline(yintercept = 0, linetype = 2)
```


Then we plot and compare the group distances.

```{r, echo = TRUE, eval = FALSE}
# calculate and plot sample distances within groups
t1$cal_group_distance(within_group = TRUE)
# return t1$res_group_distance
# perform Wilcoxon Rank Sum and Signed Rank Tests
t1$cal_group_distance_diff(method = "wilcox")
t1$plot_group_distance(boxplot_add = "mean")
```

```{r, out.width = "500px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_group_distance_within.png")
```

```{r, echo = TRUE, eval = FALSE}
# calculate and plot sample distances between groups
t1$cal_group_distance(within_group = FALSE)
t1$cal_group_distance_diff(method = "wilcox")
t1$plot_group_distance(boxplot_add = "mean")
```

```{r, out.width = "500px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_group_distance_between.png")
```

Clustering plot is also a frequently used method.

```{r, echo = TRUE, eval = FALSE}
# use replace_name to set the label name, group parameter used to set the color
t1$plot_clustering(group = "Group", replace_name = c("Saline", "Type"))
```

```{r, out.width = "550px", fig.align="center", echo = FALSE}
knitr::include_graphics("Images/plot_clustering.png")
```


PerMANOVA[@Anderson_Austral_2001] is often used in the differential test of distances among groups.

```{r, echo = TRUE, eval = FALSE}
# manova for all groups when manova_all = TRUE
t1$cal_manova(manova_all = TRUE)
t1$res_manova
```

```{r, echo = FALSE}
t1$cal_manova(manova_all = TRUE)
pander::pander(t1$res_manova)
```

The parameter manova_all = FALSE can be used to calculate significance for each paired group.
```{r, echo = TRUE, eval = FALSE}
# manova for each paired groups
t1$cal_manova(manova_all = FALSE)
t1$res_manova
```

```{r, echo = FALSE}
t1$cal_manova(manova_all = FALSE)
pander::pander(t1$res_manova)
```

The parameter manova_set has higher priority than manova_all. If manova_set is provided, manova_all parameter will be disabled.
```{r, echo = TRUE, eval = FALSE}
# manova for specified group set: such as "Group + Type"
t1$cal_manova(manova_set = "Group + Type")
t1$res_manova
```

```{r, echo = FALSE}
t1$cal_manova(manova_set = "Group + Type")
pander::pander(t1$res_manova)
```


PERMDISP[@Anderson_Navigating_2011] is also implemented to check multivariate homogeneity of groups dispersions (variances).

```{r, echo = TRUE}
# for the whole comparison and for each paired groups
t1$cal_betadisper()
t1$res_betadisper
```

For the explanation of statistical methods in microbial ecology, please read the references [@Ramette_Multivariate_2007; @Buttigieg_guide_2014].

### Key points

  + trans_beta$new: creating trans_beta object with measure parameter can invoke beta_diversity in microtable object for transformation
  + cal_ordination(): PCoA, PCA and NMDS approaches are all available
  + cal_manova(): cal_manova function can be used for paired comparisons, overall test and multi-factor test
  + plot_group_distance(): manipulating object$res_group_distance_diff can control what statistical results are presented in the plot.