diff --git a/vignettes/solutions.Rmd b/vignettes/solutions.Rmd
index a1e4260..36a52ef 100644
--- a/vignettes/solutions.Rmd
+++ b/vignettes/solutions.Rmd
@@ -17,7 +17,7 @@ Questions:
 Suggested answers are below. You might have some different code e.g. to customise the volcano plot as you like. Feel free to comment on any of these solutions in the workshop website as described [here](https://github.com/stemangiola/bioc_2020_tidytranscriptomics/blob/master/CONTRIBUTING.md).
-```{r out.width = "40%", message=FALSE, warning=FALSE}
+```{r out.width = "70%", message=FALSE, warning=FALSE}
 # load libraries
 # tidyverse core packages
@@ -67,7 +67,7 @@ Answer: PC1: 47%, PC2: 25%
 What do PC1 and PC2 represent?
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 counts_scal_PCA %>%
 pivot_sample() %>%
 ggplot(aes(x=PC1, y=PC2, colour=condition, shape=type)) +
@@ -113,7 +113,7 @@ Answer: FBgn0025111
 3. What code can generate a heatmap of variable genes (starting from count_scaled)?
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 counts_scaled %>%
 # filter lowly abundant
@@ -134,7 +134,7 @@ counts_scaled %>%
 4. What code can you use to visualise expression of the pasilla gene (gene id: FBgn0261552)
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 counts_scaled %>%
 # extract counts for pasilla gene
@@ -150,7 +150,7 @@ counts_scaled %>%
 5. What code can generate an interactive volcano plot that has gene ids showing on hover?
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 p <- counts_de %>%
 pivot_transcript() %>%
@@ -177,7 +177,7 @@ Tip: You can use "text" instead of "label" if you don't want the column name to
 6. What code can generate a heatmap of the top 100 DE genes?
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 top100 <- counts_de %>%
 pivot_transcript() %>%
diff --git a/vignettes/supplementary.Rmd b/vignettes/supplementary.Rmd
index a8d936c..195cb2d 100644
--- a/vignettes/supplementary.Rmd
+++ b/vignettes/supplementary.Rmd
@@ -67,7 +67,7 @@ counts_tt %>%
 We can also check how many counts we have for each sample by making a bar plot. This helps us see whether there are any major discrepancies between the samples more easily.
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 ggplot(counts_tt, aes(x=sample, weight=counts, fill=sample)) +
 geom_bar() +
 theme_bw()
@@ -77,14 +77,14 @@ As we are using ggplot2, we can also easily view by any other variable that's a
 We can colour by dex treatment.
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 ggplot(counts_tt, aes(x=sample, weight=counts, fill=dex)) +
 geom_bar() +
 theme_bw()
 ```
 We can colour by cell line.
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 ggplot(counts_tt, aes(x=sample, weight=counts, fill=cell)) +
 geom_bar() +
 theme_bw()
@@ -93,7 +93,7 @@ ggplot(counts_tt, aes(x=sample, weight=counts, fill=cell)) +
 ## How to examine normalised counts with boxplots
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 # scale counts
 counts_scaled <- counts_tt %>%
 scale_abundance(factor_of_interest = dex)
@@ -111,7 +111,7 @@ counts_scaled %>%
 ## How to create MDS plot
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 airway %>%
 tidybulk() %>%
 scale_abundance(factor_of_interest=dex) %>%
@@ -126,7 +126,7 @@ airway %>%
 MA plots enable us to visualise amount of expression (logCPM) versus logFC. Highly expressed genes are towards the right of the plot. We can also colour significant genes (e.g. genes with FDR < 0.05)
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 # perform differential testing
 counts_de <- counts_tt %>%
@@ -147,7 +147,7 @@ counts_de %>%
 A more informative MA plot, integrating some of the packages in tidyverse.
-```{r out.width = "40%", warning=FALSE}
+```{r out.width = "70%", warning=FALSE}
 counts_de %>%
 pivot_transcript() %>%
diff --git a/vignettes/tidytranscriptomics.Rmd b/vignettes/tidytranscriptomics.Rmd
index 9f7fcdc..4557518 100644
--- a/vignettes/tidytranscriptomics.Rmd
+++ b/vignettes/tidytranscriptomics.Rmd
@@ -94,7 +94,7 @@ Measuring gene expression on a genome-wide scale has become common practice over
 There are many steps involved in analysing an RNA sequencing dataset. The main steps for a differential expression analysis are shown in the figure below. Sequenced reads are aligned to a reference genome, then the number of reads mapped to each gene can be counted. This results in a table of counts, which is what we perform statistical analyses on in R. While mapping and counting are important and necessary tasks, today we will be starting from the count data and showing how differential expression analysis can be performed in a friendly way using tidybulk.
-```{r, echo=FALSE, out.width = "40%"}
+```{r, echo=FALSE, out.width = "70%"}
 knitr::include_graphics("../inst/vignettes/bioc2020tidybulkpipeline-01.png")
 ```
@@ -202,7 +202,7 @@ After we run `scale_abundance` we should see some columns have been added at the
 We can visualise the difference of abundance densities before and after scaling. As tidybulk output is compatible with tidyverse, we can simply pipe it into standard tidyverse functions such as `filter`, `pivot_longer` and `ggplot`. We can also take advantage of ggplot's `facet_wrap` to easily create multiple plots.
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 counts_scaled %>%
 filter(!lowly_abundant) %>%
 pivot_longer(cols = c("counts", "counts_scaled"), names_to = "source", values_to = "abundance") %>%
@@ -302,7 +302,7 @@ counts_scal_PCA %>% pivot_sample()
 We can now plot the reduced dimensions.
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 # PCA plot
 counts_scal_PCA %>%
 pivot_sample() %>%
@@ -319,7 +319,7 @@ The samples separate by treatment on PC1 which is what we hope to see. PC2 separ
 An alternative to principal component analysis for examining relationships between samples is using hierarchical clustering. Heatmaps are a nice visualisation to examine hierarchical clustering of your samples. tidybulk has a simple function we can use, `keep_variable`, to extract the most variable genes which we can then plot with tidyHeatmap.
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 counts_scaled %>%
 # filter lowly abundant
@@ -461,7 +461,7 @@ topgenes_symbols
 Volcano plots are a useful genome-wide plot for checking that the analysis looks good. Volcano plots enable us to visualise the significance of change (p-value) versus the fold change (logFC). Highly significant genes are towards the top of the plot. We can also colour significant genes (e.g. genes with false-discovery rate < 0.05)
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 # volcano plot, minimal
 counts_de %>%
 filter(!lowly_abundant) %>%
@@ -474,7 +474,7 @@ counts_de %>%
 A more informative plot, integrating some of the packages in tidyverse.
-```{r out.width = "40%", warning=FALSE}
+```{r out.width = "70%", warning=FALSE}
 counts_de %>%
 pivot_transcript() %>%
@@ -501,7 +501,7 @@ Before following up on the differentially expressed genes with further lab work,
 With stripcharts we can see if replicates tend to group together and how the expression compares to the other groups. We'll also add a box plot to show the distribution.
-```{r out.width = "40%"}
+```{r out.width = "70%"}
 strip_chart <- counts_scaled %>%
@@ -525,7 +525,7 @@ A really nice feature of using tidyverse and ggplot2 is that we can make interac
 We can also specify which parameters from the `aes` we want to show up when we hover over the plot with `tooltip`.
-```{r, out.width = "40%", warning=FALSE}
+```{r, out.width = "70%", warning=FALSE}
 strip_chart %>% ggplotly(tooltip = c("label", "y"))
 ```