Merge branch 'master' of https://github.com/nbisweden/workshop-r

NBISweden · Oct 17, 2024 · 157f817
2 parents 220bd92 + 3dcfb98
commit 157f817

Showing 20 changed files with 1,318 additions and 284 deletions.
diff --git a/README.md b/README.md
@@ -37,9 +37,33 @@ If you are using command line, you can install `vd` and open and edit the file `
 
 ### Docker
 
-A docker container is used in GitHub actions to build the website. The Dockerfile contains the image definition. To update the docker image, follow the steps below:
+All R packages needed to build the website and run the labs are included in a Docker image. To run the container locally, follow the instructions below:
+
+:exclamation: Image is about 4.8 GB!
+
+```
+# pull the container
+docker pull --platform=linux/amd64 ghcr.io/nbisweden/workshop-r:latest
+
+# render whole website
+docker run --platform=linux/amd64 --rm -u $(id -u ${USER}):$(id -g ${USER}) -v ${PWD}:/rmd ghcr.io/nbisweden/workshop-r:latest Rscript -e 'rmarkdown::render_site()'
+
+# render one file
+docker run --platform=linux/amd64 --rm -u $(id -u ${USER}):$(id -g ${USER}) -v ${PWD}:/rmd ghcr.io/nbisweden/workshop-r:latest Rscript -e 'rmarkdown::render("index.Rmd")'
+```
+
+To start RStudio Server and develop in the browser, run:
+
+```
+docker run --platform=linux/amd64 --rm -e PASSWORD=rstudio -p 8788:8787 -v ${PWD}:/rmd ghcr.io/nbisweden/workshop-r:latest
+```
+
+Go to [http://localhost:8788/](http://localhost:8788/) or [http://0.0.0.0:8788](http://0.0.0.0:8788). The username is `rstudio` and the password is `rstudio`. Change to the `/rmd` folder to see your files.
+
+To add new packages, update the `Dockerfile`, rebuild the image, test it, and push it to the repository. To rebuild and push the Docker image, follow the steps below:
 
 :exclamation: Remember to update the version number
+:exclamation: Remember to render the whole website to make sure everything works
 
 ```
 # build container and add tags
@@ -50,12 +74,6 @@ docker tag ghcr.io/nbisweden/workshop-r:1.1.0 ghcr.io/nbisweden/workshop-r:lates
 docker login ghcr.io
 docker push ghcr.io/nbisweden/workshop-r:1.1.0
 docker push ghcr.io/nbisweden/workshop-r:latest
-
-# run container locally
-# render whole website
-docker run --platform=linux/amd64 --rm -u $(id -u ${USER}):$(id -g ${USER}) -v ${PWD}:/rmd ghcr.io/nbisweden/workshop-r:latest Rscript -e 'rmarkdown::render_site()'
-# render one file
-docker run --platform=linux/amd64 --rm -u $(id -u ${USER}):$(id -g ${USER}) -v ${PWD}:/rmd ghcr.io/nbisweden/workshop-r:latest Rscript -e 'rmarkdown::render("index.Rmd")'
 ```
 
 ---

diff --git a/_site.yml b/_site.yml
@@ -37,4 +37,6 @@ navbar:
       href: home_precourse.html
     - text: Info
       href: home_info.html
+    - text: Projects
+      href: home_projects.html
 
diff --git a/data/slide_intro/num_pkgs.jpg b/data/slide_intro/num_pkgs.jpg
diff --git a/data/slide_programming/Data_Information_Knowledge.png b/data/slide_programming/Data_Information_Knowledge.png
diff --git a/data/slide_programming/Data_classification.png b/data/slide_programming/Data_classification.png
diff --git a/data/slide_r_environment/ggplot2_CRAN.png b/data/slide_r_environment/ggplot2_CRAN.png
diff --git a/home_content.Rmd b/home_content.Rmd
@@ -30,23 +30,25 @@ This page contains links to different lectures (slides) and practical exercises
 * [Intro to R (Slides)](slide_r_intro.html)  
 * [Intro to R environment (Slides)](slide_r_environment.html)
 * [Intro to programming in R (Slides)](slide_r_programming_1.html)  
-* [Variables and Operators (Slides)](slide_elements_1.pdf)  
+* [Variables and Operators (Slides)](slide_r_elements_1.html)  
 * [Data types (Lab)](lab_datatypes.html)  
-* [Vectors and Strings (Slides)](slide_elements_2.pdf)  
-* [Matrices, Lists and Dataframes (Slides)](slide_elements_3.pdf)  
+* [Vectors and Strings (Slides)](slide_r_elements_2.html)  
+* [Matrices, Lists and Dataframes (Slides)](slide_r_elements_3.html)  
 * [Working with Vectors (Lab)](lab_vectors.html)
 * [Dataframes (Lab)](lab_dataframes.html)
+* [Loops and functions (Slides)](slide_r_elements_4.html)
+* [Loops and functions (Lab)](lab_loops.html)
 
 **Data wrangling**
 
-* [Loading data (Slides)](slide_loadingdata.pdf)
+* [Loading data (Slides)](slide_loading_data.html)
 * [Loading data (Lab)](lab_loadingdata.html)  
 * [Tidyverse (Slides)](slide_tidyverse.html)  
 * [Tidyverse (Lab)](lab_tidyverse.html)  
 
 **Graphics**
 
-* [Graphics with base R (Slides)](slide_graphics.pdf)  
+* [Graphics with base R (Slides)](slide_base_graphics.html)  
 * [Graphics with base R (Lab)](lab_graphics.html)  
 * [Graphics with ggplot2 (Slides)](slide_ggplot2.html)  
 * [Working with ggplot2 (Lab)](lab_ggplot2.html)  
@@ -58,7 +60,7 @@ This page contains links to different lectures (slides) and practical exercises
 **Useful resources**
 
 * [Data structures in R](data/common/R_data_structures_ver_1_1.pdf)  
-* [Color names in R](data/common/Rolor.pdf)  
+* [Color names in R](data/common/Rcolor.pdf)  
 * [Visualising data](data/common/rules_for_using_color.pdf)  
 * [Naming conventions in R](data/common/Rnaming.pdf)  
 * [Introduction to statistical tests in R](data/common/stats_tests.pdf)  

diff --git a/home_precourse.Rmd b/home_precourse.Rmd
@@ -65,21 +65,26 @@ RStudio provides you with tools like code editor with highlighting, project mana
 
 Extra R packages used in the workshop exercises (if any) are listed below. It is recommended that you install this in advance. Simply copy and paste the code into R.
 
-```{r,eval=TRUE,chunk.title=NULL,echo=FALSE,comment="",class.output="r"}
-# this code block reads package names from '_site.yml' and prints them as installation instruction.
+```{r include=FALSE}
+# this first chunk scans the project directory and collects the packages used across the files; the next chunk prints them as installation instructions.
 
-pkg <- yaml::read_yaml("_site.yml")
+# add the packages you want to exclude from the list to pkg_discard
+
+pkg <- unique(renv::dependencies()$Package)
+
+pkg_discard <- c("mkteachr", "manipulateWidget")
+
+pkg_list <- pkg[!pkg %in% pkg_discard]
+
+```
+
+```{r echo=FALSE, warning=FALSE, chunk.title=NULL, class.output="r", comment="", eval=TRUE}
 
-if(!is.null(pkg$packages$packages_cran_student)) {
  cat("# install from cran\n")
- cat(paste0("install.packages(c('",paste(pkg$packages$packages_cran_student,sep="",collapse="','"),"'))"))
+ cat(paste0("install.packages(c('",paste(pkg_list,sep="",collapse="','"),"'))"))
  cat("\n")
-}
 
-if(!is.null(pkg$packages$packages_bioc_student)) {
- cat("# install from bioconductor\n")
- cat(paste0("BiocManager::install(c('",paste(pkg$packages$packages_bioc_student,sep="",collapse="','"),"'))"))
-}
+
 ```
 
 `r fa1("chevron-circle-right")` &nbsp; **Install Docker**
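
A note on the `home_precourse.Rmd` change above: the two chunks collect package names, drop the ones listed in `pkg_discard`, and print a single `install.packages()` call. The sketch below reproduces only that string-building step, as a minimal illustration. The helper name `build_install_cmd` and the example package names are hypothetical and not part of the commit, and `renv::dependencies()` is replaced by a hard-coded vector so the snippet runs without renv.

```r
# Hypothetical helper (not part of the commit): build the install.packages()
# call that the second chunk prints, from a vector of package names.
build_install_cmd <- function(pkg, pkg_discard = character(0)) {
  # drop duplicates and any packages that should not be shown to students
  pkg_list <- setdiff(unique(pkg), pkg_discard)
  # nothing left to install: return an empty string instead of a broken call
  if (length(pkg_list) == 0) return("")
  paste0("install.packages(c('", paste(pkg_list, collapse = "','"), "'))")
}

# Stand-in for unique(renv::dependencies()$Package); these names are examples only
pkg <- c("dplyr", "ggplot2", "mkteachr", "dplyr")
cmd <- build_install_cmd(pkg, pkg_discard = c("mkteachr", "manipulateWidget"))
cat("# install from cran\n", cmd, "\n", sep = "")
```

Because `renv::dependencies()` reports every package it detects, any Bioconductor package used in the labs would presumably also end up inside the `install.packages()` call; the separate `BiocManager::install()` branch is removed by this commit.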

diff --git a/home_projects.Rmd b/home_projects.Rmd
@@ -0,0 +1,224 @@
+---
+title: "Projects"
+output:
+  bookdown::html_document2:
+    highlight: textmate
+    toc: false
+    toc_float:
+      collapsed: true
+      smooth_scroll: true
+      print: false
+    toc_depth: 4
+    number_sections: false
+    df_print: default
+    code_folding: none
+    self_contained: false
+    keep_md: false
+    encoding: 'UTF-8'
+    css: "assets/lab.css"
+    include:
+      after_body: assets/footer-lab.html
+---
+
+```{r,child="assets/header-lab.Rmd"}
+```
+
+Hands-on analysis of actual data is the best way to learn R programming. This page contains some data sets that you can use to explore what you have learned in this course. For each data set, a brief description as well as download instructions are provided. 
+
+<div class="alert alert-info">
+  <strong> Try to focus on using the tools from the course to explore the data, rather than worrying about producing a perfect report with a coherent analysis workflow.</strong>
+</div>
+
+
+On the last day, you will present your Rmd file (or rather, the resulting HTML report) and share with the class what your data was about.
+
+---
+
+## Palmer penguins 🐧
+
+- This data set contains a series of measurements for three species of penguins collected at Palmer Station in Antarctica.
+- Data description: <https://vincentarelbundock.github.io/Rdatasets/doc/heplots/peng.html>
+
+<details>
+  <summary>Download instructions</summary>
+```{r, warning=F, message=F}
+penguins <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/heplots/peng.csv", header = T, sep = ",")
+str(penguins)
+```
+</details>
+
+---
+
+## Drinking habits 🍷
+
+- Data from a national survey on the drinking habits of American citizens in 2001 and 2002.
+- Data description: <https://vincentarelbundock.github.io/Rdatasets/doc/stevedata/nesarc_drinkspd.html>
+
+<details>
+  <summary>Download instructions</summary>
+```{r}
+library(dplyr)
+# this will download the csv file directly from the web
+drinks <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/stevedata/nesarc_drinkspd.csv", header = T, sep = ",")
+# the lines below will take a sample from the full data set
+set.seed(seed = 2)
+drinks <- sample_n(drinks, size = 3000, replace = F)
+# and here we check the structure of the data
+str(drinks)
+```
+</details>
+
+---
+
+## Car crashes 🚗
+
+- Data from car accidents in the US between 1997 and 2002.
+- Data description: <https://vincentarelbundock.github.io/Rdatasets/doc/DAAG/nassCDS.html>
+
+<details>
+  <summary>Download instructions</summary>
+```{r}
+library(dplyr)
+# this will download the csv file directly from the web
+crashes <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/DAAG/nassCDS.csv", header = T, sep = ",")
+# the lines below will take a sample from the full data set
+set.seed(seed = 2)
+crashes <- sample_n(crashes, size = 3000, replace = F)
+# and here we check the structure of the data
+str(crashes)
+```
+</details>
+
+---
+
+## Gapminder health and wealth 📈
+
+- This is a collection of country indicators from the Gapminder dataset for the years 2000-2016.
+- Data description: <https://vincentarelbundock.github.io/Rdatasets/doc/dslabs/gapminder.html>
+
+<details>
+  <summary>Download instructions</summary>
+```{r}
+library(dplyr)
+# this will download the csv file directly from the web
+gapminder <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/dslabs/gapminder.csv", header = T, sep = ",")
+# here we filter the data to remove anything before the year 2000
+gapminder <- gapminder |> filter(year >= 2000)
+# and here we check the structure of the data
+str(gapminder)
+```
+</details>
+
+---
+
+## StackOverflow survey 🖥️
+
+- This is a downsampled and modified version of one of StackOverflow's annual surveys where users respond to a series of questions related to careers in technology and coding.
+- Data description: <https://vincentarelbundock.github.io/Rdatasets/doc/modeldata/stackoverflow.html>
+
+<details>
+  <summary>Download instructions</summary>
+```{r}
+library(dplyr)
+# this will download the csv file directly from the web
+stackoverflow <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/modeldata/stackoverflow.csv", header = T, sep = ",")
+# the lines below will take a sample from the full data set
+set.seed(2)
+stackoverflow <- sample_n(stackoverflow, size = 3000)
+# and here we check the structure of the data
+str(stackoverflow)
+```
+</details>
+
+---
+
+## Doctor visits 🤒
+
+- Data on the frequency of doctor visits in the past two weeks in Australia for the years 1977 and 1978.
+- Data description: <https://vincentarelbundock.github.io/Rdatasets/doc/AER/DoctorVisits.html>
+
+<details>
+  <summary>Download instructions</summary>
+```{r}
+library(dplyr)
+# this will download the csv file directly from the web
+doctor <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/AER/DoctorVisits.csv", header = T, sep = ",")
+# the lines below will take a sample from the full data set
+set.seed(2)
+doctor <- sample_n(doctor, size = 3000)
+# and here we check the structure of the data
+str(doctor)
+```
+</details>
+
+---
+
+## Video Game Sales 🎮
+
+- This data set contains sales figures for video game titles released in 2001 and 2002.
+- Data description: <https://mavenanalytics.io/data-playground?order=date_added%2Cdesc&search=Video%20Game%20Sales> 
+  - Click on "Preview Data" and "VG Data Dictionary" to see the description for each column.
+
+<details>
+  <summary>Download instructions</summary>
+```{r, warning=F, message=F}
+library(dplyr)
+library(lubridate)
+# this will download the file to your working directory
+download.file(url = "https://maven-datasets.s3.amazonaws.com/Video+Game+Sales/Video+Game+Sales.zip", destfile = "video_game_sales.zip")
+# this will unzip the file and read it into R
+videogames <- read.table(unz(description = "video_game_sales.zip", filename = "vgchartz-2024.csv"), header = T, sep = ",", quote = "\"", fill = T)
+# this will select rows corresponding to years 2001 and 2002
+videogames <- filter(videogames, year(as_date(release_date)) %in% c(2001,2002))
+# and here we check the structure of the data
+str(videogames)
+```
+</details>
+
+---
+
+## LEGO Sets 🏗️
+
+- This data set contains the description of all LEGO sets released from 2000 to 2009.
+- Data description: <https://mavenanalytics.io/data-playground?order=date_added%2Cdesc&search=lego>
+  - Click on "Preview Data" and the data dictionary to see the description for each column.
+
+<details>
+  <summary>Download instructions</summary>
+```{r, warning=F, message=F}
+library(dplyr)
+# this will download the file to your working directory
+download.file(url = "https://maven-datasets.s3.amazonaws.com/LEGO+Sets/LEGO+Sets.zip", destfile = "lego.csv.zip")
+# this will unzip the file and read it into R
+lego <- read.table(unz(description = "lego.csv.zip", filename = "lego_sets.csv"), header = T, sep = ",", quote = "\"", fill = T)
+# this will select rows corresponding to years 2000-2009
+lego <- filter(lego, year %in% seq(2000,2009,1))
+# and here we check the structure of the data
+str(lego)
+```
+</details>
+
+---
+
+## Shark attacks 🦈
+
+- This data set contains information on shark attack records from all over the world.
+- Data description: <https://mavenanalytics.io/data-playground?order=date_added%2Cdesc&search=shark>
+  - Click on "Preview Data" and the data dictionary to see the description for each column.
+
+<details>
+  <summary>Download instructions</summary>
+```{r, warning=F, message=F}
+library(dplyr)
+# this will download the file to your working directory
+download.file(url = "https://maven-datasets.s3.amazonaws.com/Shark+Attacks/attacks.csv.zip", destfile = "attacks.csv.zip")
+# this will unzip the file and read it into R
+sharks <- read.table(unz(description = "attacks.csv.zip", filename = "attacks.csv"), header = T, sep = ",", quote = "\"", fill = T)
+# the lines below will take a sample from the full data set
+set.seed(seed = 2)
+sharks <- sample_n(sharks, size = 3000, replace = F)
+str(sharks)
+```
+</details>
+
+***
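
Most of the download chunks in `home_projects.Rmd` follow the same pattern: read a table, fix the seed, and draw 3000 rows without replacement. The sketch below shows that sampling step in base R, using the built-in `iris` data so it runs offline. The helper name `sample_rows` is hypothetical and not part of the commit. It mirrors the idea of `set.seed()` followed by `sample_n()`, but it is not guaranteed to select the same rows as dplyr. It also caps the sample size at the number of rows, whereas `sample_n(..., replace = F)` raises an error when `size` exceeds the row count.

```r
# Hypothetical helper (not part of the commit): reproducible row sampling
# without replacement, mirroring set.seed() + dplyr::sample_n() in base R.
sample_rows <- function(df, size, seed = 2) {
  set.seed(seed)                 # same seed -> same rows on every run
  size <- min(size, nrow(df))    # never ask for more rows than exist
  df[sample.int(nrow(df), size), , drop = FALSE]
}

small <- sample_rows(iris, size = 50)
str(small)

# asking for more rows than available returns the whole table, shuffled
all_rows <- sample_rows(iris, size = 3000)
nrow(all_rows)
```

Capping the size is a design choice for this sketch only: it keeps the chunk from failing if an upstream data set shrinks below 3000 rows, at the cost of silently returning fewer rows than requested.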
diff --git a/images/data_frame.png b/images/data_frame.png
diff --git a/images/data_structures.png b/images/data_structures.png
diff --git a/lab_graphics.Rmd b/lab_graphics.Rmd
@@ -593,3 +593,8 @@ You task here is to use the already acquired R knowledge to plot an interesting
 - Be creative,
 - Visualize a selected variables using boxplot and histogram on one plot (HINT: parameter mfrow),
 - Discuss the result with your colleagues and TAs.
+
+```{r}
+unlink(local_file_path)
+```
+
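
The chunk added to `lab_graphics.Rmd` presumably cleans up a file created earlier in the lab. `local_file_path` is defined outside this diff, so the sketch below uses a temporary file as a stand-in to show the create, check, and delete cycle around `unlink()`.

```r
# Stand-in for the lab's local_file_path, which is defined outside this diff
local_file_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars), local_file_path)
file.exists(local_file_path)

# unlink() deletes the file and returns 0 on success
status <- unlink(local_file_path)
file.exists(local_file_path)
```

Calling `unlink()` on a path that no longer exists is not treated as a failure, so the cleanup chunk can be rerun safely.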