diff --git a/modules/Reproducibility/Reproducibility.Rmd b/modules/Reproducibility/Reproducibility.Rmd index 074414cc..96461a60 100644 --- a/modules/Reproducibility/Reproducibility.Rmd +++ b/modules/Reproducibility/Reproducibility.Rmd @@ -67,6 +67,12 @@ knitr::include_graphics("images/reproducibility.png") ottrpal::include_slide("https://docs.google.com/presentation/d/1nV7x0mIIE4oWVKxpv4qJNvO17y51MajsERGtzR2qClk/edit#slide=id.gf1accd298e_0_673") ``` +## We can't get to replicability without reproducibility + +```{r, fig.alt="session info", out.width = "80%", echo = FALSE, fig.align='center'} +ottrpal::include_slide("https://docs.google.com/presentation/d/1nV7x0mIIE4oWVKxpv4qJNvO17y51MajsERGtzR2qClk/edit#slide=id.g3070a1ee60e_0_0") +``` + ## It's worth the wait ```{r, fig.alt="session info", out.width = "80%", echo = FALSE, fig.align='center'} @@ -80,6 +86,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1nV7x0mIIE4oWVKxp ottrpal::include_slide("https://docs.google.com/presentation/d/1nV7x0mIIE4oWVKxpv4qJNvO17y51MajsERGtzR2qClk/edit#slide=id.gf1cd772e00_0_330") ``` + ## The process ```{r, fig.alt="session info", out.width = "80%", echo = FALSE, fig.align='center'} diff --git a/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.Rmd b/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.Rmd index 506141df..b169dd0e 100644 --- a/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.Rmd +++ b/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.Rmd @@ -30,7 +30,7 @@ We are constantly making improvements. - Reproducible science makes everyone's life easier! - `readr`has helpful functions like `read_csv()` that can help you import data into R -📃[Cheatsheet](https://daseh.org/modules/cheatsheets/Day-2.pdf) +📃 [Day 2 Cheatsheet](https://daseh.org/modules/cheatsheets/Day-2.pdf) ## Overview @@ -382,6 +382,15 @@ test clean_names(test) ``` +## GUT CHECK: Which of the following would NOT always work with a column called `counties_of_seattle_with_population_over_10,000`? 
+ +A. Renaming it using the `rename()` function to something simpler like `seattle_counties_over_10thous`. + +B. Keeping it as is and using backticks around the column name when you use it. + +C. Keeping it as is and using quotes around the column name when you use it. + + ## Summary - data frames are simpler version of a data table @@ -396,8 +405,14 @@ clean_names(test) ## Lab Part 1 -🏠 [Class Website](https://daseh.org/) -💻 [Lab](https://daseh.org/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd) +🏠 [Class Website](https://daseh.org/) + +💻 [Lab](https://daseh.org/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd) + +📃 [Day 3 Cheatsheet](https://daseh.org/modules/cheatsheets/Day-3.pdf) + +📃 [Posit's `dplyr` Cheatsheet](https://rstudio.github.io/cheatsheets/data-transformation.pdf) + # Subsetting Columns @@ -472,6 +487,7 @@ knitr::include_graphics("images/tidyselect.png") head(er_30, 2) select(er_30, ends_with("cl"), year) ``` + ## Multiple tidyselect functions Follows OR logic. @@ -481,14 +497,7 @@ select(er_30, ends_with("cl"), starts_with("r")) ``` -## Multiple patterns with tidyselect -Need to combine the patterns with the `c()` function. - -```{r} -select(er_30, starts_with(c("r", "l"))) - -``` ## The `where()` function can help select columns of a specific class{.codesmall} @@ -501,6 +510,11 @@ select(er_30, where(is.numeric)) ``` +## GUT CHECK: What function would be useful for getting a vector version of a column? + +A. `pull()` + +B. `select()` @@ -636,6 +650,17 @@ knitr::include_graphics("https://media.giphy.com/media/5b5OU7aUekfdSAER5I/giphy. ``` https://media.giphy.com/media/5b5OU7aUekfdSAER5I/giphy.gif + +## GUT CHECK: If we want to keep just rows that meet either of two conditions, what code should we use? + +A. `filter()` with `|` + +B. `select()` with `|` + +C. `filter()` with `&` + +D. 
`select()` with `&` + ## Summary - `pull()` to get values out of a data frame/tibble @@ -657,6 +682,9 @@ https://media.giphy.com/media/5b5OU7aUekfdSAER5I/giphy.gif 🏠 [Class Website](https://daseh.org) 💻 [Lab](https://daseh.org/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd) +📃 [Day 3 Cheatsheet](https://daseh.org/modules/cheatsheets/Day-3.pdf) + +📃 [Posit's `dplyr` Cheatsheet](https://rstudio.github.io/cheatsheets/data-transformation.pdf) ## Get the data @@ -816,32 +844,23 @@ select(er_30, newcol, everything()) head(select(er_30, newcol, everything()), 3) ``` -## Ordering the columns of a data frame: dplyr {.codesmall} + -Put `year` at the end ("remove, everything, then add back in"): + -```{r, eval = FALSE} -select(er_30, !year, everything(), year) -``` - -```{r, echo = FALSE} -head(select(er_30, !year, everything(), year), 3) -``` + + + + + + -## Ordering the column names of a data frame: alphabetically {.codesmall} - -Using the base R `order()` function. - -```{r} -order(colnames(er_30)) -er_30 %>% select(order(colnames(er_30))) -``` ## Ordering the columns of a data frame: dplyr {.codesmall} -In addition to `select` we can also use the `relocate()` function of dplyr to rearrange the columns for more complicated moves. +In addition to `select` we can also use the `relocate()` function of dplyr to rearrange the columns for more complicated moves with the `.before` and `.after` arguments. For example, let say we just wanted `year` to be before `rate``. @@ -881,6 +900,23 @@ arrange(er_30, rate, desc(year)) %>% head(n = 2) arrange(er_30, desc(year), rate) %>% head(n = 2) ``` +## GUT CHECK: What function would be useful for changing a column to be a percentage instead of a ratio? + +A. `filter()` + +B. `select()` + +C. `mutate()` + + +## GUT CHECK: How would we interpret `er_30 %>% filter(year > 2020) %>% select(year, rate)`? + +A. 
Get the `er_30` data, then filter it for rows with `year` values over 2020, then select only the `year` and `rate` columns. + +B. Get the `er_30` data, then filter it for rows with `year` values over 2020, then select for rows that have values for `year` and `rate`. + + + ## Summary @@ -941,6 +977,9 @@ Even though `$` is easier for creating new columns, `mutate` is really powerful, 💻 [Lab](https://daseh.org/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd) +📃 [Day 3 Cheatsheet](https://daseh.org/modules/cheatsheets/Day-3.pdf) + +📃 [Posit's `dplyr` Cheatsheet](https://rstudio.github.io/cheatsheets/data-transformation.pdf) ```{r, fig.alt="The End", out.width = "50%", echo = FALSE, fig.align='center'} @@ -951,6 +990,25 @@ Image by % select(order(colnames(er_30))) +``` + ## `which()` function Instead of removing rows like filter, `which()` simply shows where the values occur if they pass a specific condition. We will see that this can be helpful later when we want to select and filter in more complicated ways.