[Subsetting] adding cheatsheets and gut checks #194

Merged: 8 commits merged on Oct 2, 2024
7 changes: 7 additions & 0 deletions modules/Reproducibility/Reproducibility.Rmd
@@ -67,6 +67,12 @@ knitr::include_graphics("images/reproducibility.png")
ottrpal::include_slide("https://docs.google.com/presentation/d/1nV7x0mIIE4oWVKxpv4qJNvO17y51MajsERGtzR2qClk/edit#slide=id.gf1accd298e_0_673")
```

## We can't get to replicability without reproducibility

```{r, fig.alt="session info", out.width = "80%", echo = FALSE, fig.align='center'}
ottrpal::include_slide("https://docs.google.com/presentation/d/1nV7x0mIIE4oWVKxpv4qJNvO17y51MajsERGtzR2qClk/edit#slide=id.g3070a1ee60e_0_0")
```

## It's worth the wait

```{r, fig.alt="session info", out.width = "80%", echo = FALSE, fig.align='center'}
@@ -80,6 +86,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1nV7x0mIIE4oWVKxp
ottrpal::include_slide("https://docs.google.com/presentation/d/1nV7x0mIIE4oWVKxpv4qJNvO17y51MajsERGtzR2qClk/edit#slide=id.gf1cd772e00_0_330")
```


## The process

```{r, fig.alt="session info", out.width = "80%", echo = FALSE, fig.align='center'}
114 changes: 86 additions & 28 deletions modules/Subsetting_Data_in_R/Subsetting_Data_in_R.Rmd
@@ -30,7 +30,7 @@ We are constantly making improvements.
- Reproducible science makes everyone's life easier!
- `readr` has helpful functions like `read_csv()` that can help you import data into R

📃[Cheatsheet](https://daseh.org/modules/cheatsheets/Day-2.pdf)
📃 [Day 2 Cheatsheet](https://daseh.org/modules/cheatsheets/Day-2.pdf)


## Overview
@@ -382,6 +382,15 @@ test
clean_names(test)
```

## GUT CHECK: Which of the following would NOT always work with a column called `counties_of_seattle_with_population_over_10,000`?

A. Renaming it with the `rename()` function to something simpler, like `seattle_counties_over_10thous`.

B. Keeping it as is and using backticks around the column name whenever you use it.

C. Keeping it as is and using quotes around the column name whenever you use it.

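A minimal sketch of the three options, assuming a small hypothetical tibble `toy` that holds only the awkward column (it is not part of the course data). Backticks let R treat the name as a column reference. A quoted string happens to work in `select()`, which accepts column names as strings, but in a function like `mutate()` it is just text.

```{r}
library(dplyr)

# Hypothetical toy data: backticks are needed even to create this column
toy <- tibble(`counties_of_seattle_with_population_over_10,000` = c(3, 5))

# A: rename to something simpler (syntax is new_name = old_name)
toy_renamed <- rename(toy,
  seattle_counties_over_10thous = `counties_of_seattle_with_population_over_10,000`
)
colnames(toy_renamed)

# B: keep the name and wrap it in backticks whenever you use it
pull(toy, `counties_of_seattle_with_population_over_10,000`)

# C: quotes happen to work in select(), which accepts names as strings...
select(toy, "counties_of_seattle_with_population_over_10,000")

# ...but in mutate() a quoted name is only a string, so every row gets that text
mutate(toy, copy = "counties_of_seattle_with_population_over_10,000")
```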

## Summary

- data frames are a simpler version of a data table
@@ -396,8 +405,14 @@ clean_names(test)

## Lab Part 1

🏠 [Class Website](https://daseh.org/)
💻 [Lab](https://daseh.org/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd)
🏠 [Class Website](https://daseh.org/)

💻 [Lab](https://daseh.org/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd)

📃 [Day 3 Cheatsheet](https://daseh.org/modules/cheatsheets/Day-3.pdf)

📃 [Posit's `dplyr` Cheatsheet](https://rstudio.github.io/cheatsheets/data-transformation.pdf)


# Subsetting Columns

@@ -472,6 +487,7 @@ knitr::include_graphics("images/tidyselect.png")
head(er_30, 2)
select(er_30, ends_with("cl"), year)
```

## Multiple tidyselect functions

Follows OR logic.
@@ -481,14 +497,7 @@ select(er_30, ends_with("cl"), starts_with("r"))

```
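A minimal sketch of the OR behavior on a hypothetical toy tibble (the column names only loosely mimic `er_30`): a column is kept if it matches at least one of the helpers.

```{r}
library(dplyr)

# Hypothetical columns loosely modeled on er_30
toy <- tibble(
  rate = 1.2, lower95cl = 0.9, upper95cl = 1.5,
  year = 2020, region = "A"
)

# Kept: columns ending in "cl" OR starting with "r";
# year matches neither helper, so it is dropped
select(toy, ends_with("cl"), starts_with("r"))
```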

## Multiple patterns with tidyselect

Need to combine the patterns with the `c()` function.

```{r}
select(er_30, starts_with(c("r", "l")))

```


## The `where()` function can help select columns of a specific class {.codesmall}
@@ -501,6 +510,11 @@ select(er_30, where(is.numeric))

```
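A small sketch on a hypothetical mixed-type tibble: `where()` takes a predicate function such as `is.numeric` or `is.character` and keeps the columns for which it returns `TRUE`. The function is passed by name, without parentheses.

```{r}
library(dplyr)

# Hypothetical data with numeric and character columns
toy <- tibble(year = c(2019, 2020), rate = c(10.2, 11.5), county = c("A", "B"))

select(toy, where(is.numeric))   # keeps year and rate
select(toy, where(is.character)) # keeps county only
```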

## GUT CHECK: What function would be useful for getting a vector version of a column?

A. `pull()`

B. `select()`
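A quick way to check this on a hypothetical toy tibble: `select()` returns a data frame (tibble) even when it keeps a single column, whereas `pull()` returns that column's values as a plain vector.

```{r}
library(dplyr)

toy <- tibble(year = c(2019, 2020, 2021), rate = c(10.2, 11.5, 9.8))

select(toy, rate) # still a tibble, now with one column
pull(toy, rate)   # a numeric vector holding 10.2, 11.5 and 9.8
```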



@@ -636,6 +650,17 @@ knitr::include_graphics("https://media.giphy.com/media/5b5OU7aUekfdSAER5I/giphy.
```
https://media.giphy.com/media/5b5OU7aUekfdSAER5I/giphy.gif


## GUT CHECK: If we want to keep just the rows that meet either of two conditions, what code should we use?

A. `filter()` with `|`

B. `select()` with `|`

C. `filter()` with `&`

D. `select()` with `&`
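A minimal sketch on hypothetical toy data contrasting the two operators. `select()` chooses columns rather than rows, so only `filter()` applies to keeping rows.

```{r}
library(dplyr)

toy <- tibble(year = c(2018, 2019, 2020, 2021), rate = c(12, 8, 15, 11))

# | (OR): keep rows where at least one condition is TRUE -> 2018, 2020, 2021
filter(toy, year > 2020 | rate > 10)

# & (AND): keep rows where both conditions are TRUE -> 2021 only
filter(toy, year > 2020 & rate > 10)
```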

## Summary

- `pull()` to get values out of a data frame/tibble
@@ -657,6 +682,9 @@ https://media.giphy.com/media/5b5OU7aUekfdSAER5I/giphy.gif

🏠 [Class Website](https://daseh.org)
💻 [Lab](https://daseh.org/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd)
📃 [Day 3 Cheatsheet](https://daseh.org/modules/cheatsheets/Day-3.pdf)

📃 [Posit's `dplyr` Cheatsheet](https://rstudio.github.io/cheatsheets/data-transformation.pdf)

## Get the data

@@ -816,32 +844,23 @@ select(er_30, newcol, everything())
head(select(er_30, newcol, everything()), 3)
```

## Ordering the columns of a data frame: dplyr {.codesmall}
<!-- ## Ordering the columns of a data frame: dplyr {.codesmall} -->

Put `year` at the end ("remove, everything, then add back in"):
<!-- Put `year` at the end ("remove, everything, then add back in"): -->

```{r, eval = FALSE}
select(er_30, !year, everything(), year)
```

```{r, echo = FALSE}
head(select(er_30, !year, everything(), year), 3)
```
<!-- ```{r, eval = FALSE} -->
<!-- select(er_30, !year, everything(), year) -->
<!-- ``` -->

<!-- ```{r, echo = FALSE} -->
<!-- head(select(er_30, !year, everything(), year), 3) -->
<!-- ``` -->

## Ordering the column names of a data frame: alphabetically {.codesmall}

Using the base R `order()` function.

```{r}
order(colnames(er_30))
er_30 %>% select(order(colnames(er_30)))
```


## Ordering the columns of a data frame: dplyr {.codesmall}

In addition to `select` we can also use the `relocate()` function of dplyr to rearrange the columns for more complicated moves.
In addition to `select`, we can also use the `relocate()` function from dplyr to rearrange the columns for more complicated moves, using the `.before` and `.after` arguments.

For example, let's say we just wanted `year` to be before `rate`.
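A minimal sketch of both arguments on a hypothetical toy tibble (not `er_30`): `.before` places the chosen column in front of another column, and `.after` places it behind one.

```{r}
library(dplyr)

toy <- tibble(county = "A", rate = 10.2, year = 2020)

# Move year so that it sits just before rate
relocate(toy, year, .before = rate)  # columns: county, year, rate

# Move county so that it sits just after year
relocate(toy, county, .after = year) # columns: rate, year, county
```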

@@ -881,6 +900,23 @@ arrange(er_30, rate, desc(year)) %>% head(n = 2)
arrange(er_30, desc(year), rate) %>% head(n = 2)
```

## GUT CHECK: What function would be useful for changing a column to be a percentage instead of a ratio?

A. `filter()`

B. `select()`

C. `mutate()`
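A minimal sketch, assuming a hypothetical column `ratio` that holds proportions between 0 and 1: `mutate()` adds a new column computed from existing ones, here by multiplying by 100.

```{r}
library(dplyr)

toy <- tibble(year = c(2020, 2021), ratio = c(0.25, 0.4))

# New column percent = ratio * 100; the original ratio column is kept
mutate(toy, percent = ratio * 100)
```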


## GUT CHECK: How would we interpret `er_30 %>% filter(year > 2020) %>% select(year, rate)`?

A. Get the `er_30` data, then filter it for rows with `year` values over 2020, then select only the `year` and `rate` columns.

B. Get the `er_30` data, then filter it for rows with `year` values over 2020, then select for rows that have values for `year` and `rate`.
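A minimal sketch on a hypothetical toy tibble: reading `%>%` as "then" gives the step-by-step interpretation, and the same pipeline written as nested calls produces an identical result.

```{r}
library(dplyr)

toy <- tibble(
  year = c(2019, 2021, 2022),
  rate = c(5.1, 6.3, 7.0),
  county = c("A", "B", "C")
)

# Read %>% as "then": take toy, then filter rows, then select columns
piped <- toy %>% filter(year > 2020) %>% select(year, rate)
piped

# The same steps as nested calls, read from the inside out
nested <- select(filter(toy, year > 2020), year, rate)
identical(piped, nested) # TRUE
```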




## Summary

@@ -941,6 +977,9 @@ Even though `$` is easier for creating new columns, `mutate` is really powerful,

💻 [Lab](https://daseh.org/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd)

📃 [Day 3 Cheatsheet](https://daseh.org/modules/cheatsheets/Day-3.pdf)

📃 [Posit's `dplyr` Cheatsheet](https://rstudio.github.io/cheatsheets/data-transformation.pdf)


```{r, fig.alt="The End", out.width = "50%", echo = FALSE, fig.align='center'}
@@ -951,6 +990,25 @@ Image by <a href="https://pixabay.com/users/geralt-9301/?utm_source=link-attribu

# Extra Slides

## Multiple patterns with tidyselect

Need to combine the patterns with the `c()` function.

```{r}
select(er_30, ends_with("cl"), starts_with("r"))
select(er_30, starts_with(c("r", "l"))) # here we combine two patterns

```
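A small check on a hypothetical toy tibble: a single `starts_with()` given a vector of patterns picks out the same set of columns as two separate `starts_with()` calls. Column order in the results may differ, so the sketch compares the names as sets.

```{r}
library(dplyr)

toy <- tibble(rate = 1, lower = 0.5, year = 2020, region = "A")

one_helper  <- colnames(select(toy, starts_with(c("r", "l"))))
two_helpers <- colnames(select(toy, starts_with("r"), starts_with("l")))

setequal(one_helper, two_helpers) # TRUE: rate, region and lower in both
```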

## Ordering the column names of a data frame: alphabetically {.codesmall}

Using the base R `order()` function.

```{r}
order(colnames(er_30))
er_30 %>% select(order(colnames(er_30)))
```

## `which()` function

Instead of removing rows like `filter()` does, `which()` simply returns the positions (indices) of the values that meet a specific condition. We will see later that this can be helpful when we want to select and filter in more complicated ways.