diff --git a/modules/Manipulating_Data_in_R/Manipulating_Data_in_R.Rmd b/modules/Manipulating_Data_in_R/Manipulating_Data_in_R.Rmd
index 06e3446b..529f5a53 100644
--- a/modules/Manipulating_Data_in_R/Manipulating_Data_in_R.Rmd
+++ b/modules/Manipulating_Data_in_R/Manipulating_Data_in_R.Rmd
@@ -35,7 +35,7 @@ library(tidyverse)
📃[Cheatsheet](https://daseh.org/modules/cheatsheets/Day-5.pdf)
-## Manipulating Data
+## Manipulating Data
In this module, we will show you how to:
@@ -89,7 +89,7 @@ ex_wide <- tibble(State = c("Alabama", "Alaska"),
ex_long <- pivot_longer(ex_wide, cols = !State)
```
-Wide: multiple columns per individual, values spread across multiple columns
+Wide: multiple columns per individual, values spread across multiple columns
```{r, echo = FALSE}
ex_wide
@@ -136,7 +136,7 @@ You might see old functions `gather` and `spread` when googling. These are older
# `pivot_longer`...
-## Reshaping data from **wide to long** {.codesmall}
+## Reshaping data from **wide to long** {.codesmall}
`pivot_longer()` - puts column data into rows (`tidyr` package)
@@ -161,7 +161,7 @@ long_vacc <- wide_vacc %>% pivot_longer(cols = everything())
long_vacc
```
-## Reshaping wide to long: Better column names {.codesmall}
+## Reshaping wide to long: Better column names {.codesmall}
`pivot_longer()` - puts column data into rows (`tidyr` package)
@@ -189,7 +189,7 @@ long_vacc
Newly created column names are enclosed in quotation marks.
-## Data used: Nitrate exposure
+## Data used: Nitrate exposure
Nitrate exposure by quarter for populations on public water systems in the state of Washington for 1999-2020.
@@ -239,7 +239,7 @@ Un-pivoted columns (`year`, `quarter`, `pop_on_sampled_PWS`) are still columns.
long
```
-## Cleaning up long data{.codesmall}
+## Cleaning up long data{.codesmall}
Let's make the `conc_count` into a proportion.
@@ -254,8 +254,8 @@ long
Now our data is more tidy, and we can take the averages easily!
```{r}
-long %>%
- group_by(conc_cat) %>%
+long %>%
+ group_by(conc_cat) %>%
summarize("avg_prop" = mean(conc_prop))
```
@@ -275,7 +275,7 @@ There are many ways to **select** the columns we want. Check out https://dplyr.t
```{r, eval=FALSE}
-{wide_data} <- {long_data} %>%
+{wide_data} <- {long_data} %>%
pivot_wider(names_from = {Old column name: contains new column names},
values_from = {Old column name: contains new cell values})
```
@@ -285,12 +285,12 @@ There are many ways to **select** the columns we want. Check out https://dplyr.t
```{r}
long_vacc
-wide_vacc <- long_vacc %>% pivot_wider(names_from = "Month",
- values_from = "Rate")
+wide_vacc <- long_vacc %>% pivot_wider(names_from = "Month",
+ values_from = "Rate")
wide_vacc
```
-## Reshaping nitrate exposure data{.codesmall}
+## Reshaping nitrate exposure data{.codesmall}
What if we wanted different columns for each quarter?
@@ -335,7 +335,7 @@ knitr::include_graphics("images/joins.png")
* Merging/joining data sets together - usually on key variables, usually "id"
* `?join` - see different types of joining for `dplyr`
* `inner_join(x, y)` - only rows that match for `x` and `y` are kept
-* `full_join(x, y)` - all rows of `x` and `y` are kept
+* `full_join(x, y)` - all rows of `x` and `y` are kept
* `left_join(x, y)` - all rows of `x` are kept even if not merged with `y`
* `right_join(x, y)` - all rows of `y` are kept even if not merged with `x`
* `anti_join(x, y)` - all rows from `x` not in `y` keeping just columns from `x`.
@@ -545,11 +545,11 @@ anti_join(data_cold, data_As, by = "State") # order switched
* Merging/joining data sets together - assumes all column names that overlap
- use the `by = c("a" = "b")` if they differ
* `inner_join(x, y)` - only rows that match for `x` and `y` are kept
-* `full_join(x, y)` - all rows of `x` and `y` are kept
+* `full_join(x, y)` - all rows of `x` and `y` are kept
* `left_join(x, y)` - all rows of `x` are kept even if not merged with `y`
* `right_join(x, y)` - all rows of `y` are kept even if not merged with `x`
* Use the `tidylog` package for a detailed summary
-* `antijoin(x, y)` shows what is only in `x` (missing from `y`)
+* `anti_join(x, y)` shows what is only in `x` (missing from `y`)
## Lab Part 2
@@ -596,7 +596,7 @@ dplyr::setdiff(cold_states, A_states)
## Getting the set difference with `setdiff`
-Why did we use `dplyr::setdiff`?
+Why did we use `dplyr::setdiff`?
There is a base R function, also called `setdiff` that requires vectors.
diff --git a/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab.Rmd b/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab.Rmd
index 2600648f..b69d659b 100644
--- a/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab.Rmd
+++ b/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab.Rmd
@@ -1,7 +1,7 @@
---
title: "Manipulating Data in R Lab"
output: html_document
-editor_options:
+editor_options:
chunk_output_type: console
---
@@ -41,7 +41,10 @@ Look at the column names using `colnames` - do you notice any patterns?
### 1.3
-Let's rename the column "2011" in "co2" to "CO2_2011" using `rename`. Repeat this for the years 2012, 2013, and 2014. Make sure to reassign to `co2` here and in subsequent steps.
+Let's rename the year columns in "co2" from a format like "2011" to a format like "CO2_2011" using `rename`.
+Do the same for the years 2012, 2013, and 2014. Make sure the renamed columns end up in a data frame named `co2`, here and in subsequent steps.
+
+Hint: If you rename the columns and store the result back into a data frame of the same name (like `co2`), you will not be able to re-run the renaming code without an error. The columns are already renamed, so `rename` can no longer find the old column names.
```
# General format
@@ -119,7 +122,7 @@ Take the code from Questions 1.1 and 1.3-1.7. Chain all of this code together us
Modify the code from Question P.1:
-- Choose 4 different years to examine
+- Choose 4 different years to examine
- Select different countries to compare
- Call your data `co2_compare2`
@@ -176,7 +179,7 @@ What countries are present in "co2" that are not present in "cc"? Use `anti_join
```
# General format
-anti_join(data1, data2, by = "") %>% select(index)
+anti_join(data1, data2, by = "") %>% select(columnname)
```
```{r 2.4response}
diff --git a/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab_Key.Rmd b/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab_Key.Rmd
index b471595a..59a501a5 100644
--- a/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab_Key.Rmd
+++ b/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab_Key.Rmd
@@ -1,7 +1,7 @@
---
title: "Manipulating Data in R Lab - Key"
output: html_document
-editor_options:
+editor_options:
chunk_output_type: console
---
@@ -44,7 +44,10 @@ colnames(co2)
### 1.3
-Let's rename the column "2011" in "co2" to "CO2_2011" using `rename`. Repeat this for the years 2012, 2013, and 2014. Make sure to reassign to `co2` here and in subsequent steps.
+Let's rename the year columns in "co2" from a format like "2011" to a format like "CO2_2011" using `rename`.
+Do the same for the years 2012, 2013, and 2014. Make sure the renamed columns end up in a data frame named `co2`, here and in subsequent steps.
+
+Hint: If you rename the columns and store the result back into a data frame of the same name (like `co2`), you will not be able to re-run the renaming code without an error. The columns are already renamed, so `rename` can no longer find the old column names.
```
# General format
@@ -146,7 +149,7 @@ co2_compare
Modify the code from Question P.1:
-- Choose 4 different years to examine
+- Choose 4 different years to examine
- Select different countries to compare
- Call your data `co2_compare2`
@@ -177,7 +180,7 @@ Open the `Yearly_CC_Disasters` dataset using the url below. Save the dataset as
```{r 2.1response}
-cc <- read_csv("https://daseh.org/data/Yearly_CC_Disasters.csv") %>%
+cc <- read_csv("https://daseh.org/data/Yearly_CC_Disasters.csv") %>%
rename(country = Country)
```
@@ -218,7 +221,7 @@ What countries are present in "co2" that are not present in "cc"? Use `anti_join
```
# General format
-anti_join(data1, data2, by = "") %>% select(index)
+anti_join(data1, data2, by = "") %>% select(columnname)
```
```{r 2.4response}
@@ -234,7 +237,7 @@ anti_join(cc, co2, by = "country") %>% select(country) %>% distinct()
Take the code from 2.2 and save the output as an object "co2_cc". Filter the dataset. Filter so that you only keep Indonesia and Canada.
```{r P.3response}
-co2_cc <- full_join(co2, cc, by = "country") %>%
+co2_cc <- full_join(co2, cc, by = "country") %>%
filter(country %in% c("Indonesia", "Canada"))
```
@@ -279,7 +282,7 @@ Pivot the dataset so that there are columns for country, emissions, and a column
```{r P.6response}
co2_cc %>% pivot_wider(
- names_from = Indicator,
+ names_from = Indicator,
values_from = disasters
)
```