diff --git a/modules/Manipulating_Data_in_R/Manipulating_Data_in_R.Rmd b/modules/Manipulating_Data_in_R/Manipulating_Data_in_R.Rmd index 06e3446b..529f5a53 100644 --- a/modules/Manipulating_Data_in_R/Manipulating_Data_in_R.Rmd +++ b/modules/Manipulating_Data_in_R/Manipulating_Data_in_R.Rmd @@ -35,7 +35,7 @@ library(tidyverse) 📃[Cheatsheet](https://daseh.org/modules/cheatsheets/Day-5.pdf) -## Manipulating Data +## Manipulating Data In this module, we will show you how to: @@ -89,7 +89,7 @@ ex_wide <- tibble(State = c("Alabama", "Alaska"), ex_long <- pivot_longer(ex_wide, cols = !State) ``` -Wide: multiple columns per individual, values spread across multiple columns +Wide: multiple columns per individual, values spread across multiple columns ```{r, echo = FALSE} ex_wide @@ -136,7 +136,7 @@ You might see old functions `gather` and `spread` when googling. These are older # `pivot_longer`... -## Reshaping data from **wide to long** {.codesmall} +## Reshaping data from **wide to long** {.codesmall} `pivot_longer()` - puts column data into rows (`tidyr` package) @@ -161,7 +161,7 @@ long_vacc <- wide_vacc %>% pivot_longer(cols = everything()) long_vacc ``` -## Reshaping wide to long: Better column names {.codesmall} +## Reshaping wide to long: Better column names {.codesmall} `pivot_longer()` - puts column data into rows (`tidyr` package) @@ -189,7 +189,7 @@ long_vacc Newly created column names are enclosed in quotation marks. -## Data used: Nitrate exposure +## Data used: Nitrate exposure Nitrate exposure by quarter for populations on public water systems in the state of Washington for 1999-2020. @@ -239,7 +239,7 @@ Un-pivoted columns (`year`, `quarter`, `pop_on_sampled_PWS`) are still columns. long ``` -## Cleaning up long data{.codesmall} +## Cleaning up long data{.codesmall} Let's make the `conc_count` into a proportion. @@ -254,8 +254,8 @@ long Now our data is more tidy, and we can take the averages easily! 
```{r} -long %>% - group_by(conc_cat) %>% +long %>% + group_by(conc_cat) %>% summarize("avg_prop" = mean(conc_prop)) ``` @@ -275,7 +275,7 @@ There are many ways to **select** the columns we want. Check out https://dplyr.t
```{r, eval=FALSE} -{wide_data} <- {long_data} %>% +{wide_data} <- {long_data} %>% pivot_wider(names_from = {Old column name: contains new column names}, values_from = {Old column name: contains new cell values}) ``` @@ -285,12 +285,12 @@ There are many ways to **select** the columns we want. Check out https://dplyr.t ```{r} long_vacc -wide_vacc <- long_vacc %>% pivot_wider(names_from = "Month", - values_from = "Rate") +wide_vacc <- long_vacc %>% pivot_wider(names_from = "Month", + values_from = "Rate") wide_vacc ``` -## Reshaping nitrate exposure data{.codesmall} +## Reshaping nitrate exposure data{.codesmall} What if we wanted different columns for each quarter? @@ -335,7 +335,7 @@ knitr::include_graphics("images/joins.png") * Merging/joining data sets together - usually on key variables, usually "id" * `?join` - see different types of joining for `dplyr` * `inner_join(x, y)` - only rows that match for `x` and `y` are kept -* `full_join(x, y)` - all rows of `x` and `y` are kept +* `full_join(x, y)` - all rows of `x` and `y` are kept * `left_join(x, y)` - all rows of `x` are kept even if not merged with `y` * `right_join(x, y)` - all rows of `y` are kept even if not merged with `x` * `anti_join(x, y)` - all rows from `x` not in `y` keeping just columns from `x`. 
@@ -545,11 +545,11 @@ anti_join(data_cold, data_As, by = "State") # order switched * Merging/joining data sets together - assumes all column names that overlap - use the `by = c("a" = "b")` if they differ * `inner_join(x, y)` - only rows that match for `x` and `y` are kept -* `full_join(x, y)` - all rows of `x` and `y` are kept +* `full_join(x, y)` - all rows of `x` and `y` are kept * `left_join(x, y)` - all rows of `x` are kept even if not merged with `y` * `right_join(x, y)` - all rows of `y` are kept even if not merged with `x` * Use the `tidylog` package for a detailed summary -* `antijoin(x, y)` shows what is only in `x` (missing from `y`) +* `anti_join(x, y)` shows what is only in `x` (missing from `y`) ## Lab Part 2 @@ -596,7 +596,7 @@ dplyr::setdiff(cold_states, A_states) ## Getting the set difference with `setdiff` -Why did we use `dplyr::setdiff`? +Why did we use `dplyr::setdiff`? There is a base R function, also called `setdiff` that requires vectors. diff --git a/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab.Rmd b/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab.Rmd index 2600648f..b69d659b 100644 --- a/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab.Rmd +++ b/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab.Rmd @@ -1,7 +1,7 @@ --- title: "Manipulating Data in R Lab" output: html_document -editor_options: +editor_options: chunk_output_type: console --- @@ -41,7 +41,10 @@ Look at the column names using `colnames` - do you notice any patterns? ### 1.3 -Let's rename the column "2011" in "co2" to "CO2_2011" using `rename`. Repeat this for the years 2012, 2013, and 2014. Make sure to reassign to `co2` here and in subsequent steps. +Let's use `rename` to change the year columns in "co2" from this format: "2011" to this: "CO2_2011". +Be sure to do this for the years 2012, 2013, and 2014 as well. Make sure that you end up with the renamed columns in a data frame named `co2` here and in subsequent steps.
+ +Hint: If you rename the columns and store the result back into a data frame of the same name, like `co2`, you will not be able to re-run the renaming code without an error (the columns are already renamed, so `rename` can no longer find the old column names). ``` # General format @@ -119,7 +122,7 @@ Take the code from Questions 1.1 and 1.3-1.7. Chain all of this code together us Modify the code from Question P.1: -- Choose 4 different years to examine +- Choose 4 different years to examine - Select different countries to compare - Call your data `co2_compare2` @@ -176,7 +179,7 @@ What countries are present in "co2" that are not present in "cc"? Use `anti_join ``` # General format -anti_join(data1, data2, by = "") %>% select(index) +anti_join(data1, data2, by = "") %>% select(columnname) ``` ```{r 2.4response} diff --git a/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab_Key.Rmd b/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab_Key.Rmd index b471595a..59a501a5 100644 --- a/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab_Key.Rmd +++ b/modules/Manipulating_Data_in_R/lab/Manipulating_Data_in_R_Lab_Key.Rmd @@ -1,7 +1,7 @@ --- title: "Manipulating Data in R Lab - Key" output: html_document -editor_options: +editor_options: chunk_output_type: console --- @@ -44,7 +44,10 @@ colnames(co2) ### 1.3 -Let's rename the column "2011" in "co2" to "CO2_2011" using `rename`. Repeat this for the years 2012, 2013, and 2014. Make sure to reassign to `co2` here and in subsequent steps. +Let's use `rename` to change the year columns in "co2" from this format: "2011" to this: "CO2_2011". +Be sure to do this for the years 2012, 2013, and 2014 as well. Make sure that you end up with the renamed columns in a data frame named `co2` here and in subsequent steps.
+ +Hint: If you rename the columns and store the result back into a data frame of the same name, like `co2`, you will not be able to re-run the renaming code without an error (the columns are already renamed, so `rename` can no longer find the old column names). ``` # General format @@ -146,7 +149,7 @@ co2_compare Modify the code from Question P.1: -- Choose 4 different years to examine +- Choose 4 different years to examine - Select different countries to compare - Call your data `co2_compare2` @@ -177,7 +180,7 @@ Open the `Yearly_CC_Disasters` dataset using the url below. Save the dataset as ```{r 2.1response} -cc <- read_csv("https://daseh.org/data/Yearly_CC_Disasters.csv") %>% +cc <- read_csv("https://daseh.org/data/Yearly_CC_Disasters.csv") %>% rename(country = Country) ``` @@ -218,7 +221,7 @@ What countries are present in "co2" that are not present in "cc"? Use `anti_join ``` # General format -anti_join(data1, data2, by = "") %>% select(index) +anti_join(data1, data2, by = "") %>% select(columnname) ``` ```{r 2.4response} @@ -234,7 +237,7 @@ anti_join(cc, co2, by = "country") %>% select(country) %>% distinct() Take the code from 2.2 and save the output as an object "co2_cc". Filter the dataset. Filter so that you only keep Indonesia and Canada. ```{r P.3response} -co2_cc <- full_join(co2, cc, by = "country") %>% +co2_cc <- full_join(co2, cc, by = "country") %>% filter(country %in% c("Indonesia", "Canada")) ``` @@ -279,7 +282,7 @@ Pivot the dataset so that there are columns for country, emissions, and a column ```{r P.6response} co2_cc %>% pivot_wider( - names_from = Indicator, + names_from = Indicator, values_from = disasters ) ```