diff --git a/help.html b/help.html index 453a558b..5f58bc6f 100644 --- a/help.html +++ b/help.html @@ -353,14 +353,14 @@

Why are my changes not taking effect? It’s making my results look

Here we are creating a new object from an existing one:

new_rivers <- sample(rivers, 5)
 new_rivers
-
## [1]  250  380  444 1054  350
+
## [1]  327 1885  460 3710  280

Running just this will only print the result; it will not actually change new_rivers:

new_rivers + 1
-
## [1]  251  381  445 1055  351
+
## [1]  328 1886  461 3711  281

If we want to modify new_rivers and save that modified version, then we need to reassign new_rivers like so:

new_rivers <- new_rivers + 1
 new_rivers
-
## [1]  251  381  445 1055  351
+
## [1]  328 1886  461 3711  281

If we forget to reassign, subsequent steps may not work as expected because we will not be working with the modified data.
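The difference between printing and reassigning can be checked directly. The following is a minimal sketch in base R; the seed value and the `original` helper variable are assumptions added for illustration and are not part of the course material.

```r
# Base R only; `rivers` ships with R's built-in datasets package.
set.seed(1234)                      # arbitrary seed, assumed here for reproducibility
new_rivers <- sample(rivers, 5)
original <- new_rivers              # hypothetical copy kept only for comparison

new_rivers + 1                      # prints the incremented values, stores nothing
identical(new_rivers, original)     # TRUE: new_rivers is unchanged

new_rivers <- new_rivers + 1        # reassign to keep the modified values
identical(new_rivers, original + 1) # TRUE: the change is now saved
```

Comparing against a saved copy is just one way to confirm whether an object changed; printing the object before and after works equally well.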


@@ -409,7 +409,7 @@

Error: object ‘X’ not found

Make sure you run something like this, with the <- operator:

rivers2 <- new_rivers + 1
 rivers2
-
## [1]  252  382  446 1056  352
+
## [1]  329 1887  462 3712  282

diff --git a/index.html b/index.html index 19103f83..65c03ea6 100644 --- a/index.html +++ b/index.html @@ -348,7 +348,7 @@

Testimonials from our other courses:

Find an Error!?


Feel free to submit typos/errors/etc via the GitHub repository associated with the class: https://github.com/fhdsl/DaSEH

-

This page was last updated on 2024-12-05.

+

This page was last updated on 2024-12-06.

Creative Commons License

diff --git a/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.html b/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.html index 45c404a0..b3fe70a2 100644 --- a/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.html +++ b/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.html @@ -233,16 +233,6 @@

library(tidyverse) # loads dplyr and other packages!
-
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
-✔ forcats   1.0.0     ✔ readr     2.1.5
-✔ ggplot2   3.5.1     ✔ stringr   1.5.1
-✔ lubridate 1.9.3     ✔ tibble    3.2.1
-✔ purrr     1.0.2     ✔ tidyr     1.3.1
-── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
-✖ dplyr::filter() masks stats::filter()
-✖ dplyr::lag()    masks stats::lag()
-ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
-

Getting data to work with

We will work with data called er about heat-related ER visits between 2011 and 2022, as reported by the state of Colorado, specifically made available by the Colorado Environmental Public Health Tracking program website. Full dataset available at https://coepht.colorado.gov/heat-related-illness.

@@ -299,18 +289,18 @@

slice_sample(er, n = 2)
# A tibble: 2 × 6
-  county    rate lower95cl upper95cl visits  year
-  <chr>    <dbl>     <dbl>     <dbl>  <dbl> <dbl>
-1 Garfield    NA        NA        NA     NA  2011
-2 Morgan      NA        NA        NA     NA  2020
+  county      rate lower95cl upper95cl visits  year
+  <chr>      <dbl>     <dbl>     <dbl>  <dbl> <dbl>
+1 Teller        NA        NA        NA     NA  2021
+2 Broomfield    NA        NA        NA     NA  2021
slice_sample(er, n = 2)
# A tibble: 2 × 6
-  county     rate lower95cl upper95cl visits  year
-  <chr>     <dbl>     <dbl>     <dbl>  <dbl> <dbl>
-1 Boulder    3.79      1.90      6.32     12  2013
-2 Montezuma NA        NA        NA        NA  2022
+  county    rate lower95cl upper95cl visits  year
+  <chr>    <dbl>     <dbl>     <dbl>  <dbl> <dbl>
+1 Mesa      8.88      4.53      14.7     12  2012
+2 Huerfano  0         0          0        0  2020

Data frames and tibbles

checking names of columns, we can use the colnames() function (or names())

+

To check the names of columns, we can use the colnames() function

colnames(er)
@@ -680,7 +670,7 @@

2 2 3 3 3 4 -

GUT CHECK: Which of the following would NOT always work with a column called counties_of_seattle_with_population_over_10,000?

+

GUT CHECK: Which of the following would work well with a column called counties_of_seattle_with_population_over_10,000?

A. Renaming it using rename function to something simpler like seattle_counties_over_10thous.

@@ -720,13 +710,13 @@

This time let's also make it a smaller subset so it is easier for us to see the full dataset as we work through examples.

-
#read_csv("https://daseh.org/data/CO_ER_heat_visits.csv")
+
# er<-read_csv("https://daseh.org/data/CO_ER_heat_visits.csv")
 set.seed(1234)
 er_30 <-slice_sample(er, n = 30)

Subset columns of a data frame - tidyverse way:

-

To grab (or “pull” out) the year column the tidyverse way we can use the pull function:

+

To grab a vector version of (or “pull” out) the year column the tidyverse way, we can use the pull function:

pull(er_30, year)
@@ -754,6 +744,12 @@

10 2012 # ℹ 20 more rows +

GUT CHECK: What function would be useful for getting a vector version of a column?

+ +

A. pull()

+ +

B. select()

+
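To see how the two functions in this gut check differ, here is a minimal sketch. It assumes dplyr is installed; `er_toy` and its values are made up for illustration and are not the real `er` data.

```r
library(dplyr)

# Toy data with made-up values; not the real `er` dataset.
er_toy <- tibble(county = c("A", "B", "C"), year = c(2011, 2012, 2013))

year_vector <- pull(er_toy, year)    # a plain numeric vector
year_tibble <- select(er_toy, year)  # still a one-column tibble

is.numeric(year_vector)      # TRUE
is.data.frame(year_tibble)   # TRUE
mean(year_vector)            # 2012; vector functions work directly on the pulled column
```

A vector is usually what functions such as `mean()` expect, while a tibble is what the next `select()` or `filter()` step expects.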

Select multiple columns

We can use select to select multiple columns.

@@ -894,12 +890,6 @@

10 0 0 0 0 2012 # ℹ 20 more rows -

GUT CHECK: What function would be useful for getting a vector version of a column?

- -

A. pull()

- -

B. select()

-

Subsetting Rows

filter function

@@ -1102,11 +1092,7 @@

A. filter() with |

-

B. select() with |

- -

C. filter() with &

- -

D. select() with &

+

B. filter() with &

Summary

@@ -1116,7 +1102,7 @@

  • you can select() based on patterns in the column names
  • you can also select() based on column class with the where() function
  • you can combine multiple tidyselect functions together like select(starts_with("C"), ends_with("state"))
  • -
  • you can combine multiple patterns with the c() function like select(starts_with(c("A", "C")))
  • +
  • you can combine multiple patterns with the c() function like select(starts_with(c("A", "C"))) (see extra slides at the end for more info!)
  • filter() can be used to filter out rows based on logical conditions
  • avoid using quotes when referring to column names with filter()
  • @@ -1131,13 +1117,17 @@

    Lab Part 2

    Get the data

    -
    #er <- read_csv("https://daseh.org/data/Colorado_ER_heat_visits.csv")
    +
    #er <- read_csv("https://daseh.org/data/CO_ER_heat_visits.csv")
     set.seed(1234)
     er_30 <-slice_sample(er, n = 30)
    @@ -1208,6 +1198,14 @@

    <dbl> <dbl> 1 2013 2.95
    +

    Alternative Pipes

    + +

    There are multiple ways to write a pipe and you might see these (they work the same!):

    + +
         |>
    +
    +     %>%
    +
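As a rough check that the two pipes behave the same in a simple chain, consider the sketch below. It assumes dplyr is installed and R is version 4.1 or later (needed for the native `|>` pipe); `toy` and its values are hypothetical.

```r
library(dplyr)

# Toy data with made-up values, used only to compare the two pipes.
toy <- tibble(year = c(2011, 2012, 2013), rate = c(1.5, 2.95, 0))

# Native pipe (built into R 4.1 and later)
native_result <- toy |> filter(rate > 1) |> select(year)

# magrittr pipe (made available when dplyr or tidyverse is loaded)
magrittr_result <- toy %>% filter(rate > 1) %>% select(year)

identical(native_result, magrittr_result)  # TRUE
```

For straightforward chains like this one the results match; the two pipes do differ in some advanced features, which this sketch does not cover.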

    Adding/Removing Columns

    Adding columns to a data frame: dplyr (tidyverse way)

    @@ -1580,6 +1578,58 @@

    10 0 0 # ℹ 20 more rows +

    Nuances about filter()

    + +
    test <- tibble(A = c(1,2,3,4), B = c(1,2,3,4))
    +test
    + +
    # A tibble: 4 × 2
    +      A     B
    +  <dbl> <dbl>
    +1     1     1
    +2     2     2
    +3     3     3
    +4     4     4
    + +
    # These are technically the same but >= is easier to read
    +# Separating can cause issues
    +filter(test,  B > 2 | B==2)
    + +
    # A tibble: 3 × 2
    +      A     B
    +  <dbl> <dbl>
    +1     2     2
    +2     3     3
    +3     4     4
    + +
    filter(test, B >= 2)
    + +
    # A tibble: 3 × 2
    +      A     B
    +  <dbl> <dbl>
    +1     2     2
    +2     3     3
    +3     4     4
    + +

    Order of operations for filter()

    + +

    Order can matter. Think of individual statements separately first, and remember that & is evaluated before |.

    + +
    filter(test,  A>3 | B==2 & B>2) # & is evaluated before |, so this is A>3 | (B==2 & B>2). B cannot be equal to 2 AND greater than 2, so that part is never true; only the A>3 row is kept and 2 is dropped.
    + +
    # A tibble: 1 × 2
    +      A     B
    +  <dbl> <dbl>
    +1     4     4
    + +
    filter(test,  A>3 & B>2 | B==2) # & is evaluated before |, so this is (A>3 & B>2) | B==2. The first part leaves only the 4s; the row where B is equal to 2 is also kept, so 2 is preserved.
    + +
    # A tibble: 2 × 2
    +      A     B
    +  <dbl> <dbl>
    +1     2     2
    +2     4     4
    +
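The behavior above comes from R's operator precedence: `&` is evaluated before `|`. The sketch below assumes dplyr is installed and reuses the `test` tibble from this section. The object names `implicit`, `explicit`, `reordered`, and `regrouped` are hypothetical.

```r
library(dplyr)

test <- tibble(A = c(1, 2, 3, 4), B = c(1, 2, 3, 4))

# Without parentheses, & is evaluated before |
implicit <- filter(test, A > 3 & B > 2 | B == 2)

# Parentheses that spell out the same grouping
explicit <- filter(test, (A > 3 & B > 2) | B == 2)

# Moving the | part to the front does not change the grouping
reordered <- filter(test, B == 2 | A > 3 & B > 2)

# Parentheses that force a different grouping
regrouped <- filter(test, A > 3 & (B > 2 | B == 2))

identical(implicit, explicit)   # TRUE
identical(implicit, reordered)  # TRUE
implicit$B                      # 2 4
regrouped$B                     # 4
```

Adding parentheses whenever `&` and `|` appear in the same `filter()` call makes the intended grouping explicit and avoids relying on precedence rules.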

    Ordering the column names of a data frame: alphabetically

    Using the base R order() function.

    diff --git a/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd b/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd index 29a2cd95..9d513da8 100644 --- a/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd +++ b/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd @@ -125,7 +125,7 @@ Subset the rows of `ces_sub` that have **more** than 100 for `Asthma` - how many ### 2.5 -Subset the rows of `ces_sub` that have a `Traffic` value **less** than 500 and an `Asthma` value **more** than 100 - how many are there? +Subset the rows of `ces_sub` that have an `Asthma` value **more** than 100 and a `Traffic` value **less** than 500 - how many are there? ```{r 2.5response} @@ -134,7 +134,7 @@ Subset the rows of `ces_sub` that have a `Traffic` value **less** than 500 and a ### 2.6 -Subset the rows of `ces_sub` that have a `Traffic` value **less than or equal to** 500 and an `Asthma` value **more** than 100 - how many are there? +Subset the rows of `ces_sub` that have an `Asthma` value **more** than 100 and a `Traffic` value **less than or equal to (`<=`)** 500 - how many are there? ```{r 2.6response} diff --git a/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab_Key.html b/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab_Key.html index a06f2e72..a2662771 100644 --- a/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab_Key.html +++ b/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab_Key.html @@ -362,14 +362,13 @@

    2.3

    2.4

    Subset the rows of ces_sub that have more than 100 for Asthma - how many rows are there? Use filter().

    -
    ces_sub <- filter(ces_sub, Asthma > 100)
    -nrow(ces_sub)
    +
    nrow(filter(ces_sub, Asthma > 100))
    ## [1] 592

    2.5

    -

    Subset the rows of ces_sub that have a Traffic value less than 500 and an Asthma value more than 100 - how many are there?

    -
    filter(ces_sub, Traffic < 500 & Asthma > 100) # all of these options work
    +

    Subset the rows of ces_sub that have an Asthma value more than 100 and a Traffic value less than 500 - how many are there?

    +
    filter(ces_sub, Asthma > 100 & Traffic < 500) # all of these options work
    ## # A tibble: 130 × 3
     ##    CensusTract Traffic Asthma
     ##          <dbl>   <dbl>  <dbl>
    @@ -384,14 +383,14 @@ 

    2.5

    ##  9  6001407600    311.   115.
    ## 10  6001408200    437.   187.
    ## # ℹ 120 more rows
    -
    nrow(filter(ces_sub, Traffic < 500 & Asthma > 100))
    +
    nrow(filter(ces_sub, Asthma > 100 & Traffic < 500))
    ## [1] 130
    -
    nrow(filter(ces_sub, Traffic < 500, Asthma > 100))
    +
    nrow(filter(ces_sub, Asthma > 100, Traffic < 500))
    ## [1] 130

    2.6

    -

    Subset the rows of ces_sub that have a Traffic value less than or equal to 500 and an Asthma value more than 100 - how many are there?

    +

    Subset the rows of ces_sub that have an Asthma value more than 100 and a Traffic value less than or equal to (<=) 500 - how many are there?

    filter(ces_sub, Traffic <= 500 & Asthma > 100) # all of these options work
    ## # A tibble: 130 × 3
     ##    CensusTract Traffic Asthma
    @@ -407,9 +406,9 @@ 

    2.6

    ##  9  6001407600    311.   115.
    ## 10  6001408200    437.   187.
    ## # ℹ 120 more rows
    -
    nrow(filter(ces_sub, Traffic <= 500 & Asthma > 100))
    +
    nrow(filter(ces_sub, Asthma > 100 & Traffic <= 500))
    ## [1] 130
    -
    nrow(filter(ces_sub, Traffic <= 500, Asthma > 100))
    +
    nrow(filter(ces_sub, Asthma > 100, Traffic <= 500))
    ## [1] 130
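The key uses both `&` and a comma to combine conditions in `filter()`. The sketch below illustrates that the two forms select the same rows, and that `<` and `<=` can differ when a value sits exactly on the boundary. It assumes dplyr is installed; `ces_toy` and its values are made up and are not the real `ces_sub` data.

```r
library(dplyr)

# Toy data with made-up values; one Traffic value sits exactly at 500.
ces_toy <- tibble(
  Traffic = c(300, 500, 700, 450),
  Asthma  = c(150, 120, 200, 80)
)

with_and   <- filter(ces_toy, Asthma > 100 & Traffic <= 500)
with_comma <- filter(ces_toy, Asthma > 100, Traffic <= 500)

identical(with_and, with_comma)                      # TRUE
nrow(with_and)                                       # 2
nrow(filter(ces_toy, Asthma > 100, Traffic < 500))   # 1: the Traffic == 500 row is excluded
```

In the lab data both versions return 130 rows, which suggests that no qualifying row has a `Traffic` value of exactly 500; the toy data here is built so the difference is visible.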