Commit
Merge branch 'main' into datasets-tab
avahoffman committed Sep 30, 2024
2 parents 573ab97 + ca4ee6b commit 5f28c05
Showing 13 changed files with 729 additions and 1,120 deletions.
8 changes: 4 additions & 4 deletions help.html
@@ -350,14 +350,14 @@ <h2><strong>Why are my changes not taking effect? It’s making my results look
<p>Here we are creating a new object from an existing one:</p>
<pre class="r"><code>new_rivers &lt;- sample(rivers, 5)
new_rivers</code></pre>
<pre><code>## [1] 350 306 525 350 380</code></pre>
<pre><code>## [1] 430 981 280 352 3710</code></pre>
<p>Using just this will only print the result and not actually change <code>new_rivers</code>:</p>
<pre class="r"><code>new_rivers + 1</code></pre>
<pre><code>## [1] 351 307 526 351 381</code></pre>
<pre><code>## [1] 431 982 281 353 3711</code></pre>
<p>If we want to modify <code>new_rivers</code> and save that modified version, then we need to reassign <code>new_rivers</code> like so:</p>
<pre class="r"><code>new_rivers &lt;- new_rivers + 1
new_rivers</code></pre>
<pre><code>## [1] 351 307 526 351 381</code></pre>
<pre><code>## [1] 431 982 281 353 3711</code></pre>
<p>If we forget to reassign this can cause subsequent steps to not work as expected because we will not be working with the data that has been modified.</p>
<hr />
</div>
@@ -406,7 +406,7 @@ <h2><strong>Error: object ‘X’ not found</strong></h2>
<p>Make sure you run something like this, with the <code>&lt;-</code> operator:</p>
<pre class="r"><code>rivers2 &lt;- new_rivers + 1
rivers2</code></pre>
<pre><code>## [1] 352 308 527 352 382</code></pre>
<pre><code>## [1] 432 983 282 354 3712</code></pre>
<hr />
</div>
<div id="error-unexpected-in-error-unexpected-in-error-unexpected-x-in" class="section level2">
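The printed values in the hunks above appear to have changed only because `sample()` draws a new random subset each time the page is rendered; the point about reassignment is unaffected. A minimal R sketch, assuming stable output is wanted, fixes the draw with `set.seed()` (the seed `123` is an arbitrary choice for this sketch, not something the page uses) and then checks the reassignment behavior directly:

```r
# Fix the random number generator state so sample() returns the same five
# river lengths every time (123 is an arbitrary seed chosen for this sketch).
set.seed(123)
new_rivers <- sample(rivers, 5)  # rivers: built-in vector of river lengths
original <- new_rivers           # keep a copy to compare against

new_rivers + 1                   # prints the result but does not store it
identical(new_rivers, original)  # TRUE: new_rivers is unchanged

new_rivers <- new_rivers + 1     # reassigning saves the modified values
identical(new_rivers, original)  # FALSE: new_rivers now holds the +1 values

# Re-running with the same seed reproduces the same draw
set.seed(123)
identical(sample(rivers, 5), original)  # TRUE
```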
2 changes: 1 addition & 1 deletion materials_schedule.html
@@ -492,7 +492,7 @@ <h2>Online Schedule + Materials</h2>
<div id="in-person-code-a-thon-schedule-materials" class="section level2">
<h2>In-person Code-a-thon Schedule + Materials</h2>
<ul>
<li><a href="https://docs.google.com/document/d/182XiteW4-VOWLesVBU6TH1hhyDH7dHkGAXf6P8WYyjg/edit?usp=sharing">Schedule</a></li>
<li><a href="https://docs.google.com/document/d/1ZD-w0vc3Vtv1vf95h6323zaaxORg9vC7Hp4I_IUeW0I/edit?usp=sharing">Schedule</a></li>
<li><a href="https://drive.google.com/drive/folders/1dd0refHBdOHQuMW2ITWsVtIZ1QyhEbgY?usp=sharing">Instructor Slides</a> <!-- - [Lightning Talk Upload Folder](https://drive.google.com/drive/folders/1It8HqAyXGY8NpDU6iVxswMeZCth8kHI5?usp=sharing) --></li>
<li><a href="modules/Project_Template/Project_Template.Rmd">Project Template</a></li>
<li><a href="modules/Project_Example/Project_Example.Rmd">Project Example</a></li>
Binary file modified modules/Intro/Intro.pdf
16 changes: 13 additions & 3 deletions modules/Subsetting_Data_in_R/Subsetting_Data_in_R.Rmd
@@ -347,19 +347,29 @@ library(janitor)

## janitor `clean_names`

The `clean_names` function can intuit what fixes you might need. Here it make sure year names aren't just a number, so that the colnames don't need ticks or quotes to be used.
The `clean_names` function can intuit what fixes you might need.

The yearly_co2_emissions dataset contains estimated CO2 emissions for 265 countries between the years 1751 and 2014.

```{r}
#library(dasehr)
yearly_co2 <- dasehr::yearly_co2_emissions
#yearly_co2 <- dasehr::yearly_co2_emissions
# or this:
yearly_co2 <-
read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")
```

## yearly_co2 column names

```{r}
head(yearly_co2, n = 2)
clean_names(yearly_co2)
```

## janitor `clean_names` can intuit fixes
The `clean_names` function can intuit what fixes you might need. Here it makes sure year names aren't just a number, so that the colnames don't need ticks or quotes to be used.

```{r}
clean_names(yearly_co2)
```

## more of clean_names
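The slide text above says `clean_names()` rewrites year-only column names so they can be used without backticks or quotes. A minimal sketch on a small made-up data frame (`co2_toy` is hypothetical and is not the `yearly_co2` dataset) shows the before and after; the `x` prefix on numeric names comes from janitor's default settings:

```r
library(janitor)

# Hypothetical toy data shaped like the CO2 table: one column per year.
co2_toy <- data.frame(
  country = c("A", "B"),
  "1751" = c(0, 1),
  "2014" = c(5, 6),
  check.names = FALSE  # keep "1751" and "2014" exactly as written
)

co2_toy$`2014`           # a name that is just a number needs backticks

clean <- clean_names(co2_toy)
names(clean)             # "country" "x1751" "x2014"
clean$x2014              # the cleaned name works without backticks
```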
56 changes: 31 additions & 25 deletions modules/Subsetting_Data_in_R/Subsetting_Data_in_R.html

Binary file modified modules/Subsetting_Data_in_R/Subsetting_Data_in_R.pdf
73 changes: 39 additions & 34 deletions modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd
@@ -13,39 +13,33 @@ In this lab you can use the interactive console to explore but please record you

# Part 1

First let's get load our packages.
First let's load our packages.


```{r, message = FALSE}
# don't forget to load the packages that you will need!
library(dplyr)
library(tidyverse)
library(dasehr)
```

Now let's load the ER dataset by running one of these chunks. We can either load the data from the website or from the `dasehr` package.
We'll again work with the CalEnviroScreen dataset, which contains information about environmental factors associated with human health in California.

```{r}
library(dasehr)
ER <- CO_heat_ER_bygender
```
First, load the data from the website, either manually or by using the Data Import menu (find it by clicking on File).

Or
```{r}
ER <-
read_csv("https://daseh.org/data/Colorado_ER_heat_visits_by_county_gender.csv")
ces <- read_csv(file = "https://daseh.org/data/CalEnviroScreen_data.csv")
```

Check that it worked by seeing if you have the `ER` data.

Check that it worked by seeing if you have the `ces` data.

```{r 0response}
```

### 1.1

What class is `ER`?
What class is `ces`?

```{r 1.1response}
@@ -61,15 +55,15 @@ How many observations (rows) and variables (columns) are in the dataset - try th

### 1.3

Next, rename the column `lower95cl` to be `lower_limit` (hint - use `rename()` and watch out for the order of the new and old names!).
Next, rename the column `CaliforniaCounty` to `CA_county` (hint - use `rename()` and watch out for the order of the new and old names!).

```{r 1.3response}
```

### 1.4

Convert the column names of `ER` to be all upper case. Use `rename_with()`, and the `toupper` command. Save this as a new dataset called `ER_upper`.
Convert the column names of `ces` to be all upper case. Use `rename_with()`, and the `toupper` command. Save this as a new dataset called `ces_upper`.

```{r 1.4response}
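Questions 1.3 and 1.4 rely on two `dplyr` renaming patterns: `rename()` takes `new_name = old_name`, and `rename_with()` applies a function to every column name. A minimal sketch on a hypothetical two-row data frame (the values are invented; only the column names echo the lab) shows both:

```r
library(dplyr)

# Hypothetical toy data; values are made up, column names mirror the lab
toy <- data.frame(
  CaliforniaCounty = c("Alameda", "Los Angeles"),
  Traffic = c(450, 1200),
  Asthma = c(80, 120)
)

# rename() takes new_name = old_name, so the new name goes on the left
toy_renamed <- rename(toy, CA_county = CaliforniaCounty)
names(toy_renamed)  # "CA_county" "Traffic" "Asthma"

# rename_with() applies a function (here toupper) to every column name
toy_upper <- rename_with(toy, toupper)
names(toy_upper)    # "CALIFORNIACOUNTY" "TRAFFIC" "ASTHMA"
```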
@@ -81,7 +75,7 @@ Convert the column names of `ER` to be all upper case. Use `rename_with()`, and

### P.1

How can you print the first 3 rows and the last 3 rows of `ER` (in two lines of code)?
How can you print the first 3 rows and the last 3 rows of `ces` (in two lines of code)?

```{r P.1response}
@@ -92,55 +86,64 @@ How can you print the first 3 rows and the last 3 rows of `ER` (in two lines of

### 2.1

Create a subset of the `ER` that only contains the columns: `county`, `year`, and `rate` and assign this object to `ER_sub` - what are the dimensions of this dataset?
Create a subset of the `ces` that only contains the columns: `CensusTract`, `Traffic`, and `Asthma` and assign this object to `ces_sub` - what are the dimensions of this dataset?

`CensusTract`: this is a small, relatively permanent area within a county used to present data from the census and other statistical programs

`Traffic`: A measure of traffic density in vehicle-kilometers per hour per road length, within 150 meters of the census tract boundary. A higher `Traffic` value indicates the presence of more traffic

`Asthma`: Age-adjusted rate of emergency department visits for asthma

```{r 2.1response}
```

### 2.2

Start with `ER` again instead of the dataset you just made. Subset the data to only include the `rate` column and the columns that end with "cl". Hint: use `select()` and `ends_with()`. Assign this subset of the data to be `ER2`. Again take a look at the data and check the dimensions.
Start with `ces` again instead of the dataset you just made. Subset the data to only include the `CensusTract` column and the columns that end with "Pctl". Hint: use `select()` and `ends_with()`. Assign this subset of the data to be `ces2`. Again take a look at the data and check the dimensions.

"Pctl" stands for "percentile".

```{r 2.2response}
```

### 2.3

Pull the variable `rate` from `ER2`. How does this differ form selecting it? Use head() to take a look at both options.
Pull the variable `Asthma` from `ces_sub`. How does this differ from selecting it? Use head() to take a look at both options.

```{r 2.3response}
```

### 2.4

Subset the rows of `ER2` that have **more** than 10 for rate - how many rows are there? Use `filter()`.
Subset the rows of `ces_sub` that have **more** than 100 for `Asthma` - how many rows are there? Use `filter()`.

```{r 2.4response}
```

### 2.5

Subset the rows of `ER` that have a year value **less** than 2012 and **more** than 10 rate - how many are there?
Subset the rows of `ces_sub` that have a `Traffic` value **less** than 500 and an `Asthma` value **more** than 100 - how many are there?


```{r 2.5response}
```

### 2.6

Subset the rows of `ER` that have a year value of **less than or equal to ** 2012 and **more** than 10 rate - how many are there?
Subset the rows of `ces_sub` that have a `Traffic` value **less than or equal to** 500 and an `Asthma` value **more** than 100 - how many are there?

```{r 2.6response}
```

### 2.7

Why do the answers for 2.5 and 2.6 differ?
We used two different criteria for subsetting in 2.5 and 2.6. Why is the number of rows the same for 2.5 and 2.6?

```{r 2.7response}
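Questions 2.2 to 2.7 turn on three behaviors: `ends_with()` selects columns by the end of their names, `select()` returns a data frame while `pull()` returns a plain vector, and `<` versus `<=` only changes the result when a row sits exactly on the cutoff and also passes the other condition. A minimal sketch on invented data (column names borrowed from the lab, values made up) illustrates each:

```r
library(dplyr)

# Hypothetical toy data; values are invented for illustration
toy <- data.frame(
  CensusTract = c(1, 2, 3, 4),
  Traffic = c(300, 500, 800, 450),
  Asthma = c(120, 90, 150, 110),
  TrafficPctl = c(40, 60, 90, 55),
  AsthmaPctl = c(85, 50, 95, 80)
)

# ends_with() picks every column whose name ends in "Pctl"
select(toy, CensusTract, ends_with("Pctl"))

# select() keeps a (one-column) data frame; pull() extracts a plain vector
class(select(toy, Asthma))  # "data.frame"
class(pull(toy, Asthma))    # "numeric"

# < and <= can only differ for rows where Traffic is exactly 500
nrow(filter(toy, Traffic < 500, Asthma > 100))   # 2 (tracts 1 and 4)
nrow(filter(toy, Traffic <= 500, Asthma > 100))  # 2: tract 2 has Traffic 500 but Asthma 90
```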
@@ -151,7 +154,8 @@ Why do the answers for 2.5 and 2.6 differ?

### P.2

Subset the rows of `ER` for rows that have `county` of `Denver`, **or** **less** than 4 `rate``.
Subset the rows of `ces` for rows that have `CA_county` of "Los Angeles", **or** a `Traffic` value **less** than 300.

How many rows have both?

```{r P.2response}
@@ -160,7 +164,7 @@ How many rows have both?

### P.3

Select the variables that contain the letter "a" from `ER`.
Select the variables that contain the letter "a" from `ces`. Remember, the variables are the column names.

```{r P.3response}
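Questions P.2 and P.3 use `|` inside `filter()` and `contains()` inside `select()`. A minimal sketch on invented data shows that `|` keeps rows meeting either condition while `&` requires both, and that `contains()` matches column names case-insensitively unless `ignore.case = FALSE` is set:

```r
library(dplyr)

# Hypothetical toy data; county names are real places, values are invented
toy <- data.frame(
  CA_county = c("Los Angeles", "Alameda", "Los Angeles", "Fresno"),
  Traffic = c(1200, 250, 280, 900),
  Asthma = c(120, 60, 95, 140),
  Ozone = c(0.05, 0.04, 0.06, 0.07)
)

# | keeps a row if EITHER condition is TRUE
either <- filter(toy, CA_county == "Los Angeles" | Traffic < 300)
nrow(either)  # 3

# & keeps a row only if BOTH conditions are TRUE
both <- filter(toy, CA_county == "Los Angeles" & Traffic < 300)
nrow(both)    # 1

# contains() matches column names, not values; case is ignored by default,
# so the capital A in "CA_county" counts as a match
names(select(toy, contains("a")))                      # "CA_county" "Traffic" "Asthma"
names(select(toy, contains("a", ignore.case = FALSE)))  # "Traffic" "Asthma"
```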
@@ -171,8 +175,9 @@ Select the variables that contain the letter "a" from `ER`.

### 3.1

Create a subset called `ER_2012` from `ER` that only contains the rows for the year 2012 and only the columns: `county` and `rate`. `year` should not be included in `ER_sub`.
What are the dimensions of this dataset? Don't use pipes (`%>%`) and instead do this in two steps creating the `ER_2012` object with `filter` and updating it with `select`.
Create a subset called `ces_Alameda` from `ces` that only contains the rows for Alameda and only the columns: `Traffic` and `Asthma`. `CA_county` should not be included in `ces_Alameda`.

What are the dimensions of this dataset? Don't use pipes (`%>%`) and instead do this in two steps creating the `ces_Alameda` object with `filter` and updating it with `select`.

```{r 3.1response}
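Question 3.1 asks for `filter()` followed by `select()` as two separate steps, and the next question asks why the reverse order fails. A minimal sketch on invented data shows both: once `select()` drops `CA_county`, `filter()` has no column to test. The `tryCatch()` wrapper is only there so the sketch keeps running after the expected error:

```r
library(dplyr)

# Hypothetical toy data; values are invented
toy <- data.frame(
  CA_county = c("Alameda", "Fresno", "Alameda"),
  Traffic = c(450, 900, 700),
  Asthma = c(80, 140, 95)
)

# Step 1: keep the Alameda rows (filter needs CA_county to still exist)
toy_alameda <- filter(toy, CA_county == "Alameda")
# Step 2: keep only the two measurement columns, dropping CA_county
toy_alameda <- select(toy_alameda, Traffic, Asthma)
dim(toy_alameda)  # 2 2

# Reversing the steps fails: after select() drops CA_county,
# filter() cannot find that column and raises an error.
reversed <- tryCatch(
  filter(select(toy, Traffic, Asthma), CA_county == "Alameda"),
  error = function(e) "error: CA_county not found"
)
reversed
```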
@@ -192,17 +197,17 @@ What happens if you do the steps in a different order? Why does this not work?
```

### 1.3
### 3.3

Re-order the rows of `ER_2012` by population in increasing order. Use `arrange()`. What is county with the smallest rate?
Re-order the rows of `ces_Alameda` by `Traffic` value in increasing order. Use `arrange()`. What's the smallest value?

```{r 3.3response}
```

### 1.4
### 3.4

Create a new variable in `ER_2012` called `rate1000`, which is equal to `rate` divided by 1000, using `mutate()`(don't forget to reassign `ER_2012`). Use pipes `%>%`.
Create a new variable in `ces_Alameda` called `Asthma100`, which is equal to `Asthma` divided by 100, using `mutate()` (don't forget to reassign `ces_Alameda`). Use pipes `%>%`. Take a look at the data now!

```
# General format
@@ -218,17 +223,17 @@ NEWDATA <- OLD_DATA %>% mutate(NEW_COLUMN = OLD_COLUMN)

### P.4

Move the `rate1000` column to be before `county` in the `ER_2012` dataset. Use `relocate()`.
Move the `Asthma100` column to be before `Traffic` in the `ces_Alameda` dataset. Use `relocate()`.

```{r P.4response}
```
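Questions 3.3, 3.4 and P.4 chain `arrange()`, `mutate()` and `relocate()`. A minimal sketch on invented data runs all three in one pipe, assuming the same `%>%` style the lab asks for:

```r
library(dplyr)

# Hypothetical toy data; values are invented
toy <- data.frame(
  Traffic = c(700, 450, 1200),
  Asthma = c(95, 80, 130)
)

toy <- toy %>%
  arrange(Traffic) %>%                    # sort rows by Traffic, smallest first
  mutate(Asthma100 = Asthma / 100) %>%    # new column derived from Asthma
  relocate(Asthma100, .before = Traffic)  # move the new column ahead of Traffic

toy  # columns: Asthma100, Traffic, Asthma; Traffic now reads 450, 700, 1200
```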

### P.5

How can you find the value of `rate` in 2020 for Statewide for Females - using the initial ER data - without just looking at the data manually and instead use functions we learned today?
Using the original `ces` data, how can you find the values of `ApproxLocation` for areas within zip code 90745 (in Los Angeles county) that also have a CES4.0 score in the range 90-95% - without just looking at the data manually and instead use functions we learned today? (Hint: It can be helpful to look at your data first)

Note that gender was recorded as binary, which we know isn’t really accurate. This is something you might encounter. Please see this article about ways to measure gender in a more inclusive way: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6526522/.
`CES4.0PercRange`: Percentile of the CalEnviroScreen score, grouped by 5% increments. The CalEnviroScreen score is a measure of the negative environmental effects seen in a given region. Those zip codes that have a percentile range of 90-95% are those regions that experience the highest effects of pollution in California.

```{r P.5response}
447 changes: 106 additions & 341 deletions modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.html