Merge branch 'main' into datasets-tab

fhdsl · Sep 30, 2024 · 5f28c05
2 parents 573ab97 + ca4ee6b
commit 5f28c05

Showing 13 changed files with 729 additions and 1,120 deletions.
diff --git a/help.html b/help.html
@@ -350,14 +350,14 @@ <h2><strong>Why are my changes not taking effect? It’s making my results look
 <p>Here we are creating a new object from an existing one:</p>
 <pre class="r"><code>new_rivers &lt;- sample(rivers, 5)
 new_rivers</code></pre>
-<pre><code>## [1] 350 306 525 350 380</code></pre>
+<pre><code>## [1]  430  981  280  352 3710</code></pre>
 <p>Using just this will only print the result and not actually change <code>new_rivers</code>:</p>
 <pre class="r"><code>new_rivers + 1</code></pre>
-<pre><code>## [1] 351 307 526 351 381</code></pre>
+<pre><code>## [1]  431  982  281  353 3711</code></pre>
 <p>If we want to modify <code>new_rivers</code> and save that modified version, then we need to reassign <code>new_rivers</code> like so:</p>
 <pre class="r"><code>new_rivers &lt;- new_rivers + 1
 new_rivers</code></pre>
-<pre><code>## [1] 351 307 526 351 381</code></pre>
+<pre><code>## [1]  431  982  281  353 3711</code></pre>
 <p>If we forget to reassign this can cause subsequent steps to not work as expected because we will not be working with the data that has been modified.</p>
 <hr />
 </div>
@@ -406,7 +406,7 @@ <h2><strong>Error: object ‘X’ not found</strong></h2>
 <p>Make sure you run something like this, with the <code>&lt;-</code> operator:</p>
 <pre class="r"><code>rivers2 &lt;- new_rivers + 1
 rivers2</code></pre>
-<pre><code>## [1] 352 308 527 352 382</code></pre>
+<pre><code>## [1]  432  983  282  354 3712</code></pre>
 <hr />
 </div>
 <div id="error-unexpected-in-error-unexpected-in-error-unexpected-x-in" class="section level2">

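A note on the changed output in help.html: `sample()` draws a new random subset every time the page is knitted, so the printed `rivers` values differ between the old and new versions even though the code is the same. The sketch below assumes stable output is wanted; it calls `set.seed()` before sampling. The seed value `1234` and the names `draw_one`/`draw_two` are arbitrary illustrations, not something the source uses.

```r
# rivers is a built-in R dataset (lengths of major North American rivers).
# sample() draws a different random subset on each run unless the seed is fixed.
set.seed(1234)  # arbitrary seed value, chosen only for illustration
draw_one <- sample(rivers, 5)

set.seed(1234)  # resetting the same seed repeats the same draw
draw_two <- sample(rivers, 5)

identical(draw_one, draw_two)  # TRUE

# Reassignment behaves the same with or without a seed:
new_rivers <- draw_one
new_rivers + 1                  # prints the result; new_rivers is unchanged
new_rivers <- new_rivers + 1    # saves the modified values back to new_rivers
```

With a fixed seed, the "reassign vs. just print" comparison in help.html would show the same numbers on every render.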
diff --git a/materials_schedule.html b/materials_schedule.html
@@ -492,7 +492,7 @@ <h2>Online Schedule + Materials</h2>
 <div id="in-person-code-a-thon-schedule-materials" class="section level2">
 <h2>In-person Code-a-thon Schedule + Materials</h2>
 <ul>
-<li><a href="https://docs.google.com/document/d/182XiteW4-VOWLesVBU6TH1hhyDH7dHkGAXf6P8WYyjg/edit?usp=sharing">Schedule</a></li>
+<li><a href="https://docs.google.com/document/d/1ZD-w0vc3Vtv1vf95h6323zaaxORg9vC7Hp4I_IUeW0I/edit?usp=sharing">Schedule</a></li>
 <li><a href="https://drive.google.com/drive/folders/1dd0refHBdOHQuMW2ITWsVtIZ1QyhEbgY?usp=sharing">Instructor Slides</a> <!-- - [Lightning Talk Upload Folder](https://drive.google.com/drive/folders/1It8HqAyXGY8NpDU6iVxswMeZCth8kHI5?usp=sharing) --></li>
 <li><a href="modules/Project_Template/Project_Template.Rmd">Project Template</a></li>
 <li><a href="modules/Project_Example/Project_Example.Rmd">Project Example</a></li>

diff --git a/modules/Intro/Intro.pdf b/modules/Intro/Intro.pdf
diff --git a/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.Rmd b/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.Rmd
@@ -347,19 +347,29 @@ library(janitor)
 
 ## janitor `clean_names`
 
-The `clean_names` function can intuit what fixes you might need. Here it make sure year names aren't just a number, so that the colnames don't need ticks or quotes to be used.
+The `clean_names` function can intuit what fixes you might need. 
 
 The yearly_co2_emissions dataset contains estimated CO2 emissions for 265 countries between the years 1751 and 2014.
 
 ```{r}
 #library(dasehr)
-yearly_co2 <- dasehr::yearly_co2_emissions
+#yearly_co2 <- dasehr::yearly_co2_emissions
 # or this:
 yearly_co2 <- 
   read_csv("https://daseh.org/data/Yearly_CO2_Emissions_1000_tonnes.csv")
+```
+
+## yearly_co2 column names
+
+```{r}
 head(yearly_co2, n = 2)
-clean_names(yearly_co2)
+```
+
+## janitor `clean_names` can intuit fixes
+The `clean_names` function can intuit what fixes you might need. Here it makes sure year column names aren't just numbers, so the column names don't need backticks or quotes to be used.
 
+```{r}
+clean_names(yearly_co2)
 ```
 
 ## more of clean_names

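To make the `clean_names` slides concrete without downloading the CO2 file, here is a minimal sketch on a small hypothetical data frame (`co2_raw`, with invented values) shaped like the yearly CO2 data: a country column plus columns named after years. It assumes the janitor package is installed.

```r
library(janitor)

# Hypothetical toy data shaped like the CO2 data.
# check.names = FALSE keeps the raw, number-only column names.
co2_raw <- data.frame(
  `Country Name` = c("A", "B"),
  `1751` = c(0, 0),
  `1752` = c(0.1, 0.2),
  check.names = FALSE
)

# Raw year names must be wrapped in backticks to be used
co2_raw$`1751`

# clean_names() lower-cases names, replaces spaces with underscores,
# and prefixes names that start with a number with "x"
co2_clean <- clean_names(co2_raw)
names(co2_clean)
co2_clean$x1751   # no backticks needed
```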
diff --git a/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.html b/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.html
diff --git a/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.pdf b/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.pdf
diff --git a/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd b/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.Rmd
@@ -13,39 +13,33 @@ In this lab you can use the interactive console to explore but please record you
 
 # Part 1
 
-First let's get load our packages.
+First let's load our packages.
 
 
 ```{r, message = FALSE}
 # don't forget to load the packages that you will need!
 library(dplyr)
 library(tidyverse)
-library(dasehr)
 ```
 
-Now let's load the ER dataset by running one of these chunks. We can either load the data from the website or from the `dasehr` package.
+We'll again work with the CalEnviroScreen dataset, which contains information about environmental factors associated with human health in California.
 
-```{r}
-library(dasehr)
-ER <- CO_heat_ER_bygender
-```
+First, load the data from the website, either by running the code below or by using the Import Dataset menu (find it by clicking on File).
 
-Or
 ```{r}
-ER <- 
-  read_csv("https://daseh.org/data/Colorado_ER_heat_visits_by_county_gender.csv")
-
+ces <- read_csv(file = "https://daseh.org/data/CalEnviroScreen_data.csv")
 ```
 
-Check that it worked by seeing if you have the `ER` data.
+
+Check that it worked by seeing if you have the `ces` data.
 
 ```{r 0response}
 
 ```
 
 ### 1.1
 
-What class is `ER`?
+What class is `ces`?
 
 ```{r 1.1response}
 
@@ -61,15 +55,15 @@ How many observations (rows) and variables (columns) are in the dataset - try th
 
 ### 1.3
 
-Next, rename the column `lower95cl`  to be `lower_limit` (hint - use `rename()` and watch out for the order of the new and old names!). 
+Next, rename the column `CaliforniaCounty` to `CA_county` (hint - use `rename()` and watch out for the order of the new and old names!). 
 
 ```{r 1.3response}
 
 ```
 
 ### 1.4
 
-Convert the column names of `ER` to be all upper case. Use `rename_with()`, and the `toupper` command. Save this as a new dataset called `ER_upper`.
+Convert the column names of `ces` to all upper case. Use `rename_with()` and the `toupper()` function. Save this as a new dataset called `ces_upper`.
 
 ```{r 1.4response}
 
@@ -81,7 +75,7 @@ Convert the column names of `ER` to be all upper case. Use `rename_with()`, and
 
 ### P.1
 
-How can you print the first 3 rows and the last 3 rows of `ER` (in two lines of code)?
+How can you print the first 3 rows and the last 3 rows of `ces` (in two lines of code)?
 
 ```{r P.1response}
 
@@ -92,55 +86,64 @@ How can you print the first 3 rows and the last 3 rows of `ER` (in two lines of
 
 ### 2.1
 
-Create a subset of the `ER` that only contains the columns: `county`, `year`, and `rate` and assign this object to `ER_sub` - what are the dimensions of this dataset?
+Create a subset of `ces` that contains only the columns `CensusTract`, `Traffic`, and `Asthma`, and assign this object to `ces_sub` - what are the dimensions of this dataset?
+
+`CensusTract`: a small, relatively permanent area within a county, used to present data from the census and other statistical programs.
+
+`Traffic`: a measure of traffic density, in vehicle-kilometers per hour per road length, within 150 meters of the census tract boundary. A higher `Traffic` value indicates more traffic.
+
+`Asthma`: age-adjusted rate of emergency department visits for asthma.
 
 ```{r 2.1response}
 
 ```
 
 ### 2.2
 
-Start with `ER` again instead of the dataset you just made. Subset the data to only include the `rate` column and the columns that end with "cl". Hint: use  `select()` and `ends_with()`. Assign this subset of the data to be `ER2`. Again take a look at the data and check the dimensions.
+Start with `ces` again instead of the dataset you just made. Subset the data to include only the `CensusTract` column and the columns that end with "Pctl". Hint: use `select()` and `ends_with()`. Assign this subset of the data to `ces2`. Again, take a look at the data and check the dimensions.
+
+"Pctl" stands for "percentile". 
 
 ```{r 2.2response}
 
 ```
 
 ### 2.3
 
-Pull the variable `rate` from `ER2`. How does this differ form selecting it? Use head() to take a look at both options.
+Pull the variable `Asthma` from `ces_sub`. How does this differ from selecting it? Use `head()` to take a look at both options.
 
 ```{r 2.3response}
 
 ```
 
 ### 2.4
 
-Subset the rows of `ER2` that have **more** than 10 for rate - how many rows are there? Use `filter()`.
+Subset the rows of `ces_sub` that have an `Asthma` value **more** than 100 - how many rows are there? Use `filter()`.
 
 ```{r 2.4response}
 
 ```
 
 ### 2.5
 
-Subset the rows of `ER` that have a year value **less** than 2012 and **more** than 10 rate - how many are there?
+Subset the rows of `ces_sub` that have a `Traffic` value **less** than 500 and an `Asthma` value **more** than 100 - how many are there?
+
 
 ```{r 2.5response}
 
 ```
 
 ### 2.6
 
-Subset the rows of `ER` that have a year value of **less than or equal to ** 2012 and **more** than 10 rate - how many are there?
+Subset the rows of `ces_sub` that have a `Traffic` value **less than or equal to** 500 and an `Asthma` value **more** than 100 - how many are there?
 
 ```{r 2.6response}
 
 ```
 
 ### 2.7
 
-Why do the answers for 2.5 and 2.6 differ?
+We used two different criteria for subsetting in 2.5 and 2.6. Why is the number of rows the same for 2.5 and 2.6?
 
 ```{r 2.7response}
 
@@ -151,7 +154,8 @@ Why do the answers for 2.5 and 2.6 differ?
 
 ### P.2
 
-Subset the rows of `ER` for rows that have `county` of `Denver`,  **or** **less** than 4 `rate``.
+Subset the rows of `ces` that have a `CA_county` value of "Los Angeles" **or** a `Traffic` value **less** than 300.
+
 How many rows have both?
 
 ```{r P.2response}
@@ -160,7 +164,7 @@ How many rows have both?
 
 ### P.3
 
-Select the variables that contain the letter "a" from `ER`.
+Select the variables that contain the letter "a" from `ces`. Remember, the variables are the column names.
 
 ```{r P.3response}
 
@@ -171,8 +175,9 @@ Select the variables that contain the letter "a" from `ER`.
 
 ### 3.1
 
-Create a subset called `ER_2012` from `ER` that only contains the rows for the year 2012 and only the columns: `county` and	`rate`. `year` should not be included in `ER_sub`.
-	What are the dimensions of this dataset? Don't use pipes (`%>%`) and instead do this in two steps creating the `ER_2012` object with `filter` and updating it with `select`.
+Create a subset called `ces_Alameda` from `ces` that contains only the rows for Alameda County and only the columns `Traffic` and `Asthma`. `CA_county` should not be included in `ces_Alameda`.
+
+What are the dimensions of this dataset? Don't use pipes (`%>%`) and instead do this in two steps creating the `ces_Alameda` object with `filter` and updating it with `select`.
 
 ```{r 3.1response}
 
@@ -192,17 +197,17 @@ What happens if you do the steps in a different order? Why does this not work?
 
 ```
 
-### 1.3
+### 3.3
 
-Re-order the rows of `ER_2012` by population in increasing order. Use `arrange()`. What is county with the smallest rate?
+Re-order the rows of `ces_Alameda` by `Traffic` value in increasing order. Use `arrange()`. What's the smallest value?
 
 ```{r 3.3response}
 
 ```
 
-### 1.4
+### 3.4
 
-Create a new variable in `ER_2012` called `rate1000`, which  is equal to `rate` divided by 1000, using `mutate()`(don't forget to reassign `ER_2012`). Use pipes `%>%`.
+Create a new variable in `ces_Alameda` called `Asthma100`, which is equal to `Asthma` divided by 100, using `mutate()` (don't forget to reassign `ces_Alameda`). Use pipes `%>%`. Take a look at the data now!
 
 ```
 # General format
@@ -218,17 +223,17 @@ NEWDATA <- OLD_DATA %>% mutate(NEW_COLUMN = OLD_COLUMN)
 
 ### P.4
 
-Move the `rate1000` column to be before `county` in the `ER_2012` dataset. Use `relocate()`.
+Move the `Asthma100` column to be before `Traffic` in the `ces_Alameda` dataset. Use `relocate()`.
 
 ```{r P.4response}
 
 ```
 
 ### P.5
 
-How can you find the value of `rate` in 2020 for Statewide for Females - using the initial ER data -  without just looking at the data manually and instead use functions we learned today?
+Using the original `ces` data, how can you find the values of `ApproxLocation` for areas within zip code 90745 (in Los Angeles County) that also have a CES4.0 score in the 90-95% percentile range? Use the functions we learned today instead of just looking at the data manually. (Hint: It can be helpful to look at your data first.)
 
-Note that gender was recorded as binary, which we know isn’t really accurate. This is something you might encounter. Please see this article about ways to measure gender in a more inclusive way: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6526522/. 
+`CES4.0PercRange`: percentile of the CalEnviroScreen score, grouped in 5% increments. The CalEnviroScreen score measures the negative environmental effects seen in a given region. Areas in the 90-95% percentile range are among the regions that experience the highest effects of pollution in California.
 
 ```{r P.5response}
 

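For reference while working through the lab, here is a minimal sketch of the dplyr verbs it uses. It runs on a small hypothetical data frame (`toy`, with invented values whose column names only loosely mirror `ces`) rather than the CalEnviroScreen data, so the lab questions stay open. It assumes dplyr is installed; `tibble()` and `%>%` are re-exported by dplyr.

```r
library(dplyr)

# Hypothetical toy data; values are invented for illustration only.
toy <- tibble(
  County = c("A", "A", "B", "C"),
  Traffic = c(200, 500, 800, 300),
  Asthma = c(120, 130, 150, 101),
  TrafficPctl = c(10, 50, 90, 30)
)

# rename(): new_name = old_name
toy <- rename(toy, county = County)

# select() keeps a data frame; pull() returns a plain vector
select(toy, Asthma)   # a 4 x 1 tibble
pull(toy, Asthma)     # a numeric vector

# select() helpers such as ends_with() match on column names
select(toy, county, ends_with("Pctl"))

# Conditions separated by a comma are combined with AND.
# Note the boundary: Traffic == 500 is kept by <= but not by <.
filter(toy, Traffic < 500, Asthma > 100)    # 2 rows
filter(toy, Traffic <= 500, Asthma > 100)   # 3 rows

# | combines conditions with OR
filter(toy, county == "A" | Traffic < 300)

# mutate() adds a column, relocate() moves it, arrange() sorts rows
toy <- toy %>%
  mutate(Asthma100 = Asthma / 100) %>%
  relocate(Asthma100, .before = Traffic) %>%
  arrange(Traffic)
```

Whether `<` and `<=` give different row counts depends only on whether any rows sit exactly on the boundary value.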
diff --git a/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.html b/modules/Subsetting_Data_in_R/lab/Subsetting_Data_in_R_Lab.html