Render site

fhdsl · Oct 2, 2024 · 6a36ef5
1 parent ed45e6c
commit 6a36ef5

Showing 6 changed files with 630 additions and 3,969 deletions.
diff --git a/help.html b/help.html
@@ -353,14 +353,14 @@ <h2><strong>Why are my changes not taking effect? It’s making my results look
 <p>Here we are creating a new object from an existing one:</p>
 <pre class="r"><code>new_rivers &lt;- sample(rivers, 5)
 new_rivers</code></pre>
-<pre><code>## [1] 625 411 250 490 270</code></pre>
+<pre><code>## [1] 600 671 260 210 760</code></pre>
 <p>Using just this will only print the result and not actually change <code>new_rivers</code>:</p>
 <pre class="r"><code>new_rivers + 1</code></pre>
-<pre><code>## [1] 626 412 251 491 271</code></pre>
+<pre><code>## [1] 601 672 261 211 761</code></pre>
 <p>If we want to modify <code>new_rivers</code> and save that modified version, then we need to reassign <code>new_rivers</code> like so:</p>
 <pre class="r"><code>new_rivers &lt;- new_rivers + 1
 new_rivers</code></pre>
-<pre><code>## [1] 626 412 251 491 271</code></pre>
+<pre><code>## [1] 601 672 261 211 761</code></pre>
 <p>If we forget to reassign this can cause subsequent steps to not work as expected because we will not be working with the data that has been modified.</p>
 <hr />
 </div>
@@ -409,7 +409,7 @@ <h2><strong>Error: object ‘X’ not found</strong></h2>
 <p>Make sure you run something like this, with the <code>&lt;-</code> operator:</p>
 <pre class="r"><code>rivers2 &lt;- new_rivers + 1
 rivers2</code></pre>
-<pre><code>## [1] 627 413 252 492 272</code></pre>
+<pre><code>## [1] 602 673 262 212 762</code></pre>
 <hr />
 </div>
 <div id="error-unexpected-in-error-unexpected-in-error-unexpected-x-in" class="section level2">
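
The changed output values in the `help.html` hunks above are most likely a side effect of `sample()` drawing a new random sample each time the site is rendered; the commit itself does not say so. The sketch below shows how a fixed seed would keep such output stable across renders, and it also walks through the reassignment behaviour the help page describes. The seed value `2024` and the names `first_draw`, `second_draw`, and `before` are illustrative assumptions, not part of the course materials.

```r
# `rivers` is a dataset that ships with base R.
# Assumption: the seed value 2024 is arbitrary and purely illustrative.
set.seed(2024)
first_draw <- sample(rivers, 5)

set.seed(2024)
second_draw <- sample(rivers, 5)

# Resetting the same seed reproduces the same sample.
same_sample <- identical(first_draw, second_draw)

new_rivers <- first_draw
before <- new_rivers

# Without `<-`, the result is only printed; `new_rivers` is not modified.
new_rivers + 1
unchanged <- identical(new_rivers, before)

# Reassigning stores the modified values back into `new_rivers`.
new_rivers <- new_rivers + 1
updated <- identical(new_rivers, before + 1)
```

If a seed like this were set in the help page's source, re-rendering would no longer produce spurious diffs in the printed vectors.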

diff --git a/index.html b/index.html
@@ -351,7 +351,7 @@ <h2>Testimonials from our other courses:</h2>
 <h2>Find an Error!?</h2>
 <hr />
 <p>Feel free to submit typos/errors/etc via the GitHub repository associated with the class: <a href="https://github.com/fhdsl/DaSEH" class="uri">https://github.com/fhdsl/DaSEH</a></p>
-<p>This page was last updated on 2024-10-01.</p>
+<p>This page was last updated on 2024-10-02.</p>
 <p style="text-align:center;">
 <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://live.staticflickr.com/4557/26350808799_6f9c8bcaa2_b.jpg" height="150"/> </a>
 </p>

diff --git a/modules/Data_Summarization/Data_Summarization.html b/modules/Data_Summarization/Data_Summarization.html
diff --git a/modules/Data_Summarization/lab/Data_Summarization_Lab.Rmd b/modules/Data_Summarization/lab/Data_Summarization_Lab.Rmd
@@ -11,45 +11,38 @@ knitr::opts_chunk$set(echo = TRUE)
 
 # Part 1
 
-Data used
-
-CalEnviroScreen Dataset: CalEnviroScreen is a project that ranks census tracts in California based on potential exposures to pollutants, adverse environmental conditions, socioeconomic factors and the prevalence of certain health conditions. Data used in the CalEnviroScreen model come from national and state sources.
-
-The data is from https://calenviroscreen-oehha.hub.arcgis.com/#Data
-
-You can Download as a CSV in your current working directory.  Note its also available at: 	https://daseh.org/data/CalEnviroScreen_data.csv 
+We'll again use the CalEnviroScreen dataset for the lab. Load the `tidyverse` package and the dataset, which can be found at https://daseh.org/data/CalEnviroScreen_data.csv. Name the dataset `ces`.
 
 
 ```{r, echo = TRUE, message=FALSE, error = FALSE}
 library(tidyverse)
-library(dasehr)
-```
 
-```{r}
-ces <- calenviroscreen
-# Or use
-# ces <- read_csv(file = "https://daseh.org/data/CalEnviroScreen_data.csv")
+ces <- read_csv(file = "https://daseh.org/data/CalEnviroScreen_data.csv")
 ```
 
 ### 1.1 
 
-How observations/rows are in the `ces` data set? You can use `dim()` or `nrow()` or examine the Environment.
+How many observations/rows are in the `ces` data set? You can use `dim()` or `nrow()` or examine the Environment.
 
 ```{r 1.1response}
 
 ```
 
 ### 1.2
 
-What was the population of California in the 2010 census, based on the `TotalPop` column? (use `sum()`)
+The `TotalPop` column includes information about the population for each census tract as of the 2010 census.
+
+NOTE: A census tract is a small, relatively permanent area within a county used to present data from the census. Each row in the `ces` dataset corresponds to a single census tract. See https://www2.census.gov/geo/pdfs/education/CensusTracts.pdf
+
+What was the total population in the dataset based on the 2010 census? (use `sum()` and the `TotalPop` column)
 
 ```{r 1.2response}
 
 ```
 
 ### 1.3
 
-What is the largest (`max`) total population (`TotalPop`) among all census tracts (rows)? Use `summarize`.
+What was the largest population, according to the 2010 census, for a single census tract (row)? Use `summarize` and `max`.
 
 ```
 # General format 
@@ -63,7 +56,7 @@ DATA_TIBBLE %>%
 
 ### 1.4
 
-Modify your code from 1.3 to add the `min` of `TotalPop` using the `summarize` function.
+Modify your code from 1.3 to add the smallest population among census tracts. Use `min` in your `summarize` function.
 
 ```
 # General format 
@@ -82,7 +75,9 @@ DATA_TIBBLE %>%
 
 ### P.1
 
-Summarize the `ces` data to get the mean of `TotalPop` and `Pesticides`. Make sure to remove `NA`s.
+Summarize the `ces` data to get the mean of both the `TotalPop` and `Pesticides` columns. Make sure to remove `NA`s.
+
+`Pesticides`: Total pounds of selected active pesticide ingredients used in production agriculture per square mile. The higher the number, the greater the amount of pesticides used on agricultural sites.
 
 ```
 # General format 
@@ -106,9 +101,7 @@ Given that parts of California are heavily agricultural, and the max value for t
 
 ### P.3
 
-Filter any zeros out of `ces` `Pesticides`. Use `filter()`. Assign this "cleaned" dataset object the name `exurban_ces``.
-
-(We are making the admittedly shaky assumption that places with no reported pesticide use are within cities.)
+Remove the rows of `ces` where the `Pesticides` value is zero. Use `filter()`. Assign this "cleaned" dataset object the name `ces_pest`.
 
 ```
 # General format 
@@ -130,23 +123,21 @@ How many census tracts have pesticide values greater than 0?
 
 ### 2.1
 
-The variable `CES4.0PercRange` categorizes the calculated CES4.0 value (a measure of the pollution burden in a particular region) into percentile ranges, grouped by 5% increments.
-
-How many census tracts are there in each percentile range? Use `count()` on the column named `CES4.0PercRange`. Use `ces` as your input data.
+How many census tracts are present in each California county? Use `count()` on the column named `CaliforniaCounty`. Use `ces` as your input data.
 
 ```{r 2.1response}
 
 ```
 
 ### 2.2
 
-Modify your code from question 2.1 to break down each percentile range by California county. Use `count()` on the columns named `CES4.0PercRange` and `CaliforniaCounty`.
+Let's break down the count further. Modify your code from question 2.1 to count census tracts by County AND ZIP code. Use `count()` on the columns named `CaliforniaCounty` and `ZIP`.
 
 ```{r 2.2response}
 
 ```
 
-Hmm. This isn't the easiest table to read. Let's try a different approach.
+This isn't the only way we can create this table in R. Let's look at another way to build it.
 
 ### 2.3
 
@@ -165,7 +156,7 @@ DATA_TIBBLE %>%
 
 ### 2.4
 
-Modify your code from 2.3 to also group by `CES4.0PercRange`.
+Modify your code from 2.3 to also group by `ZIP`.
 
 ```{r 2.4response}
 
@@ -176,7 +167,7 @@ Modify your code from 2.3 to also group by `CES4.0PercRange`.
 
 ### P.4
 
-Modify code from 2.3 to also summarize by total population per group. In your summarized output, make sure you call the new summarized average total population variable (column name) "mean". In other words, the head of your output should look like:
+Modify code from 2.3 (the code that only groups by county) to also summarize the average total population (`TotalPop`) per group. In your summarized output, make sure you name the new average total population column "mean". In other words, the head of your output should look like:
 
 ```
 # A tibble: 58 × 3
@@ -185,6 +176,7 @@ Modify code from 2.3 to also summarize by total population per group. In your su
  1 "Alameda "         360 4602.
 ...
 ```
+(In the above table, remember that the "count" column is counting the number of census tracts.)
 
 ```{r P.4response}
 

diff --git a/modules/Data_Summarization/lab/Data_Summarization_Lab_Key.html b/modules/Data_Summarization/lab/Data_Summarization_Lab_Key.html
diff --git a/modules/cheatsheets/Day-4.pdf b/modules/cheatsheets/Day-4.pdf
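
The lab text in the `Data_Summarization_Lab.Rmd` hunks notes that `count()` is not the only way to build a count table. As a rough illustration on a made-up tibble (not the `ces` data and not the lab key), the sketch below compares `count()` with `group_by()` plus `summarize()`, and adds a per-group mean with `NA`s removed. The tibble `toy` and its columns `county` and `pop` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; the real lab uses the `ces` dataset instead.
toy <- tibble(
  county = c("A", "A", "B", "B", "B"),
  pop = c(100, 200, 50, NA, 150)
)

# Approach 1: count rows per group directly.
by_count <- toy %>%
  count(county)

# Approach 2: group first, then count rows with n().
by_group <- toy %>%
  group_by(county) %>%
  summarize(n = n())

# summarize() can compute more than one summary per group;
# na.rm = TRUE drops missing values before averaging.
by_summary <- toy %>%
  group_by(county) %>%
  summarize(n = n(), mean = mean(pop, na.rm = TRUE))
```

The two count tables should hold the same values; the `group_by()` route is the one that extends naturally to additional summaries such as the mean.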