---
title: "Module 1 Assignment 3: Getting to Know your Home"
author: "Ellen Bledsoe, Lily McMullen"
date: "`r Sys.Date()`" # <- uses the current date when rendered
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Assignment Description

### Purpose

The goal of this assignment is to get comfortable using the `tidyverse` with 2-dimensional data sets and compare this process to using base R.

### Task

Write R code using the `tidyverse` to successfully answer each question below.

### Criteria for Success

- Code is within the provided code chunks
- Code is commented with brief descriptions of what the code does
- Code chunks run without errors
- Code produces the correct result
- This is the one time I *will* take points off for not using `tidyverse`...

### Due Date

Sept 15 at midnight MDT

# Assignment Questions

For this final assignment for Module 1, you'll be working with another real-world data set--a collection of data from climate stations scattered across Antarctica.
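
1. In your own words, describe what the `tidyverse` is. Your answer should be between 1-3 sentences.

The tidyverse is a collection of R packages (such as `dplyr`, `ggplot2`, `readr`, and `tidyr`) that are designed to work together and share a common grammar. Loading it with `library(tidyverse)` gives us one consistent set of tools that make importing, cleaning, transforming, and visualizing data in R easier.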

2. Load in the `tidyverse` package.
```{r load_tidyverse}
library(tidyverse)
```
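
3. Load in the data file (called aggregated_station_data.csv). Save the data as an object called `weather`.

```{r load_data}
# read the CSV with readr's read_csv() and store it as `weather`
# (assumes the file sits in the same folder as this .Rmd)
weather <- read_csv("aggregated_station_data.csv")
```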
4. Take a look at the data in whichever way you would like (e.g., `glimpse()`, `slice()`, `str()`, `head()`, etc.). How many rows and columns are in the data? Type your answers below:

rows: 139,160\
columns: 12
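
A quick way to confirm those counts is `dim()` (rows, then columns), or `nrow()` and `ncol()` separately; `glimpse()` also lists every column with its type. The sketch below runs them on a small made-up data frame (`toy_weather` is hypothetical, not the real station data), so it works without the CSV.

```{r dims_example}
library(dplyr)

# Hypothetical toy data standing in for `weather` (values are made up)
toy_weather <- data.frame(
  station_id = c("A", "A", "B"),
  month      = c(1, 2, 1),
  temp       = c(-12.5, NA, 1.3)
)

dim(toy_weather)      # number of rows, then number of columns
nrow(toy_weather)     # rows only
ncol(toy_weather)     # columns only
glimpse(toy_weather)  # one line per column, with its type
```
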

5. Create a data frame that includes only temperatures above freezing (AKA greater than 0).

```{r above0}
# keep only the rows where temp is greater than 0 (above freezing)
weather %>%
  filter(temp > 0)
```
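
One thing to watch for: `filter()` keeps only the rows where the condition is `TRUE`, so a row whose temperature is missing (`NA`) is dropped along with the cold ones. Base R bracket indexing handles this differently and returns an all-`NA` row for each missing value. A sketch on hypothetical toy data (`toy_temps` is made up):

```{r filter_na_example}
library(dplyr)

# Hypothetical readings; NA marks a missing temperature
toy_temps <- data.frame(
  station_id = c("A", "A", "B", "B"),
  temp       = c(-3.2, 0, 2.5, NA)
)

# filter() keeps rows where the condition is TRUE, so 0 and NA are dropped
above0 <- toy_temps %>% filter(temp > 0)
above0

# base R indexing turns the NA comparison into an extra all-NA row
above0_base <- toy_temps[toy_temps$temp > 0, ]
above0_base
```
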

6. Create a new data frame that includes *only* the following columns: year, day, month, temp, station_id. Save this new data frame as an object called `temp`.

```{r temp_df}
# select() keeps only the named columns; save the result as `temp`
temp <- weather %>%
  select(year, day, month, temp, station_id)
```
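
Since this assignment compares the tidyverse with base R, here is the same column selection written both ways on a hypothetical toy data frame (the extra `wind` column is made up so there is something to drop). Both versions should keep the same five columns in the same order; `select()` just lets us name the columns without quoting them.

```{r select_example}
library(dplyr)

# Hypothetical data frame with one extra column we want to drop
toy_weather <- data.frame(
  year = 2020, month = 1, day = 1:2,
  temp = c(-5.1, -4.8), station_id = "A", wind = c(3, 7)
)

# tidyverse: list the columns to keep, unquoted
toy_temp_tidy <- toy_weather %>%
  select(year, day, month, temp, station_id)

# base R: index the columns with a character vector of names
toy_temp_base <- toy_weather[, c("year", "day", "month", "temp", "station_id")]

names(toy_temp_tidy)
names(toy_temp_base)
```
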

7. Using the data frame you created in Q6 above (`temp`), add a new column to that data frame that converts the temperature column (currently in Celsius) to Fahrenheit. Call the new column `tempF`. (Hint: we did this in class--use that same equation)

```{r tempF}
# convert Celsius to Fahrenheit and save the new column back into `temp`
temp <- temp %>%
  mutate(tempF = temp * (9/5) + 32)
```
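
The conversion is $F = C \times 9/5 + 32$. A simple way to check it is to run it on reference points whose Fahrenheit values are known: -40 °C (where the two scales meet), 0 °C (freezing, 32 °F), and 100 °C (boiling, 212 °F). The toy data frame below is hypothetical.

```{r tempF_example}
library(dplyr)

# Reference temperatures in Celsius with well-known Fahrenheit values
toy_temp <- data.frame(temp = c(-40, 0, 100))

toy_temp <- toy_temp %>%
  mutate(tempF = temp * (9/5) + 32)  # Fahrenheit = Celsius * 9/5 + 32

toy_temp$tempF  # -40 32 212
```
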
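
8. In your own words (either bullet points or sentence form is fine), explain two benefits of using the pipe (`%>%`).

- It makes code easier to read: the pipe forwards the result on its left into the first argument of the function on its right, so a chain of steps reads top to bottom in the order it happens, instead of inside out like nested function calls.
- It avoids intermediate objects: each result flows straight into the next command, so I don't have to save and name a new object after every step.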
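
To make both benefits concrete, here is one small task (keep the readings above freezing, then add a Fahrenheit column) written three ways on hypothetical toy data: nested calls, intermediate objects, and the pipe. All three should produce the same result; only the pipe version reads in order without extra names.

```{r pipe_example}
library(dplyr)

# Hypothetical readings in Celsius
toy_weather <- data.frame(
  station_id = c("A", "A", "B", "B"),
  temp       = c(-10, -6, 2, NA)
)

# 1. Nested calls: read from the inside out
nested <- mutate(filter(toy_weather, temp > 0), tempF = temp * (9/5) + 32)

# 2. Intermediate objects: every step needs its own name
step1   <- filter(toy_weather, temp > 0)
stepped <- mutate(step1, tempF = temp * (9/5) + 32)

# 3. The pipe: same steps, read top to bottom, nothing extra to name
piped <- toy_weather %>%
  filter(temp > 0) %>%
  mutate(tempF = temp * (9/5) + 32)

identical(nested, piped)
identical(stepped, piped)
```
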

9. Find the minimum temperature recorded for each month (in Celsius, the original column). (Hint: think about months first (split) and then temperature (apply). You will also want to remove all the NA values.)

```{r min_temp_month}
# split by month, then find the lowest temperature in each month (ignoring NAs)
weather %>%
  group_by(month) %>%
  summarize(min_temp = min(temp, na.rm = TRUE))
```
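
Why `na.rm = TRUE` matters, plus one edge case it does not cover: if every temperature in a month were missing, `min(..., na.rm = TRUE)` would have nothing left and would return `Inf` with a warning. Filtering out the `NA` rows before grouping avoids that, because an all-`NA` month then disappears instead of showing `Inf`. A sketch on made-up data (`toy_weather` is hypothetical; whether any month in the real data is entirely `NA` is not checked here):

```{r min_na_example}
library(dplyr)

# Hypothetical data: month 3 has only missing temperatures
toy_weather <- data.frame(
  month = c(1, 1, 2, 2, 3),
  temp  = c(-4, NA, -20, -15, NA)
)

# min() of nothing but NAs, with na.rm = TRUE, is Inf (warning suppressed here)
all_na_min <- suppressWarnings(min(c(NA_real_, NA_real_), na.rm = TRUE))
all_na_min

# drop NA rows first, then split by month and take each month's minimum
toy_min <- toy_weather %>%
  filter(!is.na(temp)) %>%
  group_by(month) %>%
  summarize(min_temp = min(temp))

toy_min
```
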

10. Create a data frame with the mean temperature for the month of January for each station.

Some hints:

- take note of how months are represented in the data
- think about using the pipe, how we choose which rows we want, and how we split-apply-combine
- remember to remove the NA values!

```{r mean_jan_temp}
# keep January only, split by station, then average each station's temperatures
weather %>%
  filter(month == 1) %>%
  group_by(station_id) %>%
  summarize(mean_temp = mean(temp, na.rm = TRUE))
```
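
Following the first hint, it can help to look at how months are coded before filtering. The sketch below assumes months are stored as the numbers 1 to 12 (an assumption for this made-up `toy_weather`; if the real data used names such as `"Jan"`, the `filter()` condition would need to change to match) and then runs the same filter and split-apply-combine steps.

```{r month_check_example}
library(dplyr)

# Hypothetical data where months are stored as numbers 1 to 12
toy_weather <- data.frame(
  station_id = c("A", "A", "B", "B", "B"),
  month      = c(1, 2, 1, 1, 12),
  temp       = c(-3, -9, 0.5, NA, 1.5)
)

# check how months are coded (and how many rows each has) before filtering
toy_weather %>% count(month)

toy_jan <- toy_weather %>%
  filter(month == 1) %>%                            # January only
  group_by(station_id) %>%                          # split by station
  summarize(mean_temp = mean(temp, na.rm = TRUE))   # apply + combine

toy_jan
```
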

## Bonus! (up to 2 points)

Write code to determine how many unique stations are in the `weather` data set. (Hint: look up the help file for the `distinct()` and the `count()` functions).

```{r unique_stations}
# one row per distinct station, then count those rows
weather %>% distinct(station_id) %>% count()
```
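
For comparison, a sketch on hypothetical toy data: `distinct()` followed by `count()` returns a one-row data frame with the number in a column called `n`, while dplyr's `n_distinct()` returns the count as a single number.

```{r unique_stations_example}
library(dplyr)

# Hypothetical station IDs with repeats
toy_weather <- data.frame(station_id = c("A", "A", "B", "C", "C", "C"))

# distinct() keeps one row per station; count() counts the remaining rows
toy_count <- toy_weather %>%
  distinct(station_id) %>%
  count()
toy_count

# n_distinct() gives the same number as a plain value
n_stations <- n_distinct(toy_weather$station_id)
n_stations
```
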