---
title: "Data Visualization Solutions"
author: "Joscelin Rocha Hidalgo"
output:
  html_document:
    css: slides/style.css
    toc: true
    toc_depth: 1
    toc_float: true
    df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Load Packages
Let's load the `tidyverse` package and `ggplot2`. Loading `tidyverse` already attaches `ggplot2`, so the second call is optional.
```{r}
library(tidyverse)
library(ggplot2)
```
# Import chds6162_data
```{r}
data <- read_csv("data/chds6162_data.csv")
# Then run this to add readable labels for the education codes:
data <- data %>%
  mutate(ded_lbls = case_when(
    ded == 0 ~ "<8th",
    ded == 1 ~ "8th-12th",
    ded == 2 ~ "HS degree",
    ded == 3 ~ "HS+trade",
    ded == 4 ~ "Some college",
    ded == 5 ~ "College degree",
    ded == 6 ~ "Trade school",
    ded == 7 ~ "HS unclear"))
```
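As plain character strings, the labels above carry no order, so ggplot2 sorts them alphabetically on an axis or in a legend. One possible way to keep the order of the original codes is to build a factor whose levels follow `ded`. This is only a sketch on a small made-up tibble (`toy`, `ed_labels`, and `ded_fct` are illustrative names, not part of the CHDS file):

```r
library(tidyverse)

# Hypothetical toy data standing in for the `ded` column of the CHDS file
toy <- tibble(ded = c(5, 0, 2, 7, 2))

# Labels listed in the same order as the codes 0 to 7 used in the chunk above
ed_labels <- c("<8th", "8th-12th", "HS degree", "HS+trade",
               "Some college", "College degree", "Trade school", "HS unclear")

# factor() pairs each code in `levels` with the label in the same position
toy <- toy %>%
  mutate(ded_fct = factor(ded, levels = 0:7, labels = ed_labels))

levels(toy$ded_fct)  # levels follow the numeric codes, not the alphabet
```

Mapping `ded_fct` instead of a character column to `x`, `fill`, or `color` would then keep bars and legend entries in code order.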
# Scatterplot
Make a scatterplot that shows dad weight on the x axis and dad height on the y axis.
```{r}
ggplot(data, aes(dwt, dht)) +
  geom_point()
```
# Histogram
Make a histogram that shows the distribution of the fathers' weight variable (`dwt`).
```{r}
ggplot(data = data,
       mapping = aes(x = dwt)) +
  geom_histogram()
```
Now try with 50 bins.
```{r}
ggplot(data = data,
       mapping = aes(x = dwt)) +
  geom_histogram(bins = 50)
```
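`bins` fixes how many bars the histogram has, while `binwidth` fixes how wide each bar is in the units of the variable. Here is a small sketch comparing the two on simulated weights (`toy_wt` is made up for illustration and is not the CHDS data):

```r
library(ggplot2)

# Simulated weights in pounds; purely illustrative values
set.seed(1)
toy_wt <- data.frame(dwt = rnorm(200, mean = 170, sd = 20))

# Fix the number of bars
p_bins <- ggplot(toy_wt, aes(x = dwt)) +
  geom_histogram(bins = 50)

# Fix the width of each bar instead (5 pounds per bar)
p_width <- ggplot(toy_wt, aes(x = dwt)) +
  geom_histogram(binwidth = 5)

# layer_data() returns the computed bins, one row per bar
nrow(layer_data(p_bins))
head(layer_data(p_width)[, c("xmin", "xmax", "count")])
```

A fixed `binwidth` can be easier to describe in a caption ("each bar covers 5 pounds"), whereas `bins` is convenient for quick exploration.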
# Bar Chart
## Bar Chart for specific values
I created a new data frame, `dad_hgt_by_ed`, that holds the average height in inches of fathers at each education level.
```{r}
dad_hgt_by_ed <- data %>%
  group_by(ded_lbls) %>%
  summarize(avg_ht = mean(dht, na.rm = TRUE))
```
Plot the average height of fathers for each education level (`ded_lbls`). Remember, if NAs are being plotted, you can drop them! (Hint: `drop_na()`.)
```{r}
ggplot(data = dad_hgt_by_ed,
       mapping = aes(x = ded_lbls,
                     y = avg_ht)) +
  geom_bar(stat = "identity")
```
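The hint refers to `drop_na()` from tidyr. Below is a sketch of how it could be slotted in before summarizing, using a made-up tibble (`toy_dads`) with one missing education label. `geom_col()` is used as a shorthand for `geom_bar(stat = "identity")`:

```r
library(tidyverse)

# Hypothetical toy data: the last row has no education label
toy_dads <- tibble(
  ded_lbls = c("HS degree", "HS degree", "College degree", NA),
  dht      = c(70, 72, 69, 71)
)

toy_hgt_by_ed <- toy_dads %>%
  drop_na(ded_lbls) %>%                        # remove rows with a missing label
  group_by(ded_lbls) %>%
  summarize(avg_ht = mean(dht, na.rm = TRUE))  # one average per education level

ggplot(toy_hgt_by_ed, aes(x = ded_lbls, y = avg_ht)) +
  geom_col()
```

Naming the column inside `drop_na()` removes only rows where that column is missing. Calling it with no arguments would drop rows with a missing value in any column.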
# `color` and `fill`
Take your graph from above and make the inside of each bar a different color.
```{r}
ggplot(data = dad_hgt_by_ed,
       mapping = aes(x = ded_lbls,
                     y = avg_ht,
                     fill = ded_lbls)) +
  geom_bar(stat = "identity")
```
# Scales
## color
Take the scatterplot you made earlier, color the points by education level (`ded_lbls`, the labeled version of `ded`), and add a color scale using `scale_color_viridis_d()`.
```{r}
ggplot(data, aes(dwt, dht, color = ded_lbls)) +
  geom_point() +
  scale_color_viridis_d(option = "inferno")

# You can also set the colors manually:
ggplot(data = data,
       mapping = aes(x = dwt,
                     y = dht,
                     color = ded_lbls)) +
  geom_point() +
  scale_color_manual(values = c("orange", "black", "green", "blue", "pink", "#1F271B", "brown", "purple"))
```
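With an unnamed vector, `scale_color_manual()` assigns colors in the order of the levels, which makes it hard to tell which color belongs to which group. A named vector makes the pairing explicit. This is a minimal sketch on made-up points (`toy_pts` and `ed_colors` are illustrative names), showing only three of the education labels:

```r
library(ggplot2)

# Hypothetical toy data with three of the education labels
toy_pts <- data.frame(
  dwt      = c(150, 180, 200),
  dht      = c(66, 70, 74),
  ded_lbls = c("HS degree", "College degree", "<8th")
)

# Names are the group labels, values are the colors they should get
ed_colors <- c("<8th"           = "orange",
               "HS degree"      = "green",
               "College degree" = "#1F271B")

p_manual <- ggplot(toy_pts, aes(dwt, dht, color = ded_lbls)) +
  geom_point() +
  scale_color_manual(values = ed_colors)
```

Because the values are matched by name, reordering the vector does not change which group gets which color.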
# Plot Labels
Use the code chunk from above and do the following:

1. Add a title
2. Add clearer x- and y-axis labels
```{r}
ggplot(data, aes(dwt, dht, color = ded_lbls)) +
  geom_point() +
  scale_color_viridis_d(option = "inferno") +
  labs(title = "Fathers' weight vs height based on education level",
       x = "Weight (pounds)",
       y = "Height (inches)",
       color = "Education levels")
```
# Themes
Use the last plot and add `theme_classic()` to it.
```{r}
ggplot(data, aes(dwt, dht, color = ded_lbls)) +
  geom_point() +
  scale_color_viridis_d(option = "inferno") +
  labs(title = "Fathers' weight vs height based on education level",
       x = "Weight (pounds)",
       y = "Height (inches)",
       color = "Education levels") +
  theme_classic()

ggsave("plots/my-chunk-plot.png", width = 10, height = 5)
```
# Facets
Instead of looking at the plot the way you did earlier, your boss wants you to create multiple plots (one for each education level). How can you do it? (hint: `facet_wrap(~XX)`)
```{r}
ggplot(data, aes(dwt, dht, color = ded_lbls)) +
  geom_point() +
  scale_color_viridis_d(option = "inferno") +
  labs(title = "Fathers' weight vs height based on education level",
       x = "Weight (pounds)",
       y = "Height (inches)",
       color = "Education levels") +
  theme_classic() +
  facet_wrap(~ded_lbls)
```
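Once each panel shows a single education level, the color legend repeats what the panel titles already say. The sketch below shows one way to arrange the panels with `ncol` and hide the legend. It uses made-up data (`toy_facets`), not the CHDS file:

```r
library(ggplot2)

# Hypothetical toy data: two fathers per education level
toy_facets <- data.frame(
  dwt      = c(150, 165, 180, 170, 190, 160),
  dht      = c(66, 68, 72, 70, 73, 67),
  ded_lbls = rep(c("HS degree", "Some college", "College degree"), each = 2)
)

p_facets <- ggplot(toy_facets, aes(dwt, dht, color = ded_lbls)) +
  geom_point() +
  facet_wrap(~ded_lbls, ncol = 3) +   # one panel per level, laid out in 3 columns
  theme_classic() +
  theme(legend.position = "none")     # panel titles already name the groups

p_facets
```

By default all panels share the same axis ranges, which keeps them directly comparable. `scales = "free"` in `facet_wrap()` would relax that.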
# Save Plots
Save your last plot to a PNG that is 10 inches wide and 5 inches high. Put it in the `plots` directory and call it "my-fav-plot.png".
```{r}
ggsave("plots/my-fav-plot.png", width = 10, height = 5)
```
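When no `plot` argument is given, `ggsave()` writes whatever `last_plot()` returns, and `width` and `height` are read in inches by default. A slightly more explicit variant stores the plot in an object and passes it in. In this sketch the plot (`toy_plot`) and the temporary output folder are assumptions made so the example is self-contained:

```r
library(ggplot2)

# Hypothetical stand-in for the plot you want to keep
toy_plot <- ggplot(data.frame(x = 1:3, y = c(2, 4, 3)), aes(x, y)) +
  geom_point()

# Assumed output location: a temporary folder, created if it does not exist
out_dir <- file.path(tempdir(), "plots")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
out_file <- file.path(out_dir, "my-fav-plot.png")

# Name the plot and the units explicitly instead of relying on defaults
ggsave(out_file, plot = toy_plot, width = 10, height = 5, units = "in", dpi = 300)

file.exists(out_file)
```

Passing `plot =` means the saved file does not depend on which plot happened to be created most recently.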