forked from jtr13/cc22mw
-
Notifications
You must be signed in to change notification settings - Fork 1
/
interactive_plots_in_r.Rmd
216 lines (152 loc) · 10.3 KB
/
interactive_plots_in_r.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
# Interactive plots for different types of graphs in R
Tongni Chen and Tao Yu
```{r, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
```
```{r}
library(dplyr)
library(ggplot2)
library(plotly)
library(gapminder)
library("heatmaply")
library(dygraphs)
library(treemap)
library(collapsibleTree)
library(networkD3)
```
## introduction
Data visualization is useful and also necessary to get information from large data frames. We learn and use various packages in this class to visualize data. After that, we think of can we have more readable graphs, can we have graphs to show more information, like detailed numeric and text information, if we would like to do a presentation to others. For this purpose, we would like to introduce interactive plots in R.
We will introduce 5 packages in R which can create interactive plots.
The first one is Plotly, a powerful package to make various types of interactive graphs, including line plots, histograms, bar charts, etc.
After that, we would like to introduce other 4 packages which focus on one specific type of graph. You can probably have more options and flexiblility to create a desired type of graphs with these packages:
heatmaply: create interactive heatmaps;
dygraphs: focus on change of variables with respect to time;
treemap: create interactive tree graphs;
networkD3: create interactive network graphs.
## Plotly
Plotly is a powerful R package to make interactive graphs, including line plots, scatter plots, box plots, histograms, etc. In the following section, we will introduce 3 examples.
### Scatter plot
To produce interactive scatter plots, we can first use ggplot2, same as how we learned from class, then just call one function to make it interactive.
```{r}
scatter <- gapminder %>%
filter(year==1977) %>%
ggplot(aes(gdpPercap, lifeExp,color=continent)) +
geom_point()
ggplotly(scatter)
```
By using plotly, we can check information of each dot much more easily. As we can simply click the specific dots we are interested, there will appear detailed text and numeric information of that dot, like GDP, life expectancy, continent in this example.
### Boxplots
Besides just calling one function, Plotly also has its own functions to make graphs. Here is the example of how to draw interactive boxplots by Plotly without ggplot.
```{r}
data1 = ~rexp(3)
data2 = ~rnorm(10,1)
box <- plot_ly(y=data1,
type = "box")
box <- box %>% add_trace(y = data2)
box
```
This example shows how to use functions from plotly to create boxplots on two samples from Exponential distribution with parameter = 3 and Normal distribution with mean = 10, sd = 1.
Interactive boxplots by plotly can show detailed numerical maximum, minimum, median, Q1, Q3 values when you move mouse onto the specific box, which is easier for us to compare between different samples.
### Histograms
```{r}
data3 = rnorm(30,5)
hist <- plot_ly(x = data3,
type = "histogram",
nbinsx = 5,
marker = list(color = "lightgray",
line = list(color = "darkgray",width = 2)))%>%
layout(title = "Frequency distribution of Normal(30,5)",
yaxis = list(title = "value"),
xaxis = list(title = "frequency"))
hist
```
The above example shows how to use plotly to create an interactive histogram graph with desired bin size and other customized layout, such as color, title, etc.
Interactive histogram graphs by plotly can show range of each bin and counts of each bin when you move your mouse onto specific bins.
Plotly can do many other types of interactive graphs, like bar charts, pie charts, line plots, etc.
For more information, here is the library for Plotly R package: https://plotly.com/r/.
## Specialized package for specific types of graphs
Plotly creates various types of graphs as described before. After that, we would like to introduce some packages which focus on a specific type of graph. You can probably have more options and flexiblility to create a desired type of graphs with these packages.
## Heatmaply
This is a package to create interactive heatmaps. We will introduce some examples by Heatmaply in this section.
The example uses mtcars, a dataframe in R about information of cars with different brands.
The column of dataframe is for each part of car, like gear, carb.
The row of dataframe is for different brands.
```{r}
heatmaply(normalize(mtcars))
```
With no assignment on the x and y features, its default is to show heatmap between all columns and rows in the dataframe. When you move your mouse on to one specific rectangle, there will appear which car brand is this rectangle from and which part of car is this rectangle for.
We can also assign our desired x and y features. For example, we are interested in the correlations between different parts of cars.
```{r}
heatmaply_cor(
cor(mtcars),
xlab = "Features",
ylab = "Features"
)
```
heatmaply_cor is a useful funtion to create correlation matrixs.
The example shows how to create a heatmap between each column in mtcars.
## Time Series
Time series is a type of data that are indexed by time order, and is often used to represent change of variable with respect to time, a package that supports interactive time series graph is **dygraphs**:
What dygraphs do is that it automatically shows the detailed numerical data on the upper right hand side that follows your mouse cursor. You can also drag your mouse through a range and zoom in to that part of the graph, you can also double click on the graph to zoom back to the original full graph. The following is a demo, you can achieve all of these by simply feeding into the package your data, and it automatically generates these functions for you. However notice that this package only supports data that are in xts format.(or data that can be converted to xts format) I used quantmod here for some sample time series data, notice that there's not much data conversion needed. Also this package supports drawing multiple column in the same graph.
```{r}
library(quantmod)
getSymbols("MSFT",from="2019-01-01")
MSFT[1:3,1]
dygraph(MSFT[,1])
dygraph(MSFT[,1:3])
```
One can also add a range selector that helps with selecting the range by passing the generated dygraph object to dyRangeSelector()
```{r}
dygraph(MSFT[,1]) %>% dyRangeSelector()
```
## Tree
Trees are often used to represent parent child relations, this type of graph is useful in representing for example family lineage, job specialization, and various data stored in computers that uses binary or n-ary tree. One problem with tree is that they could be huge after expanding all of them, a package that deals with this question is the **collapsibleTree**, it creates an interactive tree graph that unfolds the child node when one clicks on the parent node, and retrieves when clicked again. Again, a tree is a specific type of data structure and it requires some data conversion before a diagram can be created. One most common is to convert a data from data frame, and we will give an example for that.
First we retrieve some data that has a tree structure.
```{r}
data(GNI2014)
head(GNI2014)
```
We can see that this data has countries from continent which belongs to earth, so we can create a level 3 tree by Earth-\>continent-\>country. To create a collapsible tree, we only need to specify the data and the hierarchies (that appears in the data frame as column name), in this case are continent, country. We can also set the root node name, set weather the graph is zoomable or not.
```{r}
collapsibleTree(
GNI2014,
hierarchy = c("continent","country"),
root="Earth",
zoomable=FALSE
)
```
We can also summarize numerical value for each node based on the total count of leaf descendents by using collapsibleTreeSummary and setting attribute="column_name" for the desired attribute, there are also aesthetic settings for this package though I will not discuss in detail here.
```{r}
library(collapsibleTree)
collapsibleTreeSummary(
GNI2014,
hierarchy = c("continent","country"),
root="Earth",
attribute="population",
zoomable=FALSE
)
```
## Graph(Network Graph)
There's also a cool package that draws 3d graphs, where graph here stands for the usual graph we learn in a math class that has nodes and edges. Since these graphs sometimes has multiple edges connecting a few nodes, they could be hard to drawn in a 2D plain, package networkD3 provides an interactive graph drawing for this type of graph. The input data for it is also simple, it requires a table with three columns, the first two are the two nodes connected by an edge and the third column is a edge value, lets create one simple data frame that can be used by it.
```{r}
Source=c("A","A","B","B","C","D","D","D","E","F")
Target=c("B","C","F","D","B","E","F","C","A","C")
edge_value=c(1,2,3,4,5,6,7,8,9,10)
example_graph<-data.frame(Source,Target,edge_value)
simpleNetwork(
example_graph
)
```
Where various interesting settings can be made on the graph, for example: charge determines how strong nodes are attracted/repulsed, fontSize set the font size of node names, opacity for the opacity of the graph, linkDistance for the length of edges. color settings for nodes and edges by nodeColour and linkColour...
```{r}
simpleNetwork(
example_graph,
charge=20,
fontSize=30,
linkDistance = 200,
opacity=0.7
)
```
## Evaluation
Through completing the community contribution, we learned the basic functionality of various interactive data visualization packages that specializes on different types of graphs, including plotly for most common histogram/boxplot/scatterplot, heatmaply for heat map, dygraphs for time series analysis, collapsible tree for tree diagrams and networkD3 for (network)graphs. We initially intended to work on one specific package but soon realized that most convenient packages are all well documented and can easily find tutorials online, so we decided to introduce various packages and show what they can do in terms of enhancing data visualization and we believe this will be helpful for students who are looking for good packages targeted on specific types of graphs. If time allows, we are willing to look for and introduce more types of graphs besides the current ones, and might introduce more packages for sophisticated graphs. We can also discuss these packages more in detail to give a better explaination on their full capability.