# Network Visualization{#Visualization}
![](images/image_break.png){width=100%}
This section follows along with Chapter 6 of Brughmans and Peeples (2023) to illustrate the wide variety of techniques that can be used for network visualization. We begin with some general examples of network plotting and then demonstrate how to replicate all of the specific examples that appear in the book. For most of the examples below we rely on R, but in a few cases we use other software and provide additional details and data formats.
There are already many excellent resources online for learning how to create beautiful and informative network visuals. We particularly recommend the online materials produced by Dr. Katherine Ognyanova [available on her website](https://kateto.net/), especially her [Static and dynamic network visualization with R](https://kateto.net/network-visualization) workshop materials. Many of the examples here and in the book take inspiration from her work. In addition, the [R Graph Gallery](https://www.r-graph-gallery.com/) website created by [Yan Holtz](https://github.com/holtzy) provides numerous examples of plots in R using the `ggplot2` and `ggraph` packages, among many others. If you are new to R, it will probably help to read a bit about basic graphics functions (including in the tutorials listed here) before getting started.
## Data and R Setup{#VizDatasets}
To make it as easy as possible for users to replicate specific visuals from the book and the other examples in this tutorial, we have made the examples as modular as possible. This means that we load the required libraries within each relevant chunk of code (so that you can more easily tell which package does what), and we provide links to download the data required to replicate each figure in the description of that figure below. The data sets we use include .csv and other file formats as well as .Rdata files that contain sets of specific R objects formatted as required for individual chunks of code.
If you plan on working through this entire tutorial and would like to download all of the associated data at once, [you can download this zip file](All_data.zip). Extract this zip folder into your R working directory and the examples below will then work. All of the examples below are set up such that the data should be contained in a sub-folder of your working directory called "data" (note that directory and file names are case sensitive).
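Because the examples assume this folder layout, a quick check for missing files can save some confusion. The sketch below is our own addition (the helper name `check_data_files()` is hypothetical and not part of the tutorial files); it simply reports which of the expected files are absent from the `data` sub-folder of the working directory.

```r
# Hypothetical helper (not part of the tutorial files): return the paths of
# any expected data files that cannot be found in the data sub-folder.
check_data_files <- function(files, data_dir = "data") {
  paths <- file.path(data_dir, files)
  # file.exists() is vectorized; on case-sensitive file systems the names
  # must match exactly (e.g. "Cibola_adj.csv", not "cibola_adj.csv")
  paths[!file.exists(paths)]
}

missing_files <- check_data_files(c("Cibola_adj.csv", "Cibola_attr.csv",
                                    "Peeples2018.Rdata"))
if (length(missing_files) > 0) {
  message("Missing data files: ", paste(missing_files, collapse = ", "))
}
```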
## Visualizing Networks in R{#ViZInR}
```{block, type="rmdnote"}
There are many tools available for creating network visualizations in R, including functions built directly into the `igraph` and `statnet` packages. Before we get into the details, we briefly illustrate the primary network plotting options for `igraph`, `statnet`, and a visualization package called `ggraph`. We start by initializing the required libraries, reading in an adjacency matrix, and creating network objects in both the `igraph` and `statnet` formats. These will be the basis for all examples in this section.
```
Let's start by reading in our example data and then we describe each package in turn:
```{r Chapter6_read_data, warning=F, message=F}
library(igraph)
library(statnet)
library(ggraph)
library(intergraph)
cibola <-
  read.csv(file = "data/Cibola_adj.csv",
           header = TRUE,
           row.names = 1)
cibola_attr <- read.csv(file = "data/Cibola_attr.csv", header = TRUE)
# Create network in igraph format
cibola_i <- igraph::graph_from_adjacency_matrix(as.matrix(cibola),
                                                mode = "undirected")
cibola_i
# Create network object in statnet/network format
cibola_n <- asNetwork(cibola_i)
cibola_n
```
### `network` package{#networkpackage}
All you need to do to plot a `network`/`statnet` network object is type `plot(nameofnetwork)`. By default, this creates a network plot in which all nodes and edges are shown in the same color and weight, arranged using the Fruchterman-Reingold graph layout. There are, however, many options that can be altered for this basic plot. To see the details, type `?plot.network` at the console to open the associated help document.
```{r Fig_net_simple}
set.seed(6332)
plot(cibola_n)
```
To change the color of nodes, the layout, symbols, or any other features, you can add arguments as detailed in the help document. These arguments can include calls to other functions, mathematical expressions, or even additional data from other attribute files. For example, in the following plot we calculate degree centrality directly within the plot call and then divide the result by 10 to ensure that the nodes are a reasonable size. We use the `vertex.cex` argument to set node size based on the results of that expression. We also use the `mode` argument to produce a network graph using the Kamada-Kawai layout. We color the nodes by the `Region` variable in the associated attribute file using the `vertex.col` argument and set all edge colors using the `edge.col` argument. Finally, we use `displayisolates = FALSE` to indicate that we do not want the single isolated node to be plotted. These are but a few of the many options.
```{r Fig_network_net}
set.seed(436)
plot(
  cibola_n,
  vertex.cex = sna::degree(cibola_n) / 10,
  mode = "kamadakawai",
  vertex.col = as.factor(cibola_attr$Region),
  edge.col = "darkgray",
  displayisolates = FALSE
)
```
### `igraph` package{#igraphpackage}
The `igraph` package also has a built-in plotting function called `plot.igraph`. To use it, you again just need to type `plot(yournetworkhere)` and provide an `igraph` object (R detects the class of the object and calls the appropriate plotting method). The default `igraph` plot again uses a Fruchterman-Reingold layout, just like `statnet/network`, but by default each node is labeled.
```{r Fig_igraph_simple}
set.seed(435)
plot(cibola_i)
```
Let's take a look at a few of the options we can alter to change this plot. There are again many options to explore here, and the help documents for `igraph.plotting` describe them in detail (type `?igraph.plotting` at the console for more). If you want to explore `igraph` further, we suggest you check the [Network Visualization](https://kateto.net/network-visualization) tutorial linked above, which discusses the wide variety of options.
```{r Fig_igraph}
set.seed(3463)
plot(
  cibola_i,
  vertex.size = igraph::eigen_centrality(cibola_i)$vector * 20,
  layout = layout_with_kk,
  vertex.color = as.factor(cibola_attr$Great.Kiva),
  edge.color = "darkblue",
  vertex.frame.color = "red",
  vertex.label = NA
)
```
### `ggraph` package{#ggraphpackage}
The `ggraph` package provides a powerful set of tools for plotting and visualizing network data in R. The format used for this package is a bit different from what we saw above and instead relies on the `ggplot2` style of plotting, where a plot is initialized with one function call and then modified by adding further layers and options joined with `+`. Although this takes a bit of getting used to, we have found that the `ggplot2` format is often more intuitive for making complex graphics once you understand the basics.
Essentially, a `ggraph` plot starts with a `ggraph()` function call that includes the network object and the layout information. You then add layers specifying the features of the edges (`geom_edge_link`), the nodes (`geom_node_point`), and so on. Conveniently, the `ggraph` function will accept either an `igraph` or a `network` object, so you do not need to convert between formats.
Here is an example. We first call `ggraph` on the igraph network object `cibola_i` and specify the Fruchterman-Reingold layout using `layout = "fr"`. Next, we call `geom_edge_link` and specify the edge color. The `geom_node_point` call then specifies many attributes of the nodes, including the fill color, outline color, transparency (`alpha`), shape (22 is a filled square), and size, which is mapped to the output of the `igraph::degree` function. The `scale_size` call then tells the plot to scale the node sizes specified in the previous line to range between 1 and 10. Finally, `theme_graph` applies the basic `ggraph` theme, which gives the plot a white background and removes the axes and grid lines. We go over the most common options in `ggraph` in detail in the next section. Let's see how this looks.
```{r Fig_ggraph}
set.seed(4368)
# Specify network to use and layout
ggraph(cibola_i, layout = "fr") +
  # Specify edge features
  geom_edge_link(color = "darkgray") +
  # Specify node features
  geom_node_point(
    fill = "blue",
    color = "red",
    alpha = 0.5,
    shape = 22,
    aes(size = igraph::degree(cibola_i)),
    show.legend = FALSE
  ) +
  # Set the upper and lower limit of the "size" variable
  scale_size(range = c(1, 10)) +
  # Set the theme; "theme_graph" is the default theme for networks
  theme_graph()
```
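In the chunk above, node size is computed from a vector passed directly into `aes()`. An alternative that can be easier to read, sketched below under the assumption that you are working with an `igraph` object, is to store the values as vertex attributes first and then refer to them by name. We use igraph's built-in Zachary karate club graph as a stand-in for `cibola_i` so the example runs without the tutorial data; the `degree` and `group` attribute names are arbitrary choices.

```r
library(igraph)
library(ggraph)

# Stand-in network (Zachary's karate club, built into igraph);
# with the tutorial data you would use cibola_i instead
g <- make_graph("Zachary")

# Store the values you want to map as vertex attributes...
V(g)$degree <- degree(g)
V(g)$group <- ifelse(V(g)$degree > median(V(g)$degree), "high", "low")

# ...and refer to them by name inside aes()
set.seed(4368)
p <- ggraph(g, layout = "fr") +
  geom_edge_link(color = "darkgray") +
  geom_node_point(aes(size = degree, fill = group), shape = 22, alpha = 0.5) +
  scale_size(range = c(1, 10)) +
  # generic font family, since the default Arial Narrow may not be installed
  theme_graph(base_family = "sans")
```

Print `p` to draw the plot. Keeping the values on the graph object means they travel with it if you subset or modify the network later.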
There are many options for the `ggraph` package and we recommend exploring the help document (`?ggraph`) as well as the [Data Imaginist](https://www.data-imaginist.com/tags/visualization) `ggraph` tutorial online for more. Most of the examples below will use the `ggraph` format.
## Network Visualization Options{#NetVizOptions}
In this section we illustrate some of the most useful graphical options for visualizing networks, focusing in particular on the `ggraph` format. In most cases there are similar options available in the plotting functions for both `network` and `igraph`. Where relevant we reference specific figures from the book and this tutorial, and the code for all of the figures produced in R is presented in the next section. For all of the examples in this section we will use the [Cibola technological similarity data (click here to download)](data/Peeples2018.Rdata). First we call the required packages and import the data.
```{r graph_layout, message=F, warning=FALSE}
library(igraph)
library(statnet)
library(intergraph)
library(ggraph)
load("data/Peeples2018.Rdata")
# Create igraph object for plots below
net <- asIgraph(brnet)
```
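The `load()` call above silently creates several objects in your workspace (such as `brnet`, `brnet_w`, and `site_info`, which are used below). If you are unsure what an .Rdata file contains, `load()` returns the names of the objects it creates. The sketch below demonstrates this with a throwaway file so that it runs anywhere; the object names in it are placeholders.

```r
# Build a small throwaway .Rdata file (placeholder objects, not tutorial data)
tmp <- tempfile(fileext = ".Rdata")
site_info_demo <- data.frame(site = c("A", "B"), Region = c("North", "South"))
adj_demo <- matrix(c(0, 1, 1, 0), nrow = 2)
save(site_info_demo, adj_demo, file = tmp)

# Load into a separate environment so nothing in the workspace is overwritten;
# load() returns the names of the objects it created
env <- new.env()
loaded <- load(tmp, envir = env)
loaded
str(mget(loaded, envir = env), max.level = 1)
```

With the tutorial data, `load("data/Peeples2018.Rdata", envir = env)` would list the objects in that file in the same way.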
### Graph Layout{#GraphLayouts}
Graph layout simply refers to the placement and organization in 2-dimensional or 3-dimensional space of nodes and edges in a network.
#### Manual or User Defined Layouts{#ManualLayouts}
There are a few options for manually defining node placement and graph layout in R, and the easiest is to provide x and y coordinates directly. In this example, we plot the Cibola technological similarity network with a set of x and y coordinates that group sites in the same region in a grid configuration. For another example of this approach see [Figure 6.1 below](#Figure_6_1). For an example of how you can interactively define a layout see [Figure 6.5](#Figure_6_5).
```{r manual_layout, fig.width=7, fig.height=7}
# site_info - site location and attribute data
# Create xy coordinates grouped by region
xy <-
  matrix(
    c(1, 1, 3, 3, 2, 1, 2, 1.2, 3, 3.2, 2, 1.4, 1, 1.2, 2, 2.2, 3,
      2, 3, 1, 2.2, 1, 2, 3, 2, 3.2, 3, 1.2, 3, 3.4, 1, 2, 3.2, 3.2,
      3, 1.4, 3, 2.2, 2, 2, 3.2, 3.4, 2.2, 1.2, 3.4, 3.2, 3.2, 1, 2,
      3.4, 3.4, 3.4, 2.2, 3, 2.2, 3.2, 2.2, 3.4, 1, 1.4, 3, 2.4),
    nrow = 31,
    ncol = 2,
    byrow = TRUE
  )
# Plot using "manual" layout and specify xy coordinates
ggraph(net,
       layout = "manual",
       x = xy[, 1],
       y = xy[, 2]) +
  geom_edge_link(edge_color = "gray") +
  geom_node_point(aes(size = 4, col = site_info$Region),
                  show.legend = FALSE) +
  theme_graph()
```
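Typing coordinates by hand, as in the `xy` matrix above, works well for a small network but becomes tedious for larger ones. The sketch below is one possible way to generate a similar grouped grid automatically; `grid_layout_by_group()` is a hypothetical helper written for illustration, and the spacing it uses (cells one unit apart, members arranged in rows of three, 0.2 units apart) is an arbitrary assumption rather than the layout used for the figure above.

```r
# Hypothetical helper: give each group its own cell in a grid of `ncol`
# columns, then arrange the members of a group in rows of three inside it.
grid_layout_by_group <- function(groups, ncol = 3, step = 0.2) {
  groups <- as.factor(groups)
  cell <- as.integer(groups)              # which cell each node belongs to
  cell_x <- (cell - 1) %% ncol + 1        # column of the cell
  cell_y <- (cell - 1) %/% ncol + 1       # row of the cell
  # running position of each node within its own group (1, 2, 3, ...)
  within <- ave(seq_along(cell), cell, FUN = seq_along)
  cbind(x = cell_x + step * ((within - 1) %% 3),
        y = cell_y + step * ((within - 1) %/% 3))
}

# Example with made-up region labels
regions <- c("A", "B", "A", "C", "A", "B")
xy_demo <- grid_layout_by_group(regions)
xy_demo
```

With the tutorial data, you could pass `site_info$Region` to the helper and then supply the two columns of the result as the `x` and `y` arguments of the `"manual"` layout, exactly as in the chunk above.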
#### Geographic Layouts{#GeographicLayouts}
Plotting networks using a geographic layout is essentially the same as plotting with a manual layout, except that you specify geographic coordinates instead of other coordinates. See [Figure 6.2](#Figure_6_2) for another example.
```{r map1, fig.width=7, fig.height=7}
ggraph(net,
       layout = "manual",
       x = site_info$x,
       y = site_info$y) +
  geom_edge_link(edge_color = "gray") +
  geom_node_point(aes(size = 4, col = site_info$Region),
                  show.legend = FALSE) +
  theme_graph()
```
When working with geographic data, it is also sometimes useful to plot directly on top of a base map. There are many options for this, but one of the most convenient is to use the `sf` package to handle coordinates and the `ggmap` package to download the relevant base map layer and plot directly on top of it. This first requires converting points to latitude and longitude in decimal degrees if they are not already in that format. See the documentation for the [sf package](https://r-spatial.github.io/sf/) and [ggmap package](https://github.com/dkahle/ggmap) for more details.
Here we demonstrate the use of the `ggmap` package and its `get_stadiamap` function, which requires a bit of additional explanation. This function automatically retrieves a background map for you using a few arguments:
* **`bbox`** - the bounding box, a vector giving the longitude and latitude in decimal degrees of the lower-left and upper-right corners of the area you wish to map (in the order left, bottom, right, top).
* **`maptype`** - a name that indicates the style of map to use ([check here for options](https://rdrr.io/github/dkahle/ggmap/man/get_stadiamap.html)).
* **`zoom`** - the level of detail (zoom level) to be retrieved. Higher numbers give more detail but take longer to download.
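Rather than typing the bounding box corners by hand, you can derive them from the site coordinates once they are in decimal degrees. The sketch below assumes you already have longitude and latitude vectors; the `bbox_from_points()` helper and the 10% padding are illustrative choices, and the coordinates in the example are made up rather than taken from the Cibola data. The names `left`, `bottom`, `right`, and `top` follow the naming used for `bbox` in `ggmap`'s documentation.

```r
# Hypothetical helper: bounding box c(left, bottom, right, top) in decimal
# degrees, padded on every side by a fraction `pad` of the data's extent.
bbox_from_points <- function(lon, lat, pad = 0.1) {
  dx <- diff(range(lon)) * pad
  dy <- diff(range(lat)) * pad
  c(left = min(lon) - dx, bottom = min(lat) - dy,
    right = max(lon) + dx, top = max(lat) + dy)
}

# Example with made-up coordinates (not the Cibola sites)
bb <- bbox_from_points(lon = c(-110, -109, -108), lat = c(34, 35, 33.5))
bb
```

The result could then be passed as `bbox = bb` in a `get_stadiamap()` call.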
As of early 2024, the `get_stadiamap` function also requires that you sign up for an account at [stadiamaps.com](https://stadiamaps.com). This account is free and allows you to download a large number of background maps in R per month (likely FAR more than an individual would ever use). A few setup steps are required to get this to work. You can follow the steps below or [click here for a YouTube video outlining steps 1 through 3 below](https://www.youtube-nocookie.com/embed/6jUSyI6x3xg).
1) First, you need to sign up for a free account at Stadiamaps.
2) Once you sign in, you will be asked to create a Property Name, designating where you will be using data. You can simply call it "R analysis" or anything you'd like.
3) Once you create this property you'll be able to assign an API key to it by clicking the "Add API" button.
4) Now you need to give R your API key to allow map download access. To do this, copy the API key shown on the Stadia Maps page for the property you created and then run the following code, replacing [YOUR KEY HERE] with your actual API key:
```{r, eval=F}
library(ggmap)
register_stadiamaps(key = "[YOUR KEY HERE]", write = FALSE)
```
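If you share your scripts, you may prefer not to paste the key directly into your code. One common approach, sketched below, is to store the key in an environment variable (for example in your `.Renviron` file) and read it with `Sys.getenv()`. The variable name `STADIA_MAPS_API_KEY` and the helper `get_stadia_key()` are our own assumptions, not something required by `ggmap` or Stadia Maps.

```r
# Hypothetical helper: read the key from an environment variable. The name
# STADIA_MAPS_API_KEY is an arbitrary choice; you could set it with a line
# such as STADIA_MAPS_API_KEY=yourkey in your ~/.Renviron file.
get_stadia_key <- function(var = "STADIA_MAPS_API_KEY") {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  key
}

# Register the key for this session only if it is set and ggmap is installed
if (nzchar(Sys.getenv("STADIA_MAPS_API_KEY")) &&
    requireNamespace("ggmap", quietly = TRUE)) {
  ggmap::register_stadiamaps(get_stadia_key(), write = FALSE)
}
```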
Note that, for ease of demonstration, in the remainder of this online guide (other than the code chunk below) we pre-download the maps and provide them as a file instead of using the `get_stadiamap` function.
```{block, type="rmdtip"}
We describe the specifics of spatial data handling, geographic coordinates, and projections in the section on [Spatial Networks](#SpatialNetworks). See that section for a full description of how R deals with geographic information.
```
```{r, echo=F, warning=F}
source("stadia_API.R")
```
```{r geo_layout, warning=F, message=F, fig.height=7, fig.width=7, cache=T}
library(sf)
library(ggmap)
# Convert attribute location data to sf coordinates (NAD83 / UTM zone 12N,
# EPSG:26912) and transform them to longitude/latitude (WGS84, EPSG:4326)
locations_sf <-
  st_as_sf(site_info, coords = c("x", "y"), crs = 26912)
loc_trans <- st_transform(locations_sf, crs = 4326)
# Extract the transformed coordinates as a data frame with x and y columns
xy <- as.data.frame(st_coordinates(loc_trans))
colnames(xy) <- c("x", "y")
# Get basemap "stamen_terrain_background" data for map in black and white.
# The bbox argument specifies the corners of the area to map and zoom
# determines the level of detail.
base_cibola <- get_stadiamap(
  bbox = c(-110.2, 33.4, -107.8, 35.3),
  zoom = 10,
  maptype = "stamen_terrain_background",
  color = "bw"
)
# Extract edge list (as node indices) from network object
edgelist <- as_edgelist(net, names = FALSE)
# Create data frame of beginning and ending points of edges
edges <- data.frame(xy[edgelist[, 1], ], xy[edgelist[, 2], ])
colnames(edges) <- c("X1", "Y1", "X2", "Y2")
# Plot original data on map
ggmap(base_cibola, darken = 0.35) +
  geom_segment(
    data = edges,
    aes(
      x = X1,
      y = Y1,
      xend = X2,
      yend = Y2
    ),
    col = "white",
    alpha = 0.8,
    linewidth = 1
  ) +
  geom_point(
    data = xy,
    aes(x, y, col = site_info$Region),
    alpha = 0.8,
    size = 5,
    show.legend = FALSE
  ) +
  theme_void()
```
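The key step in the chunk above is building the `edges` data frame: each row of the edge list holds the indices of an edge's two end points, so indexing the coordinate table by the first and second columns returns the start and end of every segment. The toy example below (three made-up points and two edges) shows this on a scale where you can check the result by eye. We request `names = FALSE` so that `as_edgelist()` returns numeric indices even if the vertices are named.

```r
library(igraph)

# Toy network: three nodes with known coordinates and two edges (1-2, 2-3)
g <- make_graph(c(1, 2, 2, 3), n = 3, directed = FALSE)
xy_toy <- data.frame(x = c(0, 1, 2), y = c(0, 1, 0))

# Each row of the edge list holds the indices of the two end points...
el <- as_edgelist(g, names = FALSE)
# ...so indexing the coordinates by each column gives start and end points
segs <- data.frame(xy_toy[el[, 1], ], xy_toy[el[, 2], ])
colnames(segs) <- c("X1", "Y1", "X2", "Y2")
segs
```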
#### Shape-Based and Algorithmic Layouts{#AlgorithmicLayouts}
There are a wide variety of shape-based and algorithmic layouts available for use in R. In most cases, all it takes to change layouts is to modify a single line of the `ggraph` call to specify the desired layout. The `ggraph` package can use any of the `igraph` layouts as well as many that are built directly into the package. See `?ggraph` for more details and to see the options. Here we show a few examples. Note that the figure calls are identical except for the `layout = "yourlayout"` argument in each `ggraph` call and the `ggtitle` name. For layouts that involve randomization, we use the `set.seed()` function to make sure they always plot the same way. See the discussion of [Figure 6.8](#Figure_6_8) below for more details. Beyond this, [Figure 6.9](#Figure_6_9) provides additional options that can be used for hierarchical network data.
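To see why `set.seed()` matters, note that `igraph` layout functions return a plain matrix of coordinates with one row per node. The sketch below, which uses igraph's built-in Zachary karate club graph as a stand-in for the Cibola network, compares repeated runs of the Fruchterman-Reingold layout with and without resetting the seed.

```r
library(igraph)

g <- make_graph("Zachary")

# Force-directed layouts start from random positions, so two runs
# generally produce different coordinates...
l1 <- layout_with_fr(g)
l2 <- layout_with_fr(g)
isTRUE(all.equal(l1, l2))

# ...unless the random seed is reset before each run
set.seed(4366); l3 <- layout_with_fr(g)
set.seed(4366); l4 <- layout_with_fr(g)
identical(l3, l4)

# Deterministic layouts such as the circle do not need a seed
identical(layout_in_circle(g), layout_in_circle(g))
```

A matrix like `l3` could also be supplied to the `"manual"` layout (as `x = l3[, 1]` and `y = l3[, 2]`) if you want to compute a layout once and reuse it across several figures.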
```{block, type="rmdtip"}
If you do not specify a graph layout in `ggraph`, the plotting function will automatically choose one (older versions of `ggraph` use the `layout_nicely()` function, and more recent versions default to the "stress" layout). Although this sometimes produces useful results, the layout used is not recorded in the call, so we recommend supplying a `layout` argument directly.
```
```{r layouts, message=F, warning=F, fig.width=7, fig.height=3}
# circular layout
circ_net <- ggraph(net, layout = "circle") +
  geom_edge_link(edge_color = "gray") +
  geom_node_point(aes(size = 4, col = site_info$Region),
                  show.legend = FALSE) +
  ggtitle("Circle") +
  theme_graph() +
  theme(plot.title = element_text(size = rel(1)))
# Fruchterman-Reingold layout
set.seed(4366)
fr_net <- ggraph(net, layout = "fr") +
  geom_edge_link(edge_color = "gray") +
  geom_node_point(aes(size = 4, col = site_info$Region),
                  show.legend = FALSE) +
  ggtitle("Fruchterman-Reingold") +
  theme_graph() +
  theme(plot.title = element_text(size = rel(1)))
# Davidson and Harel's simulated annealing layout
set.seed(3467)
dh_net <- ggraph(net, layout = "dh") +
  geom_edge_link(edge_color = "gray") +
  geom_node_point(aes(size = 4, col = site_info$Region),
                  show.legend = FALSE) +
  ggtitle("Davidson-Harel") +
  theme_graph() +
  theme(plot.title = element_text(size = rel(1)))
library(ggpubr)
ggarrange(circ_net, fr_net, dh_net, nrow = 1)
```
```{block, type="rmdnote"}
In the code above we used the `ggarrange` function within the `ggpubr` package to combine the figures into a single output. This function works with any `ggplot2` or `ggraph` format output when you supply the names of each figure in the order you want them to appear and the number of rows `nrow` and number of columns `ncol` you want the resulting combined figure to have. If you want to label each figure using the `ggarrange` function you can use the `labels` argument.
```
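As a minimal illustration of the `labels` argument, the sketch below combines two simple `ggplot2` scatter plots of the built-in `mtcars` data (stand-ins for network figures) into a single row with panel letters.

```r
library(ggplot2)
library(ggpubr)

# Two simple stand-in plots; ggraph objects can be combined the same way
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(hp, mpg)) + geom_point()

# labels adds a panel letter to each figure; ncol and nrow set the grid
combined <- ggarrange(p1, p2, labels = c("A", "B"), ncol = 2, nrow = 1)
```

Print `combined` to draw the two panels side by side.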
### Node and Edge Options{#NodeEdgeOptions}
There are many options for altering the color and symbol of nodes and edges in R. In this section we very briefly discuss some of the most common options. For more details see the discussion of [Figures 6.10 through 6.16](#Figure_6_10) below.
#### Nodes {#NodeOptions}
In `ggraph` changing node options mostly consists of changing options within the `geom_node_point` call within the `ggraph` figure call. As we have already seen it is possible to set color for all nodes or by some variable, to change the size of points, and we can also scale points by some metric like centrality. Indeed, it is even possible to make the call to the centrality function in question directly within the figure code.
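When you pass a vector of sizes directly (for example `size = igraph::degree(net) / 2`), the raw centrality values set the point sizes, which can produce points that are too large or too small. A simple alternative is to rescale the scores to a fixed range first. The helper below, `rescale_size()`, is hypothetical and uses a linear rescaling of the values; note that `ggplot2`'s `scale_size()` instead scales point area, so the two approaches give somewhat different results.

```r
library(igraph)

# Hypothetical helper: linearly rescale scores to the range `to`
rescale_size <- function(x, to = c(1, 10)) {
  rng <- range(x, na.rm = TRUE)
  # if every value is the same, give all nodes the middle size
  if (diff(rng) == 0) return(rep(mean(to), length(x)))
  to[1] + (x - rng[1]) / diff(rng) * diff(to)
}

g <- make_graph("Zachary")   # stand-in network built into igraph
sizes <- rescale_size(degree(g), to = c(2, 8))
range(sizes)
```

The resulting vector can be supplied as the `size` argument of `geom_node_point()` in the same way as `igraph::degree(net) / 2` in the examples below.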
When selecting point shapes you can use any of the shapes available in base R using `pch` point codes. Here are all of the available options:
```{r pch_points, warning=F, message=F}
library(ggpubr)
ggpubr::show_point_shapes()
```
There are many options for selecting colors for nodes and edges. Colors can be assigned using standard color names or using RGB or hex codes. It is also possible to use standard palettes from packages like `RColorBrewer` or `scales` to specify categorical or continuous color schemes. This is often done using the `ggplot2` functions `scale_fill_brewer` or `scale_color_brewer`, which draw on the `RColorBrewer` palettes. Here are a couple of examples. In these examples, colors are grouped by site region, node size is scaled to degree centrality (divided by 2), and node and edge colors and node shapes are specified in each call. Note the `alpha` argument, which controls the transparency of the relevant part of the plot.
The [R Graph Gallery](https://www.r-graph-gallery.com/38-rcolorbrewers-palettes.html) has a good overview of the available color palettes in `RColorBrewer` and when they can be used. The "Set2" palette used here is a good choice for people with many kinds of color vision deficiencies.
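`RColorBrewer` also records which of its palettes are considered colorblind friendly, in the `brewer.pal.info` table. The sketch below lists those palettes and builds a named vector of colors so that each category always receives the same color across figures. The region names are placeholders; with the tutorial data you would use the levels of `site_info$Region`.

```r
library(RColorBrewer)

# Palettes flagged as colorblind friendly in RColorBrewer's metadata table
cb_safe <- rownames(brewer.pal.info)[brewer.pal.info$colorblind]
cb_safe

# Named color vector: one color per category (placeholder region names).
# brewer.pal() needs at least 3 colors, so request max(3, n) and subset.
regions <- c("North", "South", "East")
region_cols <- setNames(
  brewer.pal(max(3, length(regions)), "Set2")[seq_along(regions)],
  regions
)
region_cols
```

A named vector like `region_cols` can be supplied to `scale_fill_manual(values = region_cols)` or `scale_color_manual(values = region_cols)`, which match colors to categories by name.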
```{r color_brewer, fig.width=7, fig.height=4, warning=F, message=F}
library(RColorBrewer)
set.seed(347)
g1 <- ggraph(net, layout = "kk") +
  geom_edge_link(edge_color = "gray", alpha = 0.7) +
  geom_node_point(
    aes(fill = site_info$Region),
    shape = 21,
    size = igraph::degree(net) / 2,
    alpha = 0.5
  ) +
  scale_fill_brewer(palette = "Set2") +
  theme_graph() +
  theme(legend.position = "none")
set.seed(347)
g2 <- ggraph(net, layout = "kk") +
  geom_edge_link(edge_color = "blue", alpha = 0.3) +
  geom_node_point(
    aes(col = site_info$Region),
    shape = 15,
    size = igraph::degree(net) / 2,
    alpha = 1
  ) +
  scale_color_brewer(palette = "Set1") +
  theme_graph() +
  theme(legend.position = "none")
ggarrange(g1, g2, nrow = 1)
```
There are also a number of more advanced methods for displaying nodes including displaying figures or other data visualizations in the place of nodes or using images for nodes. There are examples of each of these in the book and code outlining how to create such visuals in the discussions of [Figure 6.3](#Figure_6_3) and [Figure 6.13](#Figure_6_13) below.
#### Edges{#EdgeOptions}
Edges can be modified in terms of color, line type, thickness, and many other features just like nodes, and this is typically done using the `geom_edge_link` call within `ggraph`. Let's take a look at a couple of additional examples. In this case we're going to use the weighted network object `brnet_w` from the original [Peeples2018.Rdata](data/Peeples2018.Rdata) file to show how we can vary edges in relation to edge attributes like weight.
In the example here we vary line thickness, transparency, and color using the edge weights associated with the network object. We use `scale_edge_color_gradient2` to specify a continuous edge color scheme with three anchors (low, mid, and high, with the middle color centered on `midpoint`). For more details see `?scale_edge_color_gradient2`.
```{r edge_options1, message=F, warning=F}
library(intergraph)
net2 <- asIgraph(brnet_w)
set.seed(436)
ggraph(net2, "stress") +
  geom_edge_link(aes(width = weight, alpha = weight, col = weight)) +
  scale_edge_color_gradient2(
    low = "#440154FF",
    mid = "#238A8DFF",
    high = "#FDE725FF",
    midpoint = 0.8
  ) +
  scale_edge_width(range = c(1, 5)) +
  geom_node_point(size = 4, col = "blue") +
  labs(edge_color = "Edge Weight Color Scale") +
  theme_graph()
```
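In the chunk above the diverging scale is centered on a fixed `midpoint = 0.8`. If your weights have a different range, one option is to center the scale on a summary of the data instead. The sketch below assumes a random stand-in network with made-up weights between 0 and 1 and uses the median weight as the midpoint; whether the median is a sensible anchor depends on what the weights represent.

```r
library(igraph)
library(ggraph)

# Stand-in weighted network with made-up weights between 0 and 1
set.seed(436)
g <- sample_gnp(20, 0.2)
E(g)$weight <- runif(ecount(g))

# Center the diverging color scale on the median weight
mid <- median(E(g)$weight)

p <- ggraph(g, "stress") +
  geom_edge_link(aes(width = weight, alpha = weight, col = weight)) +
  scale_edge_color_gradient2(low = "#440154FF", mid = "#238A8DFF",
                             high = "#FDE725FF", midpoint = mid) +
  scale_edge_width(range = c(1, 5)) +
  geom_node_point(size = 4, col = "blue") +
  # generic font family, since the default Arial Narrow may not be installed
  theme_graph(base_family = "sans")
```

Print `p` to draw the plot.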
Another feature of edges that is often important in visualizations is the presence, absence, and type of arrows. Arrows can be modified in `ggraph` using the `arrow` argument within a `geom_edge_link` call. The most relevant options are the `length` of the arrow (which determines its size) and the `type` argument, which specifies an open or closed arrowhead. The spacing between the arrow and the node is set separately with the `end_cap` and `start_cap` arguments of `geom_edge_link`, which define the gap between each end of the edge and the node it connects to. These values can all be set using absolute measurements as shown in the example below. Since this is an undirected network, we use the argument `ends = "first"` to simulate a directed network, so that a single arrowhead is drawn at the first (starting) end of each edge as it appears in the edge list. See `?arrow` for more details on options.
```{r edge_options2, message=F, warning=F}
set.seed(436)
ggraph(net, "stress") +
  geom_edge_link(
    arrow = arrow(
      length = unit(2, "mm"),
      ends = "first",
      type = "closed"
    ),
    end_cap = circle(0, "mm"),
    start_cap = circle(3, "mm"),
    edge_colour = "black"
  ) +
  geom_node_point(size = 4, col = "blue") +
  theme_graph()
```
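For a network that is genuinely directed, you can usually keep the default `ends = "last"`, which places the arrowhead at the receiving node. The sketch below uses a small made-up directed network and sets gaps at both ends of each edge with `start_cap` and `end_cap` so arrowheads are not hidden beneath the node symbols; the 3 mm values are arbitrary.

```r
library(igraph)
library(ggraph)

# A small made-up directed network: A->B, B->C, C->A, C->D
g <- make_graph(c("A", "B", "B", "C", "C", "A", "C", "D"), directed = TRUE)

set.seed(436)
p <- ggraph(g, layout = "stress") +
  geom_edge_link(
    # the default ends = "last" puts the arrowhead at the receiving node
    arrow = arrow(length = unit(3, "mm"), type = "closed"),
    # leave a gap at both ends so arrowheads are not covered by the points
    start_cap = circle(3, "mm"),
    end_cap = circle(3, "mm")
  ) +
  geom_node_point(size = 6, col = "lightblue") +
  geom_node_text(aes(label = name)) +
  # generic font family, since the default Arial Narrow may not be installed
  theme_graph(base_family = "sans")
```

Print `p` to draw the plot.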
Another common consideration with edges is the shape of the edges themselves. So far we have used examples where the edges are all straight lines, but it is also possible to draw them as arcs or so that they fan out from nodes so that multiple connections are visible. In general, all you need to do to change this option is to use another command in the `geom_edge_` family of commands. For example, in the following chunk of code we produce a network with arcs rather than straight lines. In this case the argument `strength` controls the amount of bend in the lines.
```{r edge_arc}
set.seed(436)
ggraph(net, "kk") +
  geom_edge_arc(edge_colour = "black", strength = 0.1) +
  geom_node_point(size = 4, col = "blue") +
  theme_graph()
```
It is also possible to hide the individual edges and instead display a shaded surface representing the density of edges using the `geom_edge_density` call. This could be useful in very large and complex networks.
```{r edge_density, warning=F}
set.seed(436)
ggraph(net2, "kk") +
  geom_edge_density() +
  geom_node_point(size = 4, col = "blue") +
  theme_graph()
```
```{block, type="rmdtip"}
If you want to see all of the possible options for `geom_edge_` commands, simply use the help command on any one of the functions (e.g., `?geom_edge_arc`) and scroll down in the help window to the section labeled "See Also."
```
### Labels {#LabelOptions}
In many cases you may want to label the nodes, edges, or other features of a network. This is relatively easy to do in `ggraph` with the `geom_node_text()` command, which places labels as specified on each node. If you use the `repel = TRUE` argument, labels are pushed away from the nodes and from each other to make them more readable. As shown in the example for [Figure 6.4](#Figure_6_4), it is also possible to filter labels so that only certain nodes are labeled.
```{r node_label}
set.seed(436)
ggraph(net2, "fr") +
  geom_edge_link() +
  geom_node_point(size = 4, col = "blue") +
  geom_node_text(aes(label = vertex.names), size = 3, repel = TRUE) +
  theme_graph()
```
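In larger networks, labeling every node quickly becomes cluttered. One simple way to filter labels, sketched below with igraph's built-in Zachary karate club graph and placeholder node names, is to create a label attribute that is `NA` for every node except those you want to highlight (here the five highest-degree nodes, an arbitrary cutoff). Nodes with missing labels are not drawn, although `ggplot2` may print a warning about removed rows.

```r
library(igraph)
library(ggraph)

g <- make_graph("Zachary")
V(g)$name <- paste0("n", seq_len(vcount(g)))   # placeholder node names
V(g)$degree <- degree(g)

# Keep labels only for the five highest-degree nodes; all others are NA
top5 <- order(V(g)$degree, decreasing = TRUE)[1:5]
V(g)$label <- NA
V(g)$label[top5] <- V(g)$name[top5]

set.seed(436)
p <- ggraph(g, "fr") +
  geom_edge_link(edge_color = "gray") +
  geom_node_point(size = 3, col = "blue") +
  geom_node_text(aes(label = label), size = 3, repel = TRUE) +
  # generic font family, since the default Arial Narrow may not be installed
  theme_graph(base_family = "sans")
```

Print `p` to draw the plot.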
It is also possible to label edges by adding a `label` aesthetic directly to the `geom_edge_` command. In practice, this really only works with very small networks. In the next chunk of code, we create a small network and demonstrate this function.
```{r edge_label}
g <- make_graph(c("A", "B",
                  "B", "C",
                  "A", "C",
                  "A", "A",
                  "C", "B",
                  "D", "C"), directed = TRUE)
E(g)$weight <- c(3, 1, 6, 8, 4, 2)
set.seed(4351)
ggraph(g, layout = "stress") +
  geom_edge_fan(aes(label = weight),
                angle_calc = "along",
                label_dodge = unit(2, "mm")) +
  geom_node_point(size = 20, col = "lightblue") +
  geom_node_text(label = V(g)$name) +
  theme_graph()
```
### Be Kind to the Color Blind{#Colorblind}
When selecting your color schemes, it is important to consider the impact of a particular color scheme on color blind readers. There is an excellent set of R scripts on GitHub in a package called [colorblindr](https://github.com/clauswilke/colorblindr) by Claus Wilke which can help you do just that. We have slightly modified the code from the `colorblindr` package and created a script called [colorblindr.R](scripts/colorblindr.R) which you can download and use to test your network. Simply run the code in the script and then use the `cvd_grid2()` function on a `ggplot` or `ggraph` object to see simulated colors.
The chunk of code below loads the `colorblindr.R` script and then plots a figure (`g1` from above) using the `RColorBrewer` palette `Set2`, first in its original unmodified form and then as it might look to readers with some of the most common forms of color vision deficiency. Download the [colorblindr.R script](scripts/colorblindr.R) to follow along.
```{r colorblind, warning=F, message=F, fig.height=7, fig.width=7}
library(colorspace)
source("scripts/colorblindr.R")
cvd_grid2(g1)
```
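If you prefer not to source an external script, the `colorspace` package loaded above also provides functions for simulating color vision deficiencies directly on a vector of colors. This is a different route from `cvd_grid2()`, which works on whole plots; the minimal sketch below simply checks the eight colors of the `RColorBrewer` `Set2` palette using `deutan()`, `protan()`, and `tritan()` (the object names `set2` and `set2_sim` are hypothetical).

```r
library(colorspace)
library(RColorBrewer)

# The eight colors of the RColorBrewer "Set2" palette
set2 <- brewer.pal(8, "Set2")

# Simulated appearance under three common color vision deficiencies
# (default severity = 1 simulates the full dichromatic form)
set2_sim <- list(
  Original     = set2,
  Deuteranopia = deutan(set2),
  Protanopia   = protan(set2),
  Tritanopia   = tritan(set2)
)

# Draw each palette as a row of swatches for side-by-side comparison
swatchplot(set2_sim)
```

Checking the palette on its own like this is quick, but it does not show how colors interact with node sizes and edge overlap, which is where simulating the full plot with `cvd_grid2()` remains useful.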
### Communities and Groups{#VizCommunities}
Showing communities or other groups in network visualizations can be as simple as color coding nodes or edges, as we have seen in many examples here. It is sometimes also useful to highlight groups by drawing a hull or circle around the relevant points. This can be done in `ggraph` using the `geom_mark_hull()` function from the `ggforce` package. You will also need the `concaveman` package, which `geom_mark_hull()` uses to compute hulls whose concavity you can adjust with the `concavity` argument.
The following chunk of code provides a simple example using the Louvain clustering algorithm.
```{r ggforce, warning=F, message=F, fig.width=7, fig.height=7}
library(ggforce)
library(concaveman)
# Define clusters
grp <- as.factor(cluster_louvain(net2)$membership)
set.seed(4343)
ggraph(net2, layout = "fr") +
geom_edge_link0(width = 0.2) +
geom_node_point(aes(fill = grp),
shape = 21,
size = 5,
alpha = 0.75) +
# Create hull around points within group and label
geom_mark_hull(
aes(
x,
y,
group = grp,
fill = grp
),
concavity = 4,
expand = ggplot2::unit(2, "mm"),
alpha = 0.25
) +
scale_fill_brewer(palette = "Set2") +
theme_graph()
```
The discussion of [Figure 6.4](#Figure_6_4) below provides another similar example. The examples for the book figures below include many more complex ways of showing network groups. For example, [Figure 6.18](#Figure_6_18) illustrates the "group-in-a-box" technique using the NodeXL software package, [Figure 6.19](#Figure_6_19) illustrates the use of matrices as visualization tools, and [Figure 6.20](#Figure_6_20) provides links to the NodeTrix hybrid visualization software.
## Replicating the Book Figures{#ReplicatingBookFigures}
In this section we go through each figure in Chapter 6 of Brughmans and Peeples (2023) and detail how the final graph was created for all figures that were created using R. For those figures not created in R we describe what software and data were used and provide additional resources where available. We hope these examples will serve as inspiration for your own network visualization experiments. Some of these figures are relatively simple while others are quite complex. They are presented in the order they appear in the book.
### Figure 6.1: Manual Layout {- #Figure_6_1}
Figure 6.1. An example of an early hand drawn network graph (sociogram) published by Moreno (1932: 101). Moreno noted that the nodes at the top and bottom of the sociogram have the most connections and therefore represent the nodes of greatest importance. These specific “important” points are emphasized through both their size and their placement.
Note that the hand drawn version of this figure is presented in the book; this digital version is included only for illustrative purposes. It shows how you can employ a user-defined layout by directly supplying coordinates for the nodes in the plot. [Download the Moreno data to follow along](data/Moreno.csv).
```{r Fig6_1, message=F, warning=F, fig.width=2, fig.height=3}
library(igraph)
library(ggraph)
# Read in adjacency matrix of Moreno data and convert to network
moreno <-
as.matrix(read.csv("data/Moreno.csv", header = TRUE, row.names = 1))
g_moreno <- graph_from_adjacency_matrix(moreno)
# Create xy coordinates associated with each node
xy <- matrix(
c(4, 7, 1, 5, 6, 5, 2, 4, 3, 4, 5, 4, 1, 2.5, 6, 2.5, 4, 1),
nrow = 9,
ncol = 2,
byrow = TRUE
)
# Plot the network using layout = "manual" to place nodes using xy coordinates
ggraph(g_moreno,
layout = "manual",
x = xy[, 1],
y = xy[, 2]) +
geom_edge_link() +
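# Map degree to size inside aes() so that scale_size() controls the range
geom_node_point(aes(size = igraph::degree(g_moreno)),
fill = "white",
shape = 21,
show.legend = FALSE) +
scale_size(range = c(2, 3)) +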
theme_graph()
```
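To see what `layout = "manual"` does with the coordinates, it can help to build the layout table yourself. The sketch below assumes a hypothetical three-node chain (`g_toy`, `xy_toy`, `lay`, and `p_manual` are illustrative names) and uses `ggraph`'s `create_layout()`, which returns the node positions that `ggraph()` would draw; the supplied `x` and `y` values should pass through unchanged.

```r
library(igraph)
library(ggraph)

# Hypothetical three-node chain: 1 - 2 - 3
g_toy <- make_graph(c(1, 2, 2, 3), directed = FALSE)

# User-defined coordinates, one row per node, in node order
xy_toy <- cbind(x = c(0, 1, 2), y = c(0, 1, 0))

# Build the layout table that ggraph() would otherwise create internally
lay <- create_layout(g_toy, layout = "manual",
                     x = xy_toy[, "x"], y = xy_toy[, "y"])
lay[, c("x", "y")]

# A precomputed layout object can be passed straight to ggraph()
p_manual <- ggraph(lay) +
  geom_edge_link() +
  geom_node_point(size = 4) +
  theme_graph()
# Print p_manual to draw the plot
```

Because the coordinates are matched to nodes purely by position, the rows of the coordinate matrix must follow the node order of the graph, just as the rows of `xy` follow the rows of the Moreno adjacency matrix.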
### Figure 6.2: Examples of Common Network Plot Formats {- #Figure_6_2}
Figure 6.2. These plots are all different visual representations of the same network from Peeples’s (2018) data, where nodes represent archaeological settlements and edges are defined based on the technological similarities of the cooking pots found at each settlement.
The code below creates each of the individual figures and then compiles them into a single composite figure for plotting.
First read in the data ([all data are combined in a single RData file here](data/Peeples2018.Rdata)).
```{r Fig6_2_dat, message=F, warning=F, fig.height=7, fig.width=7}
library(igraph)
library(statnet)
library(intergraph)
library(ggplotify)
library(ggraph)
library(ggpubr)
load(file = "data/Peeples2018.Rdata")
## contains objects
# site_info - site locations and attributes
# ceramic_br - raw Brainerd-Robinson similarity among sites
# brnet - binary network with similarity values > 0.65
# defined as edges in statnet/network format
# brnet_w - weighted network with edges (>0.65) given weight
# values based on BR similarity in statnet/network format
##
```
Fig 6.2a - A simple network graph with nodes placed based on the Fruchterman-Reingold algorithm
```{r Fig6_2a, message=F, warning=F, fig.width=7, fig.height=7}
## create simple graph with Fruchterman - Reingold layout
set.seed(423)
f6_2a <- ggraph(brnet, "fr") +
geom_edge_link(edge_colour = "grey66") +
geom_node_point(size = 5, col = "red", show.legend = FALSE) +
theme_graph()
f6_2a
```
Fig 6.2b - Network graph with nodes placed based on the real geographic locations of settlements and color coded based on sub-regions.
```{r Fig6_2b, message=F, warning=F, fig.width=7, fig.height=7}
## create graph with layout determined by site location and
## nodes color coded by region
f6_2b <- ggraph(brnet, "manual",
x = site_info$x,
y = site_info$y) +
geom_edge_link(edge_colour = "grey66") +
geom_node_point(aes(size = 2, col = site_info$Region),
show.legend = FALSE) +
theme_graph()
f6_2b
```
Fig 6.2c - A graph designed to show how many different kinds of information can be combined in a single network plot. In this network graph, node placement is defined by the stress majorization algorithm (see below), nodes are color coded based on region, different symbols mark the different kinds of public architectural features found at those sites, and nodes are scaled based on betweenness centrality scores. The line weight of each edge indicates relative tie strength.
```{r Fig6_2c, message=F, warning=F, fig.width=7, fig.height=7}
# create vectors of attributes and betweenness centrality and plot
# network with nodes color coded by region, sized by betweenness,
# with symbols representing public architectural features, and
# with edges weighted by BR similarity
col1 <- as.factor(site_info$Great.Kiva)
col2 <- as.factor(site_info$Region)
bw <- sna::betweenness(brnet_w)
f6_2c <- ggraph(brnet_w, "stress") +
geom_edge_link(aes(width = weight, alpha = weight),
edge_colour = "black",
show.legend = FALSE) +
scale_edge_width(range = c(1, 2)) +
geom_node_point(aes(
size = bw,
shape = col1,
fill = col1,
col = col2
),
show.legend = FALSE) +
scale_fill_discrete() +
scale_size(range = c(4, 12)) +
theme_graph()
f6_2c
```
Fig. 6.2d - This network graph is laid out using the Kamada-Kawai force directed algorithm with nodes color coded based on communities detected using the Louvain community detection algorithm. Each community is also indicated by a circle highlighting the relevant nodes. Edges within communities are shown in black and edges between communities are shown in red.
In this plot we use the `as.ggplot()` function from the `ggplotify` package to convert a base R `igraph` plot into a `ggplot` object, so that it can be combined with the other `ggraph` panels.
```{r Fig6_2d, message=F, warning=F, fig.width=7, fig.height=7}
# convert network object to igraph object, calculate Louvain
# cluster membership, plot, and convert to a ggplot object to combine
g <- asIgraph(brnet_w)
clst <- cluster_louvain(g)
f6_2d <- as.ggplot(
~ plot(
clst,
g,
layout = layout_with_kk,
vertex.label = NA,
vertex.size = 10,
col = rainbow(4)[clst$membership]
)
)
f6_2d
```
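The black/red edge coloring in this panel comes from `igraph`'s plotting method for community objects, which colors edges according to whether they cross between communities. If you would rather build the same idea in `ggraph`, a minimal sketch is shown below. It assumes the Zachary karate club graph as a stand-in for the Cibola data and uses `igraph::crossing()` to flag between-community edges; `g_k`, `clst_k`, `between`, `edge_type`, and `p_cross` are hypothetical names.

```r
library(igraph)
library(ggraph)

set.seed(4343)
g_k <- make_graph("Zachary")
clst_k <- cluster_louvain(g_k)

# crossing() is TRUE for edges whose two endpoints are in different communities
between <- crossing(clst_k, g_k)
E(g_k)$edge_type <- ifelse(between, "between", "within")
table(E(g_k)$edge_type)

p_cross <- ggraph(g_k, layout = "kk") +
  geom_edge_link(aes(edge_colour = edge_type), show.legend = FALSE) +
  scale_edge_colour_manual(values = c(within = "black", between = "red")) +
  geom_node_point(aes(colour = factor(clst_k$membership)),
                  size = 4, show.legend = FALSE) +
  theme_graph()
# Print p_cross to draw the plot
```

Keeping the result as a native `ggraph` object avoids the `as.ggplot()` conversion step and lets you adjust the edge colors with the usual scale functions.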
Finally, we use the `ggarrange` function from the `ggpubr` package to combine all of these plots into a single composite plot.
```{r Fig6_2_all, message=F, warning=F, fig.height=7, fig.width=7}
# Combine all plots into a single figure using ggarrange
figure_6_2 <- ggarrange(
f6_2a,
f6_2b,
f6_2c,
f6_2d,
nrow = 2,
ncol = 2,
labels = c("(a)", "(b)", "(c)", "(d)"),
font.label = list(size = 22)
)
figure_6_2
```
### Figure 6.3: Examples of Rare Network Plot Formats {- #Figure_6_3}
Figure 6.3. Examples of less common network visualization techniques for Peeples’s (2018) ceramic technological similarity data.
Fig 6.3a - A weighted heat plot of the underlying similarity matrix with hierarchical clusters shown on each axis. This plot relies on a package called `superheat` that produces plots formatted as we see here. The required input is a symmetric similarity matrix.
```{block, type="rmdnote"}
In the chunk of code below we use the `as.ggplot()` function from the `ggplotify` package. This function converts a plot that is not itself a `ggplot` object (here, a `superheat` plot) into `ggplot2` format so that it can be used further with packages like `ggpubr` and `colorblindr`.
```
```{r Fig6_3a, message=F, warning=F, fig.width=7, fig.height=7, cache=T}
library(igraph)
library(statnet)
library(intergraph)
library(ggraph)
library(ggplotify)
library(superheat)
ceramic_br_a <- ceramic_br
diag(ceramic_br_a) <- NA
f6_3a <- as.ggplot(
~ superheat(
ceramic_br_a,
row.dendrogram = TRUE,
col.dendrogram = TRUE,
grid.hline.col = "white",
grid.vline.col = "white",
legend = FALSE,
left.label.size = 0,
bottom.label.size = 0
)
)
f6_3a
```
Fig. 6.3b - An arcplot with within group ties shown above the plot and between group ties shown below.
For this plot, we read in an adjacency matrix whose rows and columns are arranged in the order we want the nodes to appear in the final plot. [Download the file here](data/Peeples_arcplot.csv) to follow along. Note that the object `grp` must list group memberships in the same order that the nodes appear in the original adjacency matrix file.
```{r Fig6_3b, message=F, warning=F, fig.width=7, fig.height=7}
arc_dat <- read.csv("data/Peeples_arcplot.csv",
header = TRUE,
row.names = 1)
g <- graph_from_adjacency_matrix(as.matrix(t(arc_dat)))
# set groups for color
grp <- as.factor(c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1))
# Make the graph
f6_3b <- ggraph(g, layout = "linear") +
geom_edge_arc(
edge_colour = "black",
edge_alpha = 0.2,
edge_width = 0.7,
fold = FALSE,
strength = 1,
show.legend = FALSE
) +
geom_node_point(
aes(
size = igraph::degree(g),
color = grp,
fill = grp
),
alpha = 0.5,
show.legend = FALSE
) +
scale_size_continuous(range = c(4, 8)) +
theme_graph()
f6_3b
```
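Because the `"linear"` layout places nodes in the order they appear in the graph, the `grp` vector above has to be typed in that same order by hand. One possibly less error-prone alternative, sketched below under the assumption that your `ggraph` version supports the `sort.by` argument of the linear layout with a bare attribute name, is to store group membership as a node attribute and let the layout sort by it. The toy graph and the names `g_arc`, `lay_arc`, and `p_arc` are hypothetical.

```r
library(igraph)
library(ggraph)

# Hypothetical five-node network with a group attribute stored on the nodes
g_arc <- make_graph(c(1, 2, 2, 3, 3, 4, 4, 5, 1, 5), directed = FALSE)
V(g_arc)$grp <- c(2, 1, 3, 1, 2)

# sort.by orders nodes along the line by their grp attribute, so each
# group value stays attached to the right node
lay_arc <- create_layout(g_arc, layout = "linear", sort.by = grp)
lay_arc[order(lay_arc$x), c("x", "grp")]

p_arc <- ggraph(lay_arc) +
  geom_edge_arc(edge_alpha = 0.3, fold = FALSE) +
  geom_node_point(aes(colour = factor(grp)), size = 5, show.legend = FALSE) +
  theme_graph()
# Print p_arc to draw the plot
```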
Fig. 6.3c - Network plot with sites in geographic locations and edges bundled using the edge bundling hammer routine.
```{block, type="rmdwarning"}
This example requires the `edgebundle` package to be installed along with `reticulate` and Python 3.8 (see [Packages](#Packages)) and uses the [Cibola technological similarity data](data/Peeples2018.Rdata). Check the [Data and Workspace Setup](#ShouldIInstall) section for more details on getting the edge bundling package and Python up and running.
Be aware that this function may take a long time on your computer depending on your processing power and RAM.
```
```{r, echo=F, message=F, warning=F}
library(reticulate)
#use_condaenv("r-reticulate")
```
```{r Fig6_3c, message=F, warning=F, fig.width=7, fig.height=7, cache=T}
library(edgebundle)
load("data/Peeples2018.Rdata")
# Create attribute file with required data
xy <- as.data.frame(site_info[, 1:2])
xy <- cbind(xy, site_info$Region)
colnames(xy) <- c("x", "y", "Region")
# Run hammer bundling routine
g <- asIgraph(brnet)
hbundle <- edge_bundle_hammer(g, xy, bw = 5, decay = 0.3)
f6_3c <- ggplot() +
geom_path(data = hbundle, aes(x, y, group = group),
col = "gray66", linewidth = 0.5) +
geom_point(data = xy, aes(x, y, col = Region),
size = 5, alpha = 0.75, show.legend = FALSE) +
theme_void()
f6_3c
```
Fig. 6.3d - Network graph where nodes are replaced by waffle plots that show relative frequencies of the most common ceramic technological clusters.
This is a somewhat complicated plot that requires a couple of specialized packages and additional steps along the way. We provide comments in the code below to help you follow along. Essentially, the routine creates a series of waffle plots and then uses them as annotations to replace the nodes in the final `ggraph`. This plot requires a development package called `ggwaffle`. If you do not already have it, run the line of code below before creating the figure.
```{r, eval=F}
devtools::install_github("liamgilbey/ggwaffle")
```
```{block, type="rmdtip"}
Many packages are available on CRAN (the Comprehensive R Archive Network), and these packages have passed CRAN's submission checks and review. There are many other packages and compendiums designed for use in R that are not (yet) on CRAN. Frequently these are found as packages in development on GitHub. To use these packages, you can call the `install_github()` function, which is made available through the `devtools` package (though it originates in the `remotes` package). To install a package from GitHub, you supply "username/packagename" inside the `install_github()` call.
```
Let's now look at the figure code:
```{r Fig6_3d, message=F, warning=F, fig.width=10, fig.height=10}
# Initialize libraries
library(ggwaffle)
library(tidyverse)
# Create igraph object from data imported above
cibola_adj <-
read.csv(file = "data/Cibola_adj.csv",
header = TRUE,
row.names = 1)
g <- graph_from_adjacency_matrix(as.matrix(cibola_adj),
mode = "undirected")
# Import raw ceramic data and convert to proportions
ceramic_clust <- read.csv(file = "data/Cibola_clust.csv",
header = TRUE,
row.names = 1)
ceramic_p <- prop.table(as.matrix(ceramic_clust), margin = 1)
# Assign vertex attributes c1 to c10 to the network object g, each
# holding the corresponding column of the ceramic_p table
for (k in 1:10) {
g <- set_vertex_attr(g, paste0("c", k), value = ceramic_p[, k])
}
# Precompute the layout and assign coordinates as x and y in network g
set.seed(345434534)
xy <- layout_with_fr(g)
V(g)$x <- xy[, 1]
V(g)$y <- xy[, 2]
# Create a data frame that contains the 4 most common
# categories in the ceramic table, the node id, and the proportion
# of that ceramic category at that node
nodes_wide <- igraph::as_data_frame(g, "vertices")
nodes_long <- nodes_wide %>%
dplyr::select(c1:c4) %>%
mutate(id = seq_len(nrow(nodes_wide))) %>%
gather("attr", "value", c1:c4)
nodes_out <- NULL
for (j in seq_len(nrow(nodes_long))) {
temp <- do.call("rbind", replicate(round(nodes_long[j, ]$value * 50, 0),
nodes_long[j, ], simplify = FALSE))
nodes_out <- rbind(nodes_out, temp)
}
# Build one waffle-plot grob per node, keeping only the plot panel
bar_list <- lapply(1:vcount(g), function(i) {
gt_plot <- ggplotGrob(
ggplot(waffle_iron(nodes_out[nodes_out$id == i, ],
aes_d(group = attr))) +
geom_waffle(aes(x, y, fill = group), size = 10) +
coord_equal() +
labs(x = NULL, y = NULL) +
theme(
legend.position = "none",
panel.background = element_rect(fill = "white", colour = NA),
line = element_blank(),
text = element_blank()
)
)
panel_coords <- gt_plot$layout[gt_plot$layout$name == "panel", ]
gt_plot[panel_coords$t:panel_coords$b, panel_coords$l:panel_coords$r]
})
# Convert the results above into custom annotation
annot_list <- lapply(1:vcount(g), function(i) {
xmin <- nodes_wide$x[i] - .25
xmax <- nodes_wide$x[i] + .25
ymin <- nodes_wide$y[i] - .25
ymax <- nodes_wide$y[i] + .25
annotation_custom(
bar_list[[i]],
xmin = xmin,
xmax = xmax,
ymin = ymin,
ymax = ymax
)
})
# create basic network
p <- ggraph(g, "manual", x = V(g)$x, y = V(g)$y) +
geom_edge_link0() +
theme_graph() +
coord_fixed()
# put everything together by adding the annotations (waffle plots) to the network
f6_3d <- Reduce("+", annot_list, p)
f6_3d
```
```{block, type="rmdtip"}
The inspiration for the example above came from an [R-bloggers post by schochastics (David Schoch)](https://www.r-bloggers.com/2020/03/ggraph-tricks-for-common-problems/). As that post shows, any figure that can be treated as a `ggplot2` object can be used in place of nodes by defining it as an "annotation." See the post for more details.
```
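The `for` loop in the Figure 6.3d chunk turns each proportion into a count of waffle cells (out of 50) by replicating rows. If you are already using the `tidyverse`, `tidyr::uncount()` performs the same row replication in a single call. The sketch below uses a small hypothetical table (`nodes_long_toy`) with the same `id`, `attr`, and `value` columns as `nodes_long`.

```r
library(tidyr)

# Hypothetical long-format table: proportions of two categories at two nodes
nodes_long_toy <- data.frame(
  id = c(1, 1, 2, 2),
  attr = c("c1", "c2", "c1", "c2"),
  value = c(0.6, 0.4, 0.2, 0.8)
)

# Replicate each row round(value * 50) times, as the loop does
nodes_out_toy <- uncount(nodes_long_toy, weights = round(value * 50))

# Number of waffle cells per node and category
table(nodes_out_toy$id, nodes_out_toy$attr)
```

Rows whose rounded count is zero are simply dropped, which matches the behavior of the loop when `replicate()` is asked for zero copies.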
Now let's look at all of the figures together.
![](images/Figure_6_3.jpg){width=100%}
### Figure 6.4: Simple Network with Clusters {- #Figure_6_4}
Figure 6.4. A network among Clovis era sites in the Western U.S. with connections based on shared lithic raw material sources. Nodes are scaled based on betweenness centrality with the top seven sites labelled. Color-coded clusters were defined using the Louvain algorithm.
```{block, type="rmdtip"}
This example shows how to define and indicate groups and label points based on their values. Note the use of the `ifelse` call in the `geom_node_text` portion of the plot. See [here](#Conditionals) for more information on how `ifelse` statements work.
```
```{r Fig6_4, warning=F, message=F, fig.width=7, fig.height=7}
library(ggforce)
library(ggraph)
library(statnet)
library(igraph)
clovis <- read.csv("data/Clovis.csv", header = TRUE, row.names = 1)
colnames(clovis) <- row.names(clovis)
graph <- graph_from_adjacency_matrix(as.matrix(clovis),
mode = "undirected",
diag = FALSE)
bw <- igraph::betweenness(graph)
grp <- as.factor(cluster_louvain(graph)$membership)
set.seed(43643548)
ggraph(graph, layout = "fr") +
geom_edge_link(edge_width = 1, edge_colour = "gray") +
geom_node_point(aes(fill = grp, size = bw, color = grp),
shape = 21,
alpha = 0.75) +
scale_size(range = c(2, 20)) +
geom_mark_hull(
aes(
x,
y,
group = grp,
fill = grp,
color = NA
),
concavity = 4,
expand = unit(2, "mm"),
alpha = 0.25,
label.fontsize = 12
) +
scale_color_brewer(palette = "Set2") +
scale_fill_brewer(palette = "Set2") +
scale_edge_color_manual(values = c(rgb(0, 0, 0, 0.3),
rgb(0, 0, 0, 1))) +
# If else statement only labels points that meet the condition
geom_node_text(aes(label = ifelse(bw > 40,
as.character(name),
NA_character_)),
size = 4) +
theme_graph() +
theme(legend.position = "none")
```
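The caption describes labeling the top seven sites, which the code achieves with a fixed betweenness threshold of 40. If you would rather request the top *k* nodes directly, you can rank the betweenness scores instead. A minimal sketch, using the Zachary karate club graph as a stand-in for the Clovis network and the hypothetical names `g_k`, `bw_k`, `top_k`, and `lab_k`:

```r
library(igraph)

g_k <- make_graph("Zachary")
V(g_k)$name <- paste0("n", seq_len(vcount(g_k)))
bw_k <- betweenness(g_k)

# Flag the k nodes with the highest betweenness (ties broken by node order)
k <- 7
top_k <- rank(-bw_k, ties.method = "first") <= k

# Same pattern as the ifelse() call above: NA means "no label"
lab_k <- ifelse(top_k, V(g_k)$name, NA_character_)
lab_k[!is.na(lab_k)]
```

The resulting vector can be passed to `geom_node_text(aes(label = lab_k))` in place of the threshold-based `ifelse()`, and the number of labels stays fixed at `k` even if the betweenness values change.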
### Figure 6.5: Interactive Layout {- #Figure_6_5}
Figure 6.5. An example of the same network graph with two simple user defined layouts created interactively.