# Visualizing Intersecting Follower Sets with UpSetR
## Problem
You want to examine the intersection of Twitter followers across a group of defined Twitter handles.
## Solution
- Scrape all follower IDs for each handle
- Combine them into one dataframe
- Create a de-duplicated list of all followers
- Build a logical matrix indicating whether each follower follows each handle
- Plot the intersecting sets with [`UpSetR`](https://github.com/hms-dbmi/UpSetR)
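
The steps above, minus the scraping and plotting, can be sketched on made-up data with base R only. This is a minimal illustration rather than the recipe's own code: `toy_followers`, `toy_ids`, `toy_handles`, and `toy_binaries` are hypothetical names, and the IDs and handles are invented.

```r
# Hypothetical toy data: one row per (follower, handle) pair,
# mimicking the shape of the combined follower dataframe
toy_followers <- data.frame(
  user_id = c("u1", "u2", "u3", "u1", "u3", "u4", "u3"),
  account = c("a", "a", "a", "b", "b", "b", "c"),
  stringsAsFactors = FALSE
)

# de-duplicated list of all followers
toy_ids <- unique(toy_followers$user_id)

# one 0/1 column per handle: does each follower follow that handle?
toy_handles <- unique(toy_followers$account)
toy_binaries <- as.data.frame(
  sapply(toy_handles, function(h) {
    as.integer(toy_ids %in% toy_followers$user_id[toy_followers$account == h])
  })
)
rownames(toy_binaries) <- toy_ids

toy_binaries
```

Each row is one unique follower and each column one handle, which is the shape the plotting step works from.
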
## Discussion
Set visualization, typically done with Venn diagrams, becomes challenging once the number of sets exceeds a trivial threshold. The UpSet project was created to address this:

> A novel visualization technique for the quantitative analysis of sets, their intersections, and aggregates of intersections.

Thankfully, there is an R package version of the project that we can use with follower data pulled with `rtweet`. `UpSetR` requires the data to be in a binary matrix format (one row per follower, one 0/1 column per handle), so there is some data wrangling to do before we can visualize.
```{r 22_lib, message=FALSE, warning=FALSE}
library(rtweet)
library(tidyverse)
library(UpSetR)
```
First we make a list of the Twitter handles we want to compare, then scrape all of their followers into one dataframe by calling `get_followers` inside `purrr::map_df`. Set `n` to a number >= the largest follower count in your set and `retryonratelimit = TRUE` to ensure you capture all followers. This may take some time depending on how many followers you are scraping.
```{r 22_followers, message=FALSE, warning=FALSE, cache=TRUE}
# get a list of twitter handles you want to compare
rstaters <- c("dataandme",
              "JennyBryan",
              "hrbrmstr",
              "xieyihui",
              "drob",
              "juliasilge",
              "thomasp85")

# scrape the user_id of all followers for each handle in the list and bind into 1 dataframe
followers <- rstaters %>%
  map_df(~ get_followers(.x, n = 20000, retryonratelimit = TRUE) %>%
           mutate(account = .x))

head(followers)
tail(followers)
```
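
Before building the matrix, it can be worth checking that the scrape looks sensible. The sketch below uses a made-up stand-in for `followers` and assumes only the two columns used in this recipe, `user_id` and `account`. `count_shared` is a hypothetical helper, not part of `rtweet`.

```r
# Toy stand-in for the scraped `followers` dataframe (made-up IDs);
# assumes the same two columns used above: user_id and account
toy_followers <- data.frame(
  user_id = c("u1", "u2", "u3", "u1", "u3", "u4", "u3"),
  account = c("a", "a", "a", "b", "b", "b", "c"),
  stringsAsFactors = FALSE
)

# followers captured per account; compare with the public follower counts
# to spot a scrape that was cut short by `n` or by rate limits
per_account <- table(toy_followers$account)
per_account

# count_shared() is a hypothetical helper: followers two handles have in common
count_shared <- function(df, x, y) {
  length(intersect(df$user_id[df$account == x],
                   df$user_id[df$account == y]))
}

count_shared(toy_followers, "a", "b")
```

If an account's captured count sits exactly at the value of `n`, that is a hint the limit was hit and `n` may need raising.
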
Next we form a binary matrix by using `ifelse` inside a `map_dfc` call to record whether or not each follower in the de-duplicated list follows each of the Twitter handles.
```{r 22_matrix, message=FALSE, warning=FALSE, cache=TRUE}
# get a de-duplicated list of all followers
aRdent_followers <- unique(followers$user_id)

# for each follower, get a binary indicator of whether they follow each tweeter or not and bind to one dataframe
binaries <- rstaters %>%
  map_dfc(~ ifelse(aRdent_followers %in% filter(followers, account == .x)$user_id, 1, 0) %>%
            as.data.frame) # UpSetR doesn't like tibbles

# set column names
names(binaries) <- rstaters

# have a look at the data
glimpse(binaries)
```
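
A couple of quick checks on the matrix can catch wrangling mistakes before plotting. This is a sketch on a made-up stand-in, `toy_binaries`, which is assumed to have the same shape as `binaries` (one row per unique follower, one 0/1 column per handle).

```r
# Toy stand-in for `binaries` (made-up values)
toy_binaries <- data.frame(
  a = c(1, 1, 1, 0),
  b = c(1, 0, 1, 1),
  c = c(0, 0, 1, 0)
)

# column sums give the number of followers captured for each handle
followers_per_handle <- colSums(toy_binaries)
followers_per_handle

# every follower came from at least one handle, so no row should be all zeros
no_empty_rows <- all(rowSums(toy_binaries) >= 1)
no_empty_rows

# the recipe passes a plain data.frame (not a tibble) to UpSetR
is_plain_df <- identical(class(toy_binaries), "data.frame")
is_plain_df
```

On the real data, the column sums should match the per-account row counts in `followers`, provided no account lists the same follower twice.
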
Finally, we let `UpSetR` work its magic on the matrix and visualize the intersections...
```{r 22_upset, message=FALSE, warning=FALSE, cache=TRUE, fig.width=10, fig.height=6}
# plot the sets with UpSetR
upset(binaries, nsets = 7, main.bar.color = "SteelBlue", sets.bar.color = "DarkCyan",
      sets.x.label = "Follower Count", text.scale = c(rep(1.4, 5), 1), order.by = "freq")
```
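
To cross-check the bar heights, the intersection sizes can also be tallied by hand. The sketch below assumes that each bar in the plot counts followers with exactly that combination of handles (an exclusive intersection), which is my understanding of the default behaviour rather than something stated in this recipe. `toy_binaries`, `membership`, and `intersection_sizes` are hypothetical names on made-up data.

```r
# Toy stand-in for `binaries` (made-up values)
toy_binaries <- data.frame(
  a = c(1, 1, 1, 0, 1),
  b = c(1, 0, 1, 1, 1),
  c = c(0, 0, 1, 0, 0)
)

# label each follower with the exact set of handles they follow, e.g. "a&b"
membership <- apply(toy_binaries, 1, function(row) {
  paste(names(toy_binaries)[row == 1], collapse = "&")
})

# size of each exclusive intersection, largest first (as with order.by = "freq")
intersection_sizes <- sort(table(membership), decreasing = TRUE)
intersection_sizes
```

Because each follower lands in exactly one combination, the sizes add up to the number of rows in the matrix.
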
## See Also
- [UpSet Project](http://caleydo.org/tools/upset/)