
A non R user's guide to ChatStat

Greg Sutcliffe edited this page Dec 17, 2021 · 1 revision

We recognise that R is not a language many Matrix users work with. In this doc, we'll walk through how to get R set up and ChatStat running.

IDE?

I personally use the RStudio IDE, which is free for non-commercial use. However, you can just install R itself and use it from the command line, though you won't get such an easy interface for experimenting with plots.

Install R

Either way, you'll need R - you can get it from your package manager, if you have one, or from https://cran.r-project.org/. You will need R 4.1 or higher.

Once installed, check that you can start an R shell (either within RStudio or at the command line).
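
If you want to confirm the version from inside the shell, something like the following should do it. This is only a sketch: `getRversion()` is part of base R, and the 4.1 threshold is the one stated above.

```r
# Compare the running R version against the 4.1 minimum this guide asks for.
r_ok <- getRversion() >= "4.1.0"

if (r_ok) {
  message("R ", as.character(getRversion()), " is new enough")
} else {
  warning("R ", as.character(getRversion()), " is older than 4.1; please upgrade")
}
```

The native pipe (`|>`) used later in this guide was introduced in R 4.1, so older versions will fail on that code.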

Install the packages

For this tutorial, we'll need a few packages. R uses CRAN as its package index, but ChatStat isn't on CRAN, so we install it from GitHub using the remotes package instead.

install.packages("tidyverse")
install.packages("remotes")
remotes::install_github("GregSutcliffe/ChatStat")

Hopefully that all goes well! If not, check for any dependency errors, and let us know so we can update this wiki.
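
One way to check what actually got installed, without loading everything, is sketched below. It only uses base R's `requireNamespace()`; the package names are the three installed above.

```r
# The three packages this guide installs.
pkgs <- c("tidyverse", "remotes", "ChatStat")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that did not install cleanly.
not_installed <- pkgs[!installed]
if (length(not_installed) > 0) {
  message("Still missing: ", paste(not_installed, collapse = ", "))
} else {
  message("All packages are available")
}
```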

Get some data

For this example I'm going to use the This Week In Matrix channel and get 7 days of data. You'll also need a Matrix access token for your account (you can get it from the settings page in Element).

library(tidyverse)
library(ChatStat)
Sys.setenv('token' = 'syt_my_token',
           'host' = 'my_matrix_homeserver.org')

Sys.setenv(LOG_LEVEL='DEBUG')
df <- get_rooms('!QQpfJfZvqxbCfeDgCj:matrix.org','2021-12-10 00:00:00')
df

If all goes well you should see something like this:

# A tibble: 1 × 2
# Groups:   room [1]
  room                           events            
  <chr>                          <list>            
1 !QQpfJfZvqxbCfeDgCj:matrix.org <tibble [819 × 7]>

That means we have 819 events from the room, stored as a nested table in the events column. If you pass a list of more than one room, you'll get one row per room, but we'll leave that for another tutorial.
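
If nested tables are new to you, here is a small sketch on made-up data (not real ChatStat output) showing the same shape: one row per room, with that room's events tucked into a list column. The room ID and the `type` and `body` values are invented for illustration.

```r
library(tibble)
library(tidyr)

# Made-up stand-in for the result above: one row per room,
# with that room's events stored as a tibble inside a list column.
toy <- tibble(
  room = "!example:matrix.org",
  events = list(tibble(
    type = c("m.room.message", "m.reaction", "m.room.member"),
    body = c("hello", NA, NA)
  ))
)

nrow(toy)   # 1 row: one per room

# unnest() expands the list column so each event gets its own row.
flat <- unnest(toy, events)
nrow(flat)  # 3 rows: one per event
```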

Now we'll unnest it and make a graph of messages per day.

df |>
  # expand the rows, gives us 819 rows
  unnest(events) |>
  # we have all events, so filter for messages and reactions
  filter( !is.na(body) | type == 'm.reaction') |>
  # truncate the sending time to just the date
  # (lubridate is installed with tidyverse, but older tidyverse versions don't attach it)
  mutate(day = lubridate::as_datetime(cut(time,'day'))) |>
  # count the number of events in each date
  count(day) |>
  ggplot(aes(day,n)) +
  geom_col() + 
  labs(title='Messages per day',
       subtitle='Based on messages and reactions',
       x = 'Date', y = 'Count') +
  guides(fill='none') +
  theme_minimal()

With a bit of luck, you'll get a chart like this:

[Chart: Messages per day in TWIM]

From here, you can go nuts with either the tidyverse code to slice the data in different ways, or the ggplot2 code to display it in different ways. Stay tuned for more examples!
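
As one example of slicing the data differently, here is a sketch that counts events by type instead of by day. It runs on made-up data so you can try it without a token; with real data you would start from the unnested events instead. It reuses the same filter as the messages-per-day pipeline.

```r
library(dplyr)
library(tibble)

# Made-up events, standing in for the unnested ChatStat data.
events <- tibble(
  type = c("m.room.message", "m.reaction", "m.room.message", "m.room.member"),
  body = c("hi", NA, "hello again", NA)
)

by_type <- events |>
  # keep messages (which have a body) and reactions
  filter(!is.na(body) | type == "m.reaction") |>
  # count events per type, most common first
  count(type, sort = TRUE)

by_type
```

From there, `by_type` can go into ggplot2 the same way the per-day counts did.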
