
Commit

use .datatable.aware=TRUE
yjunechoe committed Jun 23, 2024
1 parent 820ca0a commit c29173c
Showing 8 changed files with 3,665 additions and 3,921 deletions.
21 changes: 14 additions & 7 deletions _posts/2024-06-09-ave-for-the-average/ave-for-the-average.Rmd
@@ -178,11 +178,16 @@ It's the perfect mashup of base R + tidyverse. Base R takes care of the problem

tidyverse 🤝 base R

-## Aside: `ave()` for the average {data.table} user
+## Aside: data.table 🤝 collapse

Since I wrote this blog post, I discovered that `{data.table}` recently added in support for using `names(.SD)` in the LHS of the walrus `:=`. I'm so excited for this to hit the next release (v1.6.0)!

-I'm also trying to be more mindful of showcasing `{data.table}` where I talk about `{dplyr}`, so here's a solution to compare with the `dplyr::across()` solution above.
+I've trying to be more mindful of showcasing `{data.table}` whenever I talk about `{dplyr}`, so here's a solution to compare with the `dplyr::across()` solution above.

+```{r, echo=FALSE}
+.datatable.aware = TRUE
+```
+
+
```{r, message=FALSE}
# data.table::update_dev_pkg()
@@ -192,26 +197,28 @@ input_dt
```

```{r}
-input_dt[ , paste0("freq_", names(.SD)) := lapply(.SD, \(x) ave(freq, x, FUN = sum)), .SDcols = a:c]
+input_dt[, paste0("freq_", names(.SD)) := lapply(.SD, \(x) ave(freq, x, FUN = sum)), .SDcols = a:c]
input_dt
```

-In practice, I often pair `{data.table}` with `{collapse}`, where the latter provides a rich and performant set of split-apply-combine vector operations, to the likes ot `ave()`. In `{collapse}`, `ave(..., FUN = sum)` can be expressed with `fsum(..., TRA = "replace")`:
+In practice, I often pair `{data.table}` with `{collapse}`, where the latter provides a rich and performant set of split-apply-combine vector operations, to the likes of `ave()`. In `{collapse}`, `ave(..., FUN = sum)` can be expressed as `fsum(..., TRA = "replace")`:

```{r}
library(collapse)
ave(input_dt$freq, input_dt$a, FUN = sum)
-fsum(input_dt$freq, input_dt$a, TRA = "replace")
+fsum(input_dt$freq, input_dt$a, TRA = "replace") # Also, TRA = 2
```

-So the `{data.table}` 🤝 `{collapse}` version of this would be:^[I couldn't show this here with this particular example, but another nice feature of `{collapse}` 🤝 `{data.table}` is the fact that they do not shy away from consuming/producing matrices: see `scale()[,1]` vs. `fscale()` for a good example of this.]
+So the version of the solution integrating `fsum()` would be:^[I couldn't show this here with this particular example, but another nice feature of `{collapse}` 🤝 `{data.table}` is the fact that they do not shy away from consuming/producing matrices: see `scale()[,1]` vs. `fscale()` for a good example of this.]

```{r}
input_dt[, names(.SD) := NULL, .SDcols = patterns("^freq_")]
-input_dt[ , paste0("freq_", names(.SD)) := lapply(.SD, \(x) fsum(freq, x, TRA = "replace")), .SDcols = a:c]
+input_dt[, paste0("freq_", names(.SD)) := lapply(.SD, \(x) fsum(freq, x, TRA = 2)), .SDcols = a:c]
input_dt
```

+data.table 🤝 collapse

## sessionInfo()

```{r}
60 changes: 30 additions & 30 deletions _posts/2024-06-09-ave-for-the-average/ave-for-the-average.html

Large diffs are not rendered by default.
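The idiom this diff builds on, `ave(freq, x, FUN = sum)`, returns each group's total repeated on every row of that group. That is the behavior `fsum(..., TRA = "replace")` reproduces in `{collapse}`. Below is a minimal base R sketch of that behavior, cross-checked against per-group totals from `tapply()`. It assumes made-up vectors `freq` and `grp` (hypothetical, not the post's `input_dt`):

```r
# Hypothetical data standing in for a frequency column and one grouping column
freq <- c(1, 2, 3, 4, 5)
grp  <- c("x", "y", "x", "y", "x")

# ave() applies FUN within each group of `grp` and writes the result
# back to every row of that group, so the output has length(freq) elements
by_ave <- ave(freq, grp, FUN = sum)

# Same result built by hand: one total per group, then looked up row by row
totals    <- tapply(freq, grp, sum)
by_lookup <- as.vector(totals[grp])  # as.vector() drops the names/dim attributes

stopifnot(identical(by_ave, by_lookup))
by_ave
#> [1] 9 6 9 6 9
```

The `tapply()` route spells out the broadcast step: it computes one value per group, then indexes that value back to every row. `ave()`, and `TRA = "replace"` in `{collapse}`, handle that step for you.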

