Consider vector based stack for benchmarking vignette #23

TimTaylor · 2023-01-04T16:50:09Z

For completeness would it be worth including a vector based stack for comparison in the benchmarking vignette? The performance of one based on Martin Morgan's suggestion on stackoverflow is better than a list/environment based one.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(forcats)
library(ggplot2)
library(collections)
#> 
#> Attaching package: 'collections'
#> The following object is masked from 'package:utils':
#> 
#>     stack

# vector based stack (based on https://stackoverflow.com/a/18678440)
vec_stack <- function(type="double", length=1000L) {
    v <- vector(type, length)
    i <- 1L
    len <- length(v)
    list(
        push = function(elt) {
            if (typeof(elt) != type)
                stop("types must match")
            if (i == len) {
                length(v) <<- 1.6 * len
                len <<- length(v)
            }
            v[[i]] <<- elt
            i <<- i + 1L
        },
        pop = function() {
            i <<- i - 1L
            v[[i]]
        },
        clear = function() {
            v <<- vector(type, length)
        }
    )
}

# list based stack from benchmark vignette
list_stack <- function() {
    self <- environment()
    q <- NULL
    n <- NULL
    push <- function(item) {
        if (is.null(item)) {
            q[n + 1] <<- list(item)
        } else {
            q[[n + 1]] <<- item
        }
        n <<- n + 1
        invisible(self)
    }
    pop <- function() {
        if (n == 0) stop("stack is empty")
        v <- q[[n]]
        q <<- q[-n]
        n <<- n - 1
        v
    }
    clear <- function() {
        q <<- list()
        n <<- 0
        invisible(self)
    }
    clear()
    self
}

# bench mark based on one in vignette (slightly extended n)
bench_stack <- bench::press(
    n = c(10, 50, 100, 200, 500, 1000),
    bench::mark(
        `base::list_stack_grow` = {
            q <- list_stack()
            x <- rnorm(n)
            for (i in 1:n) q$push(x[i])
            for (i in 1:n) q$pop()
        },
        `base::vec_stack_pre_allocate` = {
            q <- vec_stack(length = n)
            x <- rnorm(n)
            for (i in 1:n) q$push(x[i])
            for (i in 1:n) q$pop()
        },
        `base::vec_stack_grow` = {
            q <- vec_stack(length = 2L)
            x <- rnorm(n)
            for (i in 1:n) q$push(x[i])
            for (i in 1:n) q$pop()
        },
        `collections::stack` = {
            q <- stack()
            x <- rnorm(n)
            for (i in 1:n) q$push(x[i])
            for (i in 1:n) q$pop()
        },
        check = FALSE
    )
) |> 
    mutate(expression = fct_reorder(
        as.character(expression), median, .fun = mean, .desc = TRUE))
#> Running with:
#>       n
#> 1    10
#> 2    50
#> 3   100
#> 4   200
#> 5   500
#> 6  1000

# plot
bench_stack %>%
    ggplot(aes(x = n, y = median)) +
    geom_line(aes(color = expression)) +
    scale_colour_brewer(palette = "Set2", direction = -1) +
    ggtitle("push and pop n times") + ylab("time")

^{Created on 2023-01-04 with reprex v2.0.2}

randy3k · 2023-01-04T18:20:22Z

The vector based stack works so much better now by not making copies with each append (something in 3.4.x?)
If you really want to use it in practice, just make sure you trim the vector length when it is shortened enough, say the length is less than 0.5 * the preallocated length.

Feel free to submit the change for the vignette, I will render the site once the update is merged.

TimTaylor · 2023-01-06T09:11:52Z

Cool. Will try and PR something over the next couple of weeks.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider vector based stack for benchmarking vignette #23

Consider vector based stack for benchmarking vignette #23

TimTaylor commented Jan 4, 2023 •

edited

Loading

randy3k commented Jan 4, 2023

TimTaylor commented Jan 6, 2023

Consider vector based stack for benchmarking vignette #23

Consider vector based stack for benchmarking vignette #23

Comments

TimTaylor commented Jan 4, 2023 • edited Loading

randy3k commented Jan 4, 2023

TimTaylor commented Jan 6, 2023

TimTaylor commented Jan 4, 2023 •

edited

Loading