Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider vector based stack for benchmarking vignette #23

Open
TimTaylor opened this issue Jan 4, 2023 · 2 comments
Open

Consider vector based stack for benchmarking vignette #23

TimTaylor opened this issue Jan 4, 2023 · 2 comments

Comments

@TimTaylor
Copy link

TimTaylor commented Jan 4, 2023

For completeness would it be worth including a vector based stack for comparison in the benchmarking vignette? The performance of one based on Martin Morgan's suggestion on stackoverflow is better than a list/environment based one.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(forcats)
library(ggplot2)
library(collections)
#> 
#> Attaching package: 'collections'
#> The following object is masked from 'package:utils':
#> 
#>     stack

# vector based stack (based on https://stackoverflow.com/a/18678440)
vec_stack <- function(type="double", length=1000L) {
    v <- vector(type, length)
    i <- 1L
    len <- length(v)
    list(
        push = function(elt) {
            if (typeof(elt) != type)
                stop("types must match")
            if (i == len) {
                length(v) <<- 1.6 * len
                len <<- length(v)
            }
            v[[i]] <<- elt
            i <<- i + 1L
        },
        pop = function() {
            i <<- i - 1L
            v[[i]]
        },
        clear = function() {
            v <<- vector(type, length)
        }
    )
}

# list based stack from benchmark vignette
list_stack <- function() {
    self <- environment()
    q <- NULL
    n <- NULL
    push <- function(item) {
        if (is.null(item)) {
            q[n + 1] <<- list(item)
        } else {
            q[[n + 1]] <<- item
        }
        n <<- n + 1
        invisible(self)
    }
    pop <- function() {
        if (n == 0) stop("stack is empty")
        v <- q[[n]]
        q <<- q[-n]
        n <<- n - 1
        v
    }
    clear <- function() {
        q <<- list()
        n <<- 0
        invisible(self)
    }
    clear()
    self
}

# bench mark based on one in vignette (slightly extended n)
bench_stack <- bench::press(
    n = c(10, 50, 100, 200, 500, 1000),
    bench::mark(
        `base::list_stack_grow` = {
            q <- list_stack()
            x <- rnorm(n)
            for (i in 1:n) q$push(x[i])
            for (i in 1:n) q$pop()
        },
        `base::vec_stack_pre_allocate` = {
            q <- vec_stack(length = n)
            x <- rnorm(n)
            for (i in 1:n) q$push(x[i])
            for (i in 1:n) q$pop()
        },
        `base::vec_stack_grow` = {
            q <- vec_stack(length = 2L)
            x <- rnorm(n)
            for (i in 1:n) q$push(x[i])
            for (i in 1:n) q$pop()
        },
        `collections::stack` = {
            q <- stack()
            x <- rnorm(n)
            for (i in 1:n) q$push(x[i])
            for (i in 1:n) q$pop()
        },
        check = FALSE
    )
) |> 
    mutate(expression = fct_reorder(
        as.character(expression), median, .fun = mean, .desc = TRUE))
#> Running with:
#>       n
#> 1    10
#> 2    50
#> 3   100
#> 4   200
#> 5   500
#> 6  1000

# plot
bench_stack %>%
    ggplot(aes(x = n, y = median)) +
    geom_line(aes(color = expression)) +
    scale_colour_brewer(palette = "Set2", direction = -1) +
    ggtitle("push and pop n times") + ylab("time")

Created on 2023-01-04 with reprex v2.0.2

@randy3k
Copy link
Owner

randy3k commented Jan 4, 2023

The vector based stack works so much better now by not making copies with each append (something in 3.4.x?)
If you really want to use it in practice, just make sure you trim the vector length when it is shortened enough, say the length is less than 0.5 * the preallocated length.

Feel free to submit the change for the vignette, I will render the site once the update is merged.

@TimTaylor
Copy link
Author

Cool. Will try and PR something over the next couple of weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants