Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(HTML) style_tt() is very slow #354

Open
J-Moravec opened this issue Oct 4, 2024 · 6 comments
Open

(HTML) style_tt() is very slow #354

J-Moravec opened this issue Oct 4, 2024 · 6 comments

Comments

@J-Moravec
Copy link
Contributor

I have table with ~ 150 rows and I would like to style some columns.

However, when I tried to style a bit too many rows, the style_tt() function takes a few minutes.

@vincentarelbundock
Copy link
Owner

Which output format?

@J-Moravec
Copy link
Contributor Author

Html. I will produce mre once I got some time

@vincentarelbundock
Copy link
Owner

I think the HTML template could really use some improvements: https://github.com/vincentarelbundock/tinytable/blob/main/inst/templates/bootstrap.html

Say we try something simple like this:

library(tinytable)
tt(head(iris)) |> style_tt(color = "orange") |> save_tt("example.html")

You'll see in the raw HTML that every single cell is styled using a separate line of code, to specify the JS/CSS to apply. I bet that this is incredibly inefficient for large tables.

In other formats like LaTeX and Typst, we collect all style_tt() calls. Then, when we build the final table, we combine and reconcile styles, using the last one if there are conflicts. Finally, we split i and j indices into arrays, by groups of cells which share a style.

So if I apply "orange" to all cells of a table, the Typst file will have only one array of indices, and it will apply the styling code only once to the full array. In HTML, we call JS functions for every cell separately, one after the other. I have no proof, but I suspect this is inefficient.

Roadmap to potential improvement:

  1. Modify style_bootstrap.R to do combination and reconciliation of conflicting styles, like here: https://github.com/vincentarelbundock/tinytable/blob/main/R/style_typst.R#L82
  2. Add reconciliation call on build here: https://github.com/vincentarelbundock/tinytable/blob/main/R/build_tt.R#L113
  3. Modify style_bootstrap.R to insert arrays of i and j indices, instead of one line per cell.
  4. Modify the Javascript functions in the bootstrap template here: https://github.com/vincentarelbundock/tinytable/blob/main/inst/templates/bootstrap.html

This is kind of an ambitious project, but I think it is unavoidable for the future. I hope it will make the HTML faster. It will certainly make the code cleaner.

@J-Moravec
Copy link
Contributor Author

J-Moravec commented Oct 7, 2024

Simulation study (and MRE) to see the cost of styling.

The cost of styling seems to be going quite linearly. Dunno what is the cause of the drop at the end. Only one repetition was made, so it might be inaccurate. parallel was used since for cycle took just so much time.

Ideally, the cost of styling should be relatively constant.

library("tinytable")
library("parallel")

sample.data.frame = function(x, size, replace = FALSE, prob = NULL){
    i = 1:nrow(x)
    x[sample(i, size, replace = replace, prob = prob),]
    }

timeit = function(expr, env = parent.frame()){
    tik = Sys.time()
    eval(expr, envir = env)
    tok = Sys.time()
    difftime(tok, tik, units = "secs")
    }

sizes = seq(5, 500)

times_tt = mclapply(
    sizes,
    \(x){
        d = sample.data.frame(iris, size = x, replace = TRUE)
        timeit({tt(d) |> save_tt("html")})
        },
    mc.cores = 16
    ) |> unlist() |> as.numeric()

times_style = mclapply(
    sizes,
    \(x){
        d = sample.data.frame(iris, size = x, replace = TRUE)
        timeit({tt(d) |> style_tt(j = 1, color = "orange") |> save_tt("html")})
        },
    mc.cores = 16
    ) |> unlist() |> as.numeric()

plot(times_style, type = "l", col = "red", lwd = 3)
lines(times_tt, col = "green", lwd = 3)

style_tt_timing

@vincentarelbundock
Copy link
Owner

lol yeah. we need a refactor.

@J-Moravec J-Moravec changed the title style_tt() is very slow (HTML) style_tt() is very slow Oct 7, 2024
@vincentarelbundock
Copy link
Owner

Here are two versions of a simple table. iris_old.txt is the current output of tinytable. Then, iris_old.txt is an alternative with a javascript function that loops over arrays to style.

I'm not sure this will really speed things up, but it will certainly simplify the code. Unfortunately, it'll be a bit of work to implement so we know.

iris_new.txt
iris_old.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants