-
I'm running a pipeline via tar_make_future() and I'd like to be able to set the number of workers at the target level. I saw this SO post, which basically says I shouldn't attempt to do this. However, I have a sequence of targets that run fine with 10-15 workers, then one target (mid-pipeline) that loads a large data frame, and using any more than ~6 workers for that one crashes the callr processes. All targets after the crashing target run fine again with 10+ workers. I feel like limiting the entire pipeline to 6 workers is slower than it ought to be. Alternatively, is there a way to control the resources per callr worker? Maybe I could tell targets to use lots of "small" workers (10 or more) for some targets and fewer "larger" workers for others?
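For context, a minimal sketch of the setting in question (illustrative only, not my actual project code; the worker count is just the ~6 from above):

```r
library(targets)
# The workers argument applies to the whole pipeline: every target,
# light or heavy, shares the same pool of callr worker processes.
tar_make_future(workers = 6)
```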
-
I'd like this option too - e.g. on Slurm, when some targets use much larger machines than others, it would be better to be able to restrict resource requests at submission (by prioritising the target-level resources).
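For illustration only (not from the original comment), a rough sketch of per-target Slurm requests via tar_resources(), assuming a future.batchtools backend and a version of targets where tar_resources_future() accepts a plan; the template file, resource values, and run_data() are placeholders:

```r
library(targets)
library(future.batchtools)

# Hypothetical target that requests a large Slurm allocation for itself,
# while the rest of the pipeline keeps the default (smaller) plan.
# This goes inside the list() in _targets.R.
tar_target(
  data,
  run_data(), # placeholder command
  resources = tar_resources(
    future = tar_resources_future(
      plan = future::tweak(
        future.batchtools::batchtools_slurm,
        template = "batchtools.slurm.tmpl",          # placeholder template file
        resources = list(memory = "64G", walltime = 3600)
      )
    )
  )
)
```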
-
For this use case, I recommend running sections of the pipeline by themselves. Sketch:
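A minimal sketch of that idea, assuming the example pipeline below where `data` is the memory-hungry target (the worker counts are illustrative), would be to call tar_make_future() once per section with different workers values:

```r
library(targets)
# Build just the memory-hungry target (plus any outdated upstream
# dependencies) with a small pool of callr workers...
tar_make_future(names = any_of("data"), workers = 6)
# ...then build the rest of the pipeline with the full pool.
tar_make_future(workers = 15)
```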
Either that or you could change the dependency graph, with the upstream targets pointing to the data and the downstream targets depending on the data. That would force the expensive `data` target to run by itself. For example, you could change this:

```r
library(targets)
tar_script({
  list(
    tar_target(data, run_data()),
    tar_target(beginning, run_beginning()),
    tar_target(middle, run_middle(beginning)),
    tar_target(end, run_end(middle))
  )
})
tar_glimpse()
```

to this:

```r
library(targets)
tar_script({
  list(
    tar_target(data, run_data(beginning)),
    tar_target(beginning, run_beginning()),
    tar_target(middle, run_middle(beginning, data)),
    tar_target(end, run_end(middle, data))
  )
})
tar_glimpse()
```

It would not be impossible to implement some kind of weighting scheme in the scheduling algorithm to handle unequal targets more gracefully in the general case, but it would be an obscure feature and a heavy refactor. I don't have plans right now to go in that direction.