-
I'm running a pipeline via tar_make_future() and I'd like to be able to set the number of workers at the target level. I saw this SO post, which basically says I shouldn't attempt to do this. However, I have a sequence of targets that run fine with 10-15 workers, then one target (mid-pipeline) that loads a large data frame, and using any more than ~6 workers for that one crashes the callr processes. All targets after the crashing target run fine again with 10+ workers. I feel like limiting the entire pipeline to 6 workers is slower than it ought to be. Alternatively, is there a way to control the resources per callr worker? Maybe I could tell targets to use lots of "small" workers (10 or more) for some targets and fewer "larger" workers for others?
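For context, a minimal sketch of the setting in question (illustrative only, not my actual project code; the worker count is just the ~6 from above):

```r
library(targets)
# The workers argument applies to the whole pipeline: every target,
# light or heavy, shares the same pool of callr worker processes.
tar_make_future(workers = 6)
```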
-
I'd like this option too - e.g. on Slurm, when some targets use much larger machines than others, it would be better to be able to restrict resource requests at submission (by prioritising the target-level resources).
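For illustration only (not from the original comment), a rough sketch of per-target Slurm requests via tar_resources(), assuming a future.batchtools backend and a version of targets where tar_resources_future() accepts a plan; the template file, resource values, and run_data() are placeholders:

```r
library(targets)
library(future.batchtools)

# Hypothetical target that requests a large Slurm allocation for itself,
# while the rest of the pipeline keeps the default (smaller) plan.
# This goes inside the list() in _targets.R.
tar_target(
  data,
  run_data(), # placeholder command
  resources = tar_resources(
    future = tar_resources_future(
      plan = future::tweak(
        future.batchtools::batchtools_slurm,
        template = "batchtools.slurm.tmpl",          # placeholder template file
        resources = list(memory = "64G", walltime = 3600)
      )
    )
  )
)
```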
-
For this use case, I recommend running sections of the pipeline by themselves. Sketch:
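A minimal sketch of that idea, assuming the example pipeline below where `data` is the memory-hungry target (the worker counts are illustrative), would be to call tar_make_future() once per section with different workers values:

```r
library(targets)
# Build just the memory-hungry target (plus any outdated upstream
# dependencies) with a small pool of callr workers...
tar_make_future(names = any_of("data"), workers = 6)
# ...then build the rest of the pipeline with the full pool.
tar_make_future(workers = 15)
```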
Either that or you could change the dependency graph, with the upstream targets pointing to the data and the downstream targets depending on the data. That would force the expensive `data` target to run by itself. For example, you could change this:

```r
library(targets)
tar_script({
  list(
    tar_target(data, run_data()),
    tar_target(beginning, run_beginning()),
    tar_target(middle, run_middle(beginning)),
    tar_target(end, run_end(middle))
  )
})
tar_glimpse()
```

to this:

```r
library(targets)
tar_script({
  list(
    tar_target(data, run_data(beginning)),
    tar_target(beginning, run_beginning()),
    tar_target(middle, run_middle(beginning, data)),
    tar_target(end, run_end(middle, data))
  )
})
tar_glimpse()
```

It would not be impossible to implement some kind of weighting scheme in the scheduling algorithm to handle unequal targets more gracefully in the general case, but it would be an obscure feature and a heavy refactor. I don't have plans right now to go in that direction.