Clarity in the documentation about tar_resources_clustermq
#963
Right, this is part of what it means for workers to be persistent. A persistent worker is an R process that launches early in the pipeline and stays running until the whole pipeline starts to wind down. A persistent worker usually runs multiple targets during its lifecycle, and it is not possible to precisely predict in advance which targets will be assigned to which workers. https://books.ropensci.org/targets/hpc.html discusses persistent vs transient workers, and I just made some edits in ropensci-books/targets@90401f0 and ec088ff to emphasize the concepts.
All the persistent workers launch at the same time in a single array job. Some may be queued for longer than others, but if the job queue is accommodating enough, then all the workers will start at the same time and thus time out at the same time. But more workers = more parallelization, so the pipeline may be more likely to finish before any timeouts occur.
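As a rough sketch of what launching persistent workers looks like from the user's side (the worker count of 4 is an arbitrary illustrative value, and this assumes clustermq is already configured for your scheduler):

```r
# Run from the project root, where _targets.R lives.
# Assumes the scheduler and template are already configured, e.g. through
# options(clustermq.scheduler = ..., clustermq.template = ...) in _targets.R.
library(targets)

# Launch 4 persistent workers, submitted together as one array job.
# Each worker stays alive and picks up targets as they become ready,
# so which target lands on which worker is not known in advance.
tar_make_clustermq(workers = 4)
```

Because the workers share one submission, any walltime in the template applies to each worker as a whole, not to the individual targets it runs.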
The difference is that
Something that has bitten me as I transitioned from the `batchtools` backend to the `clustermq` backend is the way that resources are specified. In `batchtools`, if you set some global options via `tar_option_set(resources = tar_resources(...))`, you get per-task resources. For example, if you set a default walltime of 1 hour and then submit 100 jobs, each job gets a 1-hour walltime, as expected, and they will likely succeed.

With the equivalent setting for `clustermq`, we are actually setting the walltime of the persistent workers. This means that even if we have 100 targets to build, all of them must finish within 60 minutes, or else the pipeline gets stuck in a limbo where it thinks it is running but has no workers. This specific limbo issue is solved by mschubert/clustermq#150, but I think it would be helpful to explain this resources behaviour in the docs, if indeed I am understanding it correctly.

Another interesting point is the impact of the `workers` argument to `tar_make_clustermq`. As you increase it, your pipeline is more likely to succeed, since you have queued-up workers that will take over even if a previous worker times out. This is slightly different from `batchtools`, where `workers` affects the concurrency of processing but not whether the pipeline succeeds.

Also, something that still isn't clear to me with `clustermq` is how individual target resources affect the pipeline, given that the worker resources must be determined upfront. For example, if I have the above configuration but then define a target that specifies its own resources, what happens?
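To make the contrast concrete, here is a hedged sketch of the two pipeline-wide configurations being compared. It is not the original configuration from this post: the `walltime` field name and both values are assumptions, and they only have an effect if your own scheduler template file uses a placeholder with that name (the units also depend on that template).

```r
# _targets.R (illustrative sketch only)
library(targets)

# Option A: transient workers through the future/batchtools backend.
# Each target runs as its own job, so the walltime applies per target.
# `walltime` is a hypothetical template field.
resources_batchtools <- tar_resources(
  future = tar_resources_future(
    resources = list(walltime = 3600)
  )
)

# Option B: persistent workers through clustermq.
# The template values apply to each worker, so the walltime has to cover
# every target that worker ends up running.
resources_clustermq <- tar_resources(
  clustermq = tar_resources_clustermq(
    template = list(walltime = 60)
  )
)

# Pick one as the pipeline-wide default.
tar_option_set(resources = resources_clustermq)

# Placeholder targets, just so the file defines a valid pipeline.
list(
  tar_target(data, seq_len(10)),
  tar_target(total, sum(data))
)
```

With Option B, the total runtime of all targets assigned to a worker counts against that single limit, which is why the same nominal walltime behaves so differently between the two backends.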