[help] Advice on highly heterogeneous workflows #1400

connor-duffin · 2024-12-14T07:38:54Z

Help

I understand and agree to https://books.ropensci.org/targets/help.html.

Description

Our workflow is highly heterogeneous, ranging from single targets that use a lot of memory on their own to dynamically branched targets that are embarrassingly parallel. I have set up heterogeneous workers for the different parts of the pipeline, but it seems they all run at the same time. I was hoping this setup would separate the stages of the pipeline so that we don't run out of memory on our machine!

As a quick fix I am running something like:

targets::tar_make("highly_parallel_target")  # with lots of branches
targets::tar_make("expensive_target")
targets::tar_make()  # remainder of the targets

I was wondering if there is any advice on dealing with such a pipeline, and whether there are best practices that should be followed.

Answered by wlandau

wlandau · 2024-12-17T14:43:54Z

Maintainer

Successive tar_make() calls are totally fine. If you prefer a single one, then you might consider restructuring the dependency graph to force certain targets to run after one another. For example, instead of:

library(targets)
list(
  tar_target(name = expensive, command = f()),
  tar_target(name = cheap, command = g())
)

you could consider something like:

library(targets)
list(
  tar_target(name = expensive, command = f()),
  tar_target(
    name = cheap,
    command = {
      expensive  # mentioned only to create the dependency, so cheap waits for expensive
      g()
    }
  )
)

or a different variation of this that doesn't load expensive inside cheap:

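library(targets)
list(
  tar_target(name = expensive, command = f()),
  tar_target(
    name = sentinel,
    command = {
      expensive  # sentinel waits for expensive
      NULL       # and stores only NULL, which is trivial to load downstream
    }
  ),
  tar_target(
    name = cheap,
    command = {
      sentinel  # cheap waits for sentinel, and therefore for expensive
      g()
    }
  )
)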
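
To check that the sentinel really forces the ordering, one option is to look at the edges of the dependency graph with tar_network(). The snippet below is only a sketch under some assumptions: it writes a throwaway _targets.R into a temporary directory, and the bodies of f() and g() are hypothetical stand-ins, since the real functions are not shown in this thread.

```r
library(targets)

# Work in a throwaway directory so no existing project is touched.
pipeline_dir <- tempfile("sentinel_demo_")
dir.create(pipeline_dir)
old_dir <- setwd(pipeline_dir)

# Write a _targets.R file with the sentinel pattern from the answer above.
tar_script({
  f <- function() rnorm(10) # hypothetical stand-in for the expensive step
  g <- function() "done"    # hypothetical stand-in for the cheap step
  list(
    tar_target(name = expensive, command = f()),
    tar_target(name = sentinel, command = {
      expensive
      NULL
    }),
    tar_target(name = cheap, command = {
      sentinel
      g()
    })
  )
}, ask = FALSE)

# Inspect the graph without running anything. targets_only = TRUE leaves
# the functions f() and g() out of the network.
edges <- tar_network(targets_only = TRUE, outdated = FALSE)$edges
print(edges)

# Small helper (not part of targets) to test for a direct edge.
has_edge <- function(from, to) {
  any(edges$from == from & edges$to == to)
}

has_edge("expensive", "sentinel")
has_edge("sentinel", "cheap")
has_edge("expensive", "cheap")

setwd(old_dir)
```

If the first two checks are TRUE and the last one is FALSE, cheap can only start once sentinel (and so expensive) has finished, and cheap never has to load the value of expensive. The sentinel target itself still loads expensive when it runs.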

1 reply

connor-duffin Dec 18, 2024
Author

Great - that all makes sense to me and is very helpful.
