-
Fair enough. My use case is that I might generate a huge target that takes 2 hours to create. Now, I want to re-use that across multiple projects, maybe even across multiple users. I could just save it, move it to a separate location, and then load it as a separate file in a different project, since the target file is just a serialized version of the object (right?).

So, what I'm proposing is simpler than what you're suggesting above -- it would be super convenient to have this ability built in to `targets`. You wouldn't allow the ability to generate targets in a separate location. So, what about a simplified version of "remote targets" that isn't a full-blown customizable store? If you allowed `targets` to 1) read targets from a parent location, and 2) copy them to an external folder from local, that would do it, I think, while avoiding many of the issues above. Say, something like a setting that points at a parent store for reading; then a new function could handle the copying out.

Point 2 is solved, I think, in this scenario. It's not full-blown sharing, it's... deliberate sharing only. Would that work?
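For concreteness, a minimal sketch of the manual save-and-move workaround described above (the shared path is made up):

```r
library(targets)

# In project A, after tar_make(): export the expensive target's value.
saveRDS(tar_read(big_target), "/shared/cache/big_target.rds")

# In project B: load it back as an ordinary file, with no recomputation.
big_target <- readRDS("/shared/cache/big_target.rds")
```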
-
Thanks @wlandau and @nsheff, that has been a very helpful discussion. (@softloud you should take a look at this.)

I have a use case where I want to run some number of separate computational experiments and also publish a synthesis of the set of experiments. Think of it as being like a thesis: there are multiple non-trivial experiments which may be written up separately (say, for conference papers), then a separate, non-trivial exercise to write up the experiments as a connected whole (the thesis). Each experiment would be implemented as an independent `targets` project. I envisage the synthesis publication being implemented as a separate `targets` project that reads results from the experiment projects.

Unlike @nsheff's use case, I am not particularly focussed on sharing large data objects; rather, I am focussed on enabling read-only, cross-project data flows to support carving a large super-project into manageable, loosely connected sub-projects. I had initially been thinking I might have to dynamically symlink to the other projects' data stores.

I will look at unitar for inspiration. For my use case I won't need unitar's priority list of projects to search, because I would always be explicitly referring to a specific project.
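In hindsight, the `store` argument that eventually shipped (see the update further down and #407) supports exactly this kind of read-only flow. A sketch, with made-up project paths and a made-up target name:

```r
library(targets)

# In the synthesis project: read finished results straight out of each
# experiment's data store, without rerunning the experiments.
fit_a <- tar_read(model_fit, store = "../experiment_a/_targets")
fit_b <- tar_read(model_fit, store = "../experiment_b/_targets")
```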
-
Great work, @nsheff! With some minor extensions, it might also allow some projects to take artifacts from other projects as input in the pipeline. If project B pulls from project A, it would be ideal if project B automatically reruns some targets when the upstream files from project A change. Thinking out loud: you could have one file-tracking target with `format = "file"` that watches project A's output, so project B's downstream targets invalidate whenever that output changes. See the sketch below.
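A sketch of what project B's pipeline could look like under this idea (the path and `analyze()` are placeholders):

```r
# _targets.R in project B
library(targets)
list(
  # Track the file project A exports; targets hashes it on every run.
  tar_target(upstream_file, "../project_a/export/result.rds", format = "file"),
  # Reruns automatically whenever the hash of upstream_file changes.
  tar_target(analysis, analyze(readRDS(upstream_file)))
)
```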
-
Update: I actually ended up implementing the ability to set the data store to paths other than `_targets/` (see #407). The use cases just kept piling up, and RStudio Connect was a big one. I still do not like this feature because we lose one of the guardrails protecting reproducibility, but it is no longer possible to avoid. And I think the `_targets.yaml` / `tar_config_set()` / `tar_config_get()` interface handles this as safely as we can hope.
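For anyone landing here later, the interface in question looks like this (the store path is illustrative):

```r
library(targets)
tar_config_set(store = "custom_store")  # persisted in _targets.yaml
tar_config_get("store")
#> [1] "custom_store"
```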
-
FYI: Target Markdown makes it super easy to have different sub-pipelines with different file systems: https://books.ropensci.org/targets/markdown.html. Just set a different target script and data store for each sub-pipeline, e.g. with `tar_config_set()`.
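A sketch of per-sub-pipeline configuration along these lines (the script and store names are made up):

```r
library(targets)

# Sub-pipeline A gets its own target script and data store.
tar_config_set(script = "pipeline_a.R", store = "store_a")
tar_make()

# Sub-pipeline B is configured and run independently.
tar_config_set(script = "pipeline_b.R", store = "store_b")
tar_make()
```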
-
With Target Markdown, …
-
**Update 2021-04-08**

I actually ended up implementing the capability to set the data store to paths other than `_targets/` (see #407). The use cases just kept piling up, and RStudio Connect was a big one. I still do not like this feature because we lose one of the guardrails protecting reproducibility, but it is no longer possible to avoid. And I think the `_targets.yaml` / `tar_config_set()` / `tar_config_get()` interface handles this as safely as we can hope.

**Initial thoughts**
A pipeline's data store is always a folder named `_targets/` at the project root. Understandably, some users want to set the path to something other than `_targets/`. However, the perils and limitations would be too egregious, and the benefits would not go far enough.

1. The `_targets/` convention enforces standardization, and thus readability, reproducibility, transparency, and maintainability.
2. A custom path makes it tempting to share one data store across projects or users, but then multiple `tar_make()`'s could write to it simultaneously, and race conditions would constantly corrupt the output. Sharing requires serious version control, e.g. Git/GitHub for small projects and Git LFS for medium ones. (Maybe DVC versioning for large projects, as mentioned here.)
3. `tar_read()` runs in the current R session, whereas `tar_make()` does its work in a reproducible external `callr` process. Both R sessions need access to the same data store, and ensuring agreement would be awkward and brittle if the store path were custom. I see no good way to do this. If the custom path were an argument to `tar_make()` and `tar_read()`, then the user would need to manually set it every time, and this is too easy to forget. An `.Renviron` file could send the same path to both processes, which gets us closer to a solution. However, there is still room for confusion and careless errors from ad hoc calls to `Sys.setenv()`. In addition, the `.Renviron` approach would not cover Compatibility with Shiny (#291), the most compelling use case. Shiny apps would need to overwrite their own `.Renviron` files at runtime (depending on user-specific project storage), which will never be possible in production.
4. `targets` tracks the files in `_targets/objects/` just like the dynamic files you declare with `tar_target(format = "file")`. In other words, internally, all files are dynamic files relative to the project root. If the store path were custom and you changed this path mid-project, it would invalidate most of your targets. So either (1) custom paths would be disappointing, or (2) the store would need a non-back-compatible redesign and would be more difficult for me to maintain in the end.
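The `tar_read()`/`tar_make()` agreement problem above is also why the eventual solution went through `_targets.yaml`: both the interactive session and the external `callr` process resolve the store path from the same config file. A sketch (`some_target` is a placeholder):

```r
library(targets)
tar_config_set(store = "custom_store")  # both processes read _targets.yaml
tar_make()              # the callr process writes to custom_store/
tar_read(some_target)   # the interactive session reads from custom_store/
```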