Some aspects of this question may not be entirely specific to `targets`. How do people handle versioning and distribution of intermediate (and final) results across different platforms?

In particular, anything downstream of the top-level inputs (functions and raw data) is potentially problematic for version control because (1) the objects may be large and (2) the objects are often binary, making version control systems less useful/granular (although not useless). [These problems interact, because large binary objects are usually not subject to differential versioning ...]

This is not a problem if the workflow is reasonably fast on all platforms, or if all collaborators are using a shared space for their files. If not (in particular, if there are specific results that one would like to cache because they are slow to recompute), is there a recommended way to put them into version control with ...? (By "version control" here I really mean "synchronizable shared file space" ...)

If there is a better forum for discussing this issue, please feel free to let me know.
Replies: 1 comment
For syncing an entire `_targets/` directory after the pipeline is finished, here are some options that come to mind:

- `aws.s3::s3sync(path = "_targets")`
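For illustration, a minimal sketch of the `s3sync()` approach (not part of the original reply): the bucket name is a placeholder, and credentials/region are assumed to come from the usual AWS environment variables.

```r
# Minimal sketch, assuming the {aws.s3} package and an existing S3 bucket.
# "my-project-bucket" is a placeholder.
library(aws.s3)

# After tar_make() on the machine that built the pipeline:
s3sync(path = "_targets", bucket = "my-project-bucket", direction = "upload")

# On a collaborator's machine, before running the pipeline:
s3sync(path = "_targets", bucket = "my-project-bucket", direction = "download")
```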
Another option is to push only `_targets/meta/meta` to GitHub and use the target-by-target AWS S3 integration described at https://books.ropensci.org/targets/cloud.html for the data in `_targets/objects/`, but the API/bandwidth costs could add up if you have a lot of targets.
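A minimal sketch of how that target-by-target integration might be configured in `_targets.R` (an assumption based on the linked chapter, not part of the original reply; the bucket, prefix, and example targets are placeholders, and the exact options depend on the `targets` version):

```r
# _targets.R -- minimal sketch, assuming a recent {targets} release
# that supports the `repository` option. Bucket/prefix are placeholders.
library(targets)

tar_option_set(
  repository = "aws",  # store each target's object data in S3
  resources = tar_resources(
    aws = tar_resources_aws(bucket = "my-project-bucket", prefix = "_targets")
  )
)

list(
  tar_target(raw_data, read.csv("data/raw.csv")),  # hypothetical input
  tar_target(fit, lm(y ~ x, data = raw_data))      # hypothetical analysis
)
```

With a setup like this, the small text file `_targets/meta/meta` is the only part of the data store that needs to be committed to Git; each target's object data lives in the bucket.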
Data versioning as a feature is outside the scope of `targets`, but I did redesign the data store to make it more amenable…