From 7273a832fae2d4512676c41dd3c94cdc46e33149 Mon Sep 17 00:00:00 2001
From: Dave Berenbaum
Date: Wed, 20 Mar 2024 07:48:14 -0400
Subject: [PATCH] drop cloud versioned remotes (#5190)

---
 content/docs/command-reference/exp/push.md    |  7 --
 content/docs/command-reference/install.md     |  9 --
 content/docs/command-reference/push.md        | 89 +------------------
 .../data-management/cloud-versioning.md       | 70 ++-------------
 .../remote-storage/amazon-s3.md               | 22 -----
 .../remote-storage/azure-blob-storage.md      | 20 -----
 .../remote-storage/google-cloud-storage.md    | 19 ----
 .../external-dependencies-and-outputs.md      |  3 -
 8 files changed, 6 insertions(+), 233 deletions(-)

diff --git a/content/docs/command-reference/exp/push.md b/content/docs/command-reference/exp/push.md
index 68f5b1e9b1..471084a595 100644
--- a/content/docs/command-reference/exp/push.md
+++ b/content/docs/command-reference/exp/push.md
@@ -5,13 +5,6 @@ to [remote storage].
 
 [remote storage]: /doc/user-guide/data-management/remote-storage
 
-
-
-`dvc exp push` is not supported with
-[`version_aware` DVC remotes](/doc/user-guide/data-management/cloud-versioning).
-
-
-
 ## Synopsis
 
 ```usage
diff --git a/content/docs/command-reference/install.md b/content/docs/command-reference/install.md
index 50416432d5..ea1a9e3325 100644
--- a/content/docs/command-reference/install.md
+++ b/content/docs/command-reference/install.md
@@ -3,15 +3,6 @@
 Install Git hooks into the DVC repository to automate certain common actions.
 
-
-
-Do not use these Git hooks if you are using a
-[version-aware remote](/doc/user-guide/data-management/cloud-versioning#version-aware-remotes).
-Version-aware remotes require running `dvc push` before `git commit`, which is
-not supported by the included hooks.
-
-
-
 ## Synopsis
 
 ```usage
diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md
index 914f1ab6a8..e41eaca5f8 100644
--- a/content/docs/command-reference/push.md
+++ b/content/docs/command-reference/push.md
@@ -1,8 +1,7 @@
 # push
 
 Upload tracked files or directories to [remote storage] based on the current
-dvc files files (and update the cloud info in those files if
-pushing to a [version-aware] remote).
+dvc files files.
 
 [remote storage]: /doc/user-guide/data-management/remote-storage
 
@@ -276,89 +275,3 @@ Cache and remote 'r1' are in sync.
 
 And running `dvc status --cloud`, DVC verifies that indeed there are no more
 files to push to remote storage.
-
-## Example: Version-aware remote for readable storage
-
-Let's set up a [version-aware] remote, which uses cloud versioning to organize
-the remote storage.
-
-[version-aware]:
-  /doc/user-guide/data-management/cloud-versioning#version-aware-remotes
-
-```cli
-$ dvc remote add -d versioned_store s3://mybucket
-$ dvc remote modify versioned_store version_aware true
-
-$ dvc push
-```
-
-> See also `dvc remote add` and `dvc remote modify`.
-
-Now let's look at what was pushed to the remote. Unlike the [example above], the
-version-aware remote looks similar to the data in your workspace and is easy to
-read.
-
-[example above]: #example-what-happens-in-the-cache
-
-```cli
-# Show the current versions.
-$ aws s3 ls --recursive s3://mybucket/
-
-2023-02-01 15:24:09    1708591 data/prepared/test.tsv
-2023-02-01 15:24:10    6728772 data/prepared/train.tsv
-
-# Show all object versions.
-$ aws s3api list-object-versions --bucket mybucket
-{
-    "Versions": [
-        {
-            "ETag": "\"b656f1a8273d0c541340cb129fd5d5a9\"",
-            "Size": 1708591,
-            "StorageClass": "STANDARD",
-            "Key": "data/prepared/test.tsv",
-            "VersionId": "T6rFr7NSHkL3v9tGStO7GTwsVaIFl42T",
-            "IsLatest": true,
-            "LastModified": "2023-02-01T20:24:09.000Z",
-            ...
-        },
-        {
-            "ETag": "\"9ca281786366acca17632c27c5c5cc75\"",
-            "Size": 6728772,
-            "StorageClass": "STANDARD",
-            "Key": "data/prepared/train.tsv",
-            "VersionId": "XaYsHQHWK219n5MoCRe.Rr7LeNbbder_",
-            "IsLatest": true,
-            "LastModified": "2023-02-01T20:24:10.000Z",
-            ...
-        }
-    ]
-```
-
-With `version_aware` enabled, `dvc push` will also modify dvc files
-to capture the version information:
-
-```cli
-...
-    outs:
-    - path: data/prepared
-      hash: md5
-      files:
-      - relpath: test.tsv
-        md5: b656f1a8273d0c541340cb129fd5d5a9
-        size: 1708591
-        cloud:
-          versioned_store:
-            etag: b656f1a8273d0c541340cb129fd5d5a9
-            version_id: T6rFr7NSHkL3v9tGStO7GTwsVaIFl42T
-      - relpath: train.tsv
-        md5: 9ca281786366acca17632c27c5c5cc75
-        size: 6728772
-        cloud:
-          versioned_store:
-            etag: 9ca281786366acca17632c27c5c5cc75
-            version_id: XaYsHQHWK219n5MoCRe.Rr7LeNbbder_
-...
-```
-
-Always `dvc push` before `git commit` so that the updated cloud version info is
-available in Git.
diff --git a/content/docs/user-guide/data-management/cloud-versioning.md b/content/docs/user-guide/data-management/cloud-versioning.md
index 2413122a89..e9bf6cb667 100644
--- a/content/docs/user-guide/data-management/cloud-versioning.md
+++ b/content/docs/user-guide/data-management/cloud-versioning.md
@@ -1,45 +1,15 @@
 # Cloud Versioning
 
-When cloud versioning is enabled, DVC will store files in the remote according
-to their original directory location and filenames. Different versions of a file
-will then be stored as separate versions of the corresponding object in cloud
-storage. This is useful for cases where users prefer to retain their original
-filenames and directory hierarchy in remote storage (instead of using DVC's
-usual
-[content-addressable storage](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory)
-format).
-
-
-
-Note that not all DVC functionality is supported when using cloud versioned
-remotes, and using cloud versioning comes with the tradeoff of losing certain
-benefits of content-addressable storage.
-
-
-
-
-
-### Expand for more details on the differences between cloud versioned and content-addressable storage
-
-`dvc remote` storage normally uses
-[content-addressable storage](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory)
-to organize versioned data. Different versions of files are stored in the remote
-according to hash of their data content instead of according to their original
-filenames and directory location. This allows DVC to optimize certain remote
-storage lookup and data sync operations, and provides data de-duplication at the
-file level. However, this comes with the drawback of losing human-readable
-filenames without the use of the DVC CLI (`dvc get --show-url`) or API
-(`dvc.api.get_url()`).
-
-When using cloud versioning, DVC does not provide de-duplication, and certain
-remote storage performance optimizations will be unavailable.
+## Importing versioned data
 
-
+DVC supports importing cloud-versioned data from supported storage providers.
+Refer to `dvc import-url` (`--version-aware`) and `dvc update --rev` for more
+information.
 
 ## Supported storage providers
 
 Cloud versioning features are only avaible for certain storage providers.
-Currently, it is supported on the following `dvc remote` types:
+Currently, it is supported on the following storage types:
 
 - [Amazon S3] (requires [S3 Versioning] enabled buckets)
 - Microsoft [Azure Blob Storage] (requires [Blob versioning] enabled storage
@@ -70,33 +40,3 @@ management, see:
 [azure blob storage]:
   https://learn.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-policy-configure
 [google cloud storage]: https://cloud.google.com/storage/docs/lifecycle
-
-## Version-aware remotes
-
-When the `version_aware` option is enabled on a `dvc remote`:
-
-- `dvc push` will utilize cloud versioning when storing data in the remote. Data
-  will retain its original directory structure and filenames, and each version
-  of a file tracked by DVC will be stored as a new version of the corresponding
-  object in cloud storage.
-- `dvc fetch` and `dvc pull` will download the corresponding version of an
-  object from cloud storage.
-
-With `version_aware` enabled, `dvc push` will modify dvc files.
-Always `dvc push` before `git commit` so that the updated cloud version info is
-available in Git.
-
-
-
-Note that when `version_aware` is in use, DVC does not delete current versions
-or restore noncurrent versions of objects in cloud storage. So the current
-version of an object in cloud storage may not match the version of a file in
-your DVC repository.
-
-
-
-## Importing versioned data
-
-DVC supports importing cloud-versioned data from supported storage providers.
-Refer to `dvc import-url` (`--version-aware`) and `dvc update --rev` for more
-information.
diff --git a/content/docs/user-guide/data-management/remote-storage/amazon-s3.md b/content/docs/user-guide/data-management/remote-storage/amazon-s3.md
index 9f0dbc7cb5..52de253dd8 100644
--- a/content/docs/user-guide/data-management/remote-storage/amazon-s3.md
+++ b/content/docs/user-guide/data-management/remote-storage/amazon-s3.md
@@ -36,28 +36,6 @@ The AWS user needs the following permissions: `s3:ListBucket`, `s3:GetObject`,
 To use [custom auth](#custom-authentication) or further configure your DVC
 remote, set any supported config param with `dvc remote modify`.
 
-## Cloud versioning
-
-
-
-Requires [S3 Versioning] enabled on the bucket and the following AWS user
-permissions: `s3:ListBucketVersions`, `s3:GetObjectVersion`,
-`s3:DeleteObjectVersion`.
-
-
-
-```cli
-$ dvc remote modify myremote version_aware true
-```
-
-`version_aware` (`true` or `false`) enables [cloud versioning] features for this
-remote. This lets you explore the bucket files under the same structure you see
-in your project directory locally.
-
-[s3 versioning]:
-  https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html
-[cloud versioning]: /docs/user-guide/data-management/cloud-versioning
-
 ## Custom authentication
 
 Use these configuration options if you don't have the AWS CLI setup in your
diff --git a/content/docs/user-guide/data-management/remote-storage/azure-blob-storage.md b/content/docs/user-guide/data-management/remote-storage/azure-blob-storage.md
index d675d95f60..073160014a 100644
--- a/content/docs/user-guide/data-management/remote-storage/azure-blob-storage.md
+++ b/content/docs/user-guide/data-management/remote-storage/azure-blob-storage.md
@@ -24,26 +24,6 @@ $ dvc remote add -d myremote azure:///
 To set up authentication or other configuration, set any supported config param
 with `dvc remote modify`.
 
-## Cloud versioning
-
-
-
-Requires [Blob versioning] enabled on the storage account and container.
-
-
-
-```cli
-$ dvc remote modify myremote version_aware true
-```
-
-`version_aware` (`true` or `false`) enables [cloud versioning] features for this
-remote. This lets you explore the bucket files under the same structure you see
-in your project directory locally.
-
-[blob versioning]:
-  https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview
-[cloud versioning]: /docs/user-guide/data-management/cloud-versioning
-
 ## Authentication
diff --git a/content/docs/user-guide/data-management/remote-storage/google-cloud-storage.md b/content/docs/user-guide/data-management/remote-storage/google-cloud-storage.md
index 3d092cd1b5..77a665fcbc 100644
--- a/content/docs/user-guide/data-management/remote-storage/google-cloud-storage.md
+++ b/content/docs/user-guide/data-management/remote-storage/google-cloud-storage.md
@@ -36,25 +36,6 @@ service account or other ways to authenticate ([more info]).
 To use [custom auth](#custom-authentication) or further configure your DVC
 remote, set any supported config param with `dvc remote modify`.
 
-## Cloud versioning
-
-
-
-Requires [Object versioning] enabled on the bucket.
-
-
-
-```cli
-$ dvc remote modify myremote version_aware true
-```
-
-`version_aware` (`true` or `false`) enables [cloud versioning] features for this
-remote. This lets you explore the bucket files under the same structure you see
-in your project directory locally.
-
-[object versioning]: https://cloud.google.com/storage/docs/object-versioning
-[cloud versioning]: /docs/user-guide/data-management/cloud-versioning
-
 ## Custom authentication
 
 For [service accounts] (a Google account associated to your GCP project instead
diff --git a/content/docs/user-guide/pipelines/external-dependencies-and-outputs.md b/content/docs/user-guide/pipelines/external-dependencies-and-outputs.md
index 2fad35c707..a805efc6d9 100644
--- a/content/docs/user-guide/pipelines/external-dependencies-and-outputs.md
+++ b/content/docs/user-guide/pipelines/external-dependencies-and-outputs.md
@@ -188,9 +188,6 @@ change, but not saved in the cache for
 Saving external outputs to an external cache has been deprecated in DVC 3.0.
 
-Stay tuned as we work on versioning external outputs using
-[cloud versioning](/doc/user-guide/data-management/cloud-versioning).
-
 To define files or directories in an external location as outputs, give