Locality-aware backup description edit for node locality #18269

Merged
merged 6 commits into from
Feb 27, 2024

@kathancox kathancox commented Feb 1, 2024

Fixes DOC-9612

This PR adds further detail on locality-aware backups and clarifies some behavior (refer to the links for rendered previews):

Note/question: We do not currently mention locality-aware backups in our data domiciling docs or multi-region docs, so this PR does not change anything there yet.

netlify bot commented Feb 1, 2024

Netlify Preview

Name Link
🔨 Latest commit a40b30b
🔍 Latest deploy log https://app.netlify.com/sites/cockroachdb-docs/deploys/65de2cbe3899ff0008915fc8
😎 Deploy Preview https://deploy-preview-18269--cockroachdb-docs.netlify.app

@kathancox kathancox force-pushed the locality-aware-backups branch 3 times, most recently from 8aa7b6e to b75f611 Compare February 2, 2024 15:40
@@ -5,10 +5,10 @@ toc: true
docs_area: manage
---

-CockroachDB backups operate as _jobs_, which are potentially long-running operations that could span multiple SQL sessions. Unlike regular SQL statements, which CockroachDB routes to the [optimizer](cost-based-optimizer.html) for processing, a [`BACKUP`](backup.html) statement will move into a job workflow. A backup job has four main phases:
+CockroachDB backups operate as _jobs_, which are potentially long-running operations that could span multiple SQL sessions. Unlike regular SQL statements, which CockroachDB routes to the [optimizer](cost-based-optimizer.html) for processing, a [`BACKUP`](backup.html) statement will move into a job workflow. A backup job has four main phases:

Contributor Author

FYI: No new content here until LINE 94

@@ -91,6 +91,30 @@ The backup metadata files describe everything a backup contains. That is, all th

With the full backup complete, the specified storage location will contain the backup data and its metadata ready for a potential [restore](restore.html). After subsequent backups of the `movr` database to this storage location, CockroachDB will create a _backup collection_. See [Backup collections](take-full-and-incremental-backups.html#backup-collections) for information on how CockroachDB structures a collection of multiple backups.

## Backup jobs with locality

Contributor Author

v22.2 did not have any technical detail on locality-aware backups; this is taken from v23.1 / v23.2.

Contributor

Do we need to update 22.2? It's EOL, right?

Contributor Author

Not quite... in June it's EOL. I was really playing it safe here updating v22.2 as well.

Contributor

ah, it's EOL for me but not for thee. :) Thank you!

@shannonbradshaw
Contributor

Thanks, Kathryn. I like the approach you've taken with this PR: clear but subtle warnings that users should consider the current behavior carefully if they have data domiciling requirements.

@@ -0,0 +1 @@
A successful locality-aware backup job requires that each node in the cluster has access to each storage location. This is because any node in the cluster can claim the job and become the [_coordinator_ ](backup-architecture.html#job-creation-phase) node.

Contributor

Suggested change
-A successful locality-aware backup job requires that each node in the cluster has access to each storage location. This is because any node in the cluster can claim the job and become the [_coordinator_ ](backup-architecture.html#job-creation-phase) node.
+A successful locality-aware backup job requires that each node in the cluster has access to each storage location. This is because any node in the cluster can claim the job and become the [_coordinator_](backup-architecture.html#job-creation-phase) node.


-Automated database and table level backups are not supported in CockroachDB {{ site.data.products.serverless }}. However, [user managed database and table level backups]({% link cockroachcloud/take-and-restore-customer-owned-backups.md %}#back-up-data) using user provided storage locations are supported.
+Automated database and table level backups are not supported in CockroachDB {{ site.data.products.serverless }}. However, you can take manual [database and table level backups]({% link cockroachcloud/take-and-restore-customer-owned-backups.md %}#back-up-data) to your own [cloud storage location](https://www.cockroachlabs.com/docs/{{site.current_cloud_version}}/use-cloud-storage).

Contributor

I think I understand why we don't offer backing up to local storage as an option, but there's nothing that says they have to save the backup to cloud storage, right?

Contributor

They could conceivably back up to their own storage, provided it's accessible from cloud. Not sure the distinction between that and "your own cloud storage location" is super-sharp? But y'alls department, obviously.

Contributor Author

Here, I changed this as the language on this page was really stilted. I didn't think about other storage tbh, just automatically went with cloud storage as that is what we recommend.

@kathancox kathancox marked this pull request as ready for review February 5, 2024 20:39

Contributor

@benbardin benbardin left a comment

New language looks great to me. Some comments. Please also wait for @stevendanna's review for accuracy. Thank you!

@@ -0,0 +1 @@
CockroachDB {{ site.data.products.serverless }} clusters operate with a [different architecture]({% link cockroachcloud/architecture.md %}#cockroachdb-serverless) compared to CockroachDB {{ site.data.products.core }} and CockroachDB {{ site.data.products.dedicated }} clusters. These architectural differences have implications for how locality-aware backups can run. Serverless clusters will scale resources depending on whether they are actively in use, which means that it is less likely to have a SQL pod available in every locality. As a result, Serverless clusters are more likely to have ranges that do not match with any of the cluster's localities, which can lead to more ranges backed up to a storage bucket in a different locality. You should consider this as you plan a backup strategy that must comply with [data domiciling]({% link v23.2/data-domiciling.md %}) requirements.

Contributor

Would this fit better in Cloud documentation? Does "Serverless" apply to self-hosted at all? (I don't think it does, but this is all subtle. Perhaps Steven will correct me.)

Contributor Author

@kathancox kathancox Feb 13, 2024

Yes, I have added it to the unsupported features page in this PR. But, right now the information architecture for backup/restore in cloud needs some work (which I am working on this Breather Week). I will add this include to that work when there is a good space on the new Cloud docs I am working on.


- [Locality-restricted backup execution](#job-coordination-using-the-execution-locality-option): Specify a set of locality filters for a backup job in order to restrict the nodes that can participate in the backup process to that locality. This ensures that the backup job is executed by nodes that meet certain requirements, such as being located in a specific region or having access to a certain storage bucket.

### Job coordination and export of locality-aware backups

When you create a [locality-aware backup]({% link {{ page.version.version }}/take-and-restore-locality-aware-backups.md %}) job, any node in the cluster can [claim the backup job](#job-creation-phase). A successful locality-aware backup job requires that each node in the cluster has access to each storage location. This is because any node in the cluster can claim the job and become the coordinator node. Once each node informs the coordinator node that it has completed exporting the row data, the coordinator will start to write metadata, which involves writing to each locality bucket a partial manifest recording what row data was written to that [storage bucket]({% link {{ page.version.version }}/use-cloud-storage.md %}).

-Every node involved in the backup is responsible for backing up the ranges for which it was the [leaseholder]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases) at the time the coordinator planned the [distributed backup flow]({% link {{ page.version.version }}/backup-architecture.md %}#resolution-phase). The locality of the node ([configured at node startup]({% link {{ page.version.version }}/cockroach-start.md %}#locality)) exporting the row data determines where the backups files will be placed in a locality-aware backup.
+Every node involved in the backup is responsible for backing up the ranges for which it was the [leaseholder]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases) at the time the coordinator planned the [distributed backup flow]({% link {{ page.version.version }}/backup-architecture.md %}#resolution-phase).

Contributor

Note that in 23.2 at least we can do follower-reads so it is now possible that we send the work to any replica. This is a pretty detailed document already, how deep do we want to go?

Contributor Author

I think for this one — I would like to open a separate issue. The architecture page has been up for a couple of releases at least now, so there is likely some overhauling to be done here... starting with an audit. Is that OK with you @stevendanna, or do you think we should correct some of these passages in this PR?

Contributor

Yes, I'm OK with that too.
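
The placement rule discussed in this thread (the exporting node's locality determines which storage location receives its files) can be illustrated with a minimal Python sketch. This is not CockroachDB's implementation: the function name `pick_storage_uri`, the bucket URIs, and the use of a `default` entry as the fallback are assumptions made for illustration only.

```python
from typing import Dict, List


def pick_storage_uri(node_locality: List[str], uris_by_locality: Dict[str, str]) -> str:
    """Return the storage URI for the first locality tier of the node that
    has its own configured location; otherwise return the default location.

    node_locality: the node's locality tiers, e.g. ["region=us-west", "az=a"].
    uris_by_locality: hypothetical mapping of locality filter -> storage URI;
        it must contain a "default" entry (an assumption of this sketch).
    """
    for tier in node_locality:
        if tier in uris_by_locality:
            return uris_by_locality[tier]
    return uris_by_locality["default"]


# Hypothetical bucket names, for illustration only.
uris = {
    "default": "s3://backup-default",
    "region=us-west": "s3://backup-us-west",
}

print(pick_storage_uri(["region=us-west", "az=a"], uris))
print(pick_storage_uri(["region=eu-central"], uris))
```

The second call loosely mirrors the concern raised for Serverless clusters earlier in the review: when no configured location matches the exporting node, the data falls back to a bucket that may sit in a different locality.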

Contributor

@florence-crl florence-crl left a comment

LGTM pending a question and some suggestions


-Automated database and table level backups are not supported in CockroachDB {{ site.data.products.serverless }}. However, [user managed database and table level backups]({% link cockroachcloud/take-and-restore-customer-owned-backups.md %}#back-up-data) using user provided storage locations are supported.
+Automated database and table level backups are not supported in CockroachDB {{ site.data.products.serverless }}. However, you can take manual [database and table level backups]({% link cockroachcloud/take-and-restore-customer-owned-backups.md %}#back-up-data) to your own [cloud storage location](https://www.cockroachlabs.com/docs/{{site.current_cloud_version}}/use-cloud-storage).

Contributor

Suggested change
-Automated database and table level backups are not supported in CockroachDB {{ site.data.products.serverless }}. However, you can take manual [database and table level backups]({% link cockroachcloud/take-and-restore-customer-owned-backups.md %}#back-up-data) to your own [cloud storage location](https://www.cockroachlabs.com/docs/{{site.current_cloud_version}}/use-cloud-storage).
+Automated database and table level backups are not supported in CockroachDB {{ site.data.products.serverless }}. However, you can take manual [database and table level backups]({% link cockroachcloud/take-and-restore-customer-owned-backups.md %}?filters=cloud#back-up-data) to your own [cloud storage location](https://www.cockroachlabs.com/docs/{{site.current_cloud_version}}/use-cloud-storage).

Fix link to have ?filters=cloud because default is userfile

@kathancox kathancox merged commit 52770ea into main Feb 27, 2024
6 checks passed
@kathancox kathancox deleted the locality-aware-backups branch February 27, 2024 18:47
6 participants