[DOC-11431][DOC-10454] Document admission control for snapshot ingestion #19068

Merged
merged 22 commits into from
Nov 12, 2024
22 commits
176f8cb
[DOC-11431] Document admission control for snapshot ingestion
mdlinville Oct 30, 2024
35527a1
Merge branch 'main' into DOC-11431
mdlinville Oct 30, 2024
daf1b16
Merge branch 'main' into DOC-11431
mdlinville Oct 31, 2024
942408d
Merge branch 'main' into DOC-11431
mdlinville Oct 31, 2024
7941bd2
Merge branch 'main' into DOC-11431
mdlinville Nov 1, 2024
8a8a334
Merge branch 'main' into DOC-11431
mdlinville Nov 4, 2024
6658c5f
Merge branch 'main' into DOC-11431
mdlinville Nov 5, 2024
4d3f899
Update src/current/v24.3/admission-control.md
mdlinville Nov 6, 2024
6e37570
Merge branch 'main' into DOC-11431
mdlinville Nov 6, 2024
e4a92bc
Merge branch 'main' into DOC-11431
mdlinville Nov 6, 2024
bbc9db1
Merge branch 'main' into DOC-11431
mdlinville Nov 6, 2024
ec23e18
Address feedback
mdlinville Nov 6, 2024
ca87957
Merge branch 'main' into DOC-11431
mdlinville Nov 6, 2024
56c1b34
Merge branch 'main' into DOC-11431
mdlinville Nov 6, 2024
4251d43
Apply suggestions from code review
mdlinville Nov 7, 2024
f38d82d
Merge branch 'main' into DOC-11431
mdlinville Nov 7, 2024
dfff1d3
Merge branch 'main' into DOC-11431
mdlinville Nov 8, 2024
fe0f61d
Merge branch 'main' into DOC-11431
mdlinville Nov 8, 2024
5430b5f
Merge branch 'main' into DOC-11431
mdlinville Nov 11, 2024
02980b4
Merge remote-tracking branch 'origin/main' into DOC-11431
mdlinville Nov 12, 2024
4c4e37c
Rich's feedback
mdlinville Nov 12, 2024
8916ae9
Merge remote-tracking branch 'origin/DOC-11431' into DOC-11431
mdlinville Nov 12, 2024
8 changes: 5 additions & 3 deletions src/current/v24.3/admission-control.md
@@ -44,10 +44,10 @@ Almost all database operations that use CPU or perform storage IO are controlled

- [General SQL queries]({% link {{ page.version.version }}/selection-queries.md %}) have their CPU usage subject to admission control, as well as storage IO for writes to [leaseholder replicas]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases).
- [Bulk data imports]({% link {{ page.version.version }}/import-into.md %}).
- [Backups]({% link {{ page.version.version }}/backup-and-restore-overview.md %}).
- [Schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}), including index and column backfills (on both the [leaseholder replica]({% link {{ page.version.version }}/architecture/replication-layer.md %}#leases) and [follower replicas]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft)).
- [`COPY`]({% link {{ page.version.version }}/copy-from.md %}) statements.
- [Deletes]({% link {{ page.version.version }}/delete-data.md %}) (including deletes initiated by [row-level TTL jobs]({% link {{ page.version.version }}/row-level-ttl.md %}); the [selection queries]({% link {{ page.version.version }}/selection-queries.md %}) performed by TTL jobs are also subject to CPU admission control).
- [Follower replication work]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft).
- [Raft log entries being written to disk]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft).
- [Changefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}).
@@ -68,6 +68,8 @@ Admission control is enabled by default. To enable or disable admission control,
- `admission.kv.enabled` for work performed by the [KV layer]({% link {{ page.version.version }}/architecture/distribution-layer.md %}).
- `admission.sql_kv_response.enabled` for work performed in the SQL layer when receiving [KV responses]({% link {{ page.version.version }}/architecture/distribution-layer.md %}).
- `admission.sql_sql_response.enabled` for work performed in the SQL layer when receiving [DistSQL responses]({% link {{ page.version.version }}/architecture/sql-layer.md %}#distsql).
- {% include_cached new-in.html version="v24.3" %} `kvadmission.store.snapshot_ingest_bandwidth_control.enabled` to optionally limit the disk impact of ingesting snapshots on a node. This cluster setting is in [Preview]({% link {{ page.version.version }}/cockroachdb-feature-availability.md %}#features-in-preview).
- `kvadmission.store.provisioned_bandwidth` to optionally limit the bandwidth for a store, expressed in bytes per second. This cluster setting is in [Preview]({% link {{ page.version.version }}/cockroachdb-feature-availability.md %}#features-in-preview).
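
As a minimal sketch, the store bandwidth setting can be changed with `SET CLUSTER SETTING`. The `'250MiB'` value is a hypothetical placeholder, not a recommendation; because the setting is expressed in bytes per second, it stands for roughly 250 MiB/s per store. Choose a value based on your disk's actual provisioned throughput, and keep in mind that this setting is in Preview and may change.

```sql
-- Hypothetical value: treat each store as able to sustain about 250 MiB/s.
SET CLUSTER SETTING kvadmission.store.provisioned_bandwidth = '250MiB';

-- Confirm the value now in effect.
SHOW CLUSTER SETTING kvadmission.store.provisioned_bandwidth;

-- Revert to the default value.
RESET CLUSTER SETTING kvadmission.store.provisioned_bandwidth;
```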

When you enable or disable admission control settings for one layer, Cockroach Labs recommends that you enable or disable them for **all layers**.
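
A minimal sketch of that recommendation, assuming a hypothetical scenario where you need to turn admission control off temporarily: set all three layer settings to the same value, then restore the defaults with `RESET CLUSTER SETTING` instead of toggling layers one at a time.

```sql
-- Hypothetical maintenance window: disable admission control in every layer at once.
SET CLUSTER SETTING admission.kv.enabled = false;
SET CLUSTER SETTING admission.sql_kv_response.enabled = false;
SET CLUSTER SETTING admission.sql_sql_response.enabled = false;

-- Afterwards, restore the defaults (admission control is enabled by default) in every layer.
RESET CLUSTER SETTING admission.kv.enabled;
RESET CLUSTER SETTING admission.sql_kv_response.enabled;
RESET CLUSTER SETTING admission.sql_sql_response.enabled;
```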

@@ -134,7 +136,7 @@ COMMIT;

## Considerations

[Client connections]({% link {{ page.version.version }}/connection-parameters.md %}) are not managed by the admission control subsystem. Too many connections per [gateway node]({% link {{ page.version.version }}/architecture/sql-layer.md %}#gateway-node) can also lead to cluster overload.

{% include {{page.version.version}}/sql/server-side-connection-limit.md %}

6 changes: 4 additions & 2 deletions src/current/v24.3/architecture/replication-layer.md
@@ -72,13 +72,13 @@ Non-voting replicas can be configured via [zone configurations through `num_vote

##### Overview

When individual [ranges]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-range) become temporarily unavailable, requests to those ranges are refused by a per-replica "circuit breaker" mechanism instead of hanging indefinitely.

From a user's perspective, this means that if a [SQL query]({% link {{ page.version.version }}/architecture/sql-layer.md %}) will ultimately fail because it accesses a temporarily unavailable range, a [replica]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-replica) in that range will trip its circuit breaker (after 60 seconds [by default](#per-replica-circuit-breaker-timeout)) and propagate a `ReplicaUnavailableError` back up through the system to inform the user why their query did not succeed. These (hopefully transient) errors are also signalled as events in the DB Console's [Replication Dashboard]({% link {{ page.version.version }}/ui-replication-dashboard.md %}) and as "circuit breaker errors" in its [**Problem Ranges** and **Range Status** pages]({% link {{ page.version.version }}/ui-debug-pages.md %}). Meanwhile, CockroachDB continues asynchronously probing the range's availability. If the replica becomes available again, the breaker is reset so that it can go back to serving requests normally.
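
As an illustration only, the 60-second timeout mentioned above is controlled by a cluster setting. This sketch assumes the setting name `kv.replica_circuit_breaker.slow_replication_threshold` and uses a hypothetical value of `'90s'`; confirm the name and default in the linked timeout section before changing it.

```sql
-- Show the current per-replica circuit breaker threshold (60 seconds by default).
SHOW CLUSTER SETTING kv.replica_circuit_breaker.slow_replication_threshold;

-- Hypothetical: wait longer before tripping the breaker on a slow replica.
SET CLUSTER SETTING kv.replica_circuit_breaker.slow_replication_threshold = '90s';
```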

This feature is designed to increase the availability of your CockroachDB clusters by making them more robust to transient errors.

For more information about per-replica circuit breaker events happening on your cluster, see the following pages in the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}):

- The [**Replication** dashboard]({% link {{ page.version.version }}/ui-replication-dashboard.md %}).
- The [**Advanced Debug** page]({% link {{ page.version.version }}/ui-debug-pages.md %}). From there you can view the **Problem Ranges** page, which lists the range replicas whose circuit breakers were tripped. You can also view the **Range Status** page, which displays the circuit breaker error message for a given range.
@@ -116,6 +116,8 @@ Sending data locally using delegated snapshots has the following benefits:

Delegated snapshots are managed automatically by the cluster with no need for user involvement.

{% include_cached new-in.html version="v24.3" %} To limit the impact of snapshot ingestion on a node with a [provisioned rate]({% link {{ page.version.version }}/cockroach-start.md %}#store) configured for its store, you can enable disk-bandwidth-based [admission control]({% link {{ page.version.version }}/admission-control.md %}) for snapshot ingestion. This reduces the disk impact of incoming snapshots on foreground workloads running on the node. Admission control for snapshot ingestion is disabled by default; to enable it, set the [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) `kvadmission.store.snapshot_ingest_bandwidth_control.enabled` to `true`. The histogram [metric]({% link {{ page.version.version }}/metrics.md %}) `admission.wait_durations.snapshot_ingest` shows the wait times of snapshots that were delayed by admission control.
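
A minimal sketch of these steps, assuming the store already has a provisioned rate configured. The final query reads the wait-time histogram from `crdb_internal.node_metrics`, which reports only the node you are connected to and is not a stable interface; the exact row names it returns for this histogram are an assumption, so the query matches on the metric name as a prefix.

```sql
-- Preview setting; disabled by default.
SET CLUSTER SETTING kvadmission.store.snapshot_ingest_bandwidth_control.enabled = true;

-- Confirm the setting took effect.
SHOW CLUSTER SETTING kvadmission.store.snapshot_ingest_bandwidth_control.enabled;

-- Inspect the snapshot-ingest wait-duration histogram on the current node.
-- Assumption: every row for this histogram starts with the metric name.
SELECT name, value
FROM crdb_internal.node_metrics
WHERE name LIKE 'admission.wait_durations.snapshot_ingest%';
```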

### Leases

A single node in the Raft group acts as the leaseholder, which is the only node that can serve reads or propose writes to the Raft group leader (both actions are received as `BatchRequests` from [`DistSender`]({% link {{ page.version.version }}/architecture/distribution-layer.md %}#distsender)).
8 changes: 8 additions & 0 deletions src/current/v24.3/cockroachdb-feature-availability.md
@@ -47,6 +47,14 @@ Any feature made available in a phase prior to GA is provided without any warran
**The following features are in preview** and are subject to change. To share feedback and/or issues, contact [Support](https://support.cockroachlabs.com/hc).
{{site.data.alerts.end}}

### Admission control for ingesting snapshots

The [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) `kvadmission.store.snapshot_ingest_bandwidth_control.enabled`, which allows you to optionally limit the disk impact of ingesting snapshots on a node, is in Preview.

### Admission control to limit the bandwidth for a store

The [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}) `kvadmission.store.provisioned_bandwidth`, which allows you to optionally limit the bandwidth for a store (expressed in bytes per second), is in [Preview]({% link {{ page.version.version }}/cockroachdb-feature-availability.md %}#features-in-preview).

### Usage-based billing metrics

Metering for [usage-based billing]({% link cockroachcloud/costs.md %}) of data transfer, managed backup storage, and changefeeds is now in Preview for all CockroachDB Standard and Advanced clusters through November 2024. You can view your usage in the CockroachDB Cloud Console, where line items with a charge of $0 will be shown for each metric. There will be no usage-based charges associated with these metrics during the preview period. For more information, refer to [CockroachDB Cloud Costs: Usage-based billing metrics in Preview]({% link cockroachcloud/costs.md %}#usage-based-billing-metrics-in-preview) or the [announcement]({% link releases/cloud.md %}#october-1-2024) in the release notes.