Update the recommendations for cluster settings to improve changefeed-related latency (#18290)
kathancox authored Feb 16, 2024
1 parent a0dd607 commit dcae9c1
Showing 4 changed files with 80 additions and 22 deletions.
49 changes: 40 additions & 9 deletions src/current/v23.1/advanced-changefeed-configuration.md
@@ -70,39 +70,70 @@ Use the following workflow to enable `MuxRangefeed`:

### Latency in changefeeds

When you are running large workloads, changefeeds can encounter or cause latency in a cluster in the following ways:
When you are running large workloads, changefeeds can **encounter** or **cause** latency in a cluster in the following ways:

- Changefeeds can have an impact on SQL latency in the cluster generally.
- Changefeeds can encounter latency in **events** emitting. This latency is the total time CockroachDB takes to:
- Changefeeds can encounter latency when emitting events. This latency is the total time CockroachDB takes to:
- Commit writes to the database.
- Encode [changefeed messages]({% link {{ page.version.version }}/changefeed-messages.md %}).
- Deliver the message to the [sink]({% link {{ page.version.version }}/changefeed-sinks.md %}).

The following [cluster settings]({% link {{ page.version.version }}/cluster-settings.md %}) reduce bursts of [rangefeed]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}#enable-rangefeeds) work so that updates are paced steadily over time.

{{site.data.alerts.callout_danger}}
We do **not** recommend adjusting these settings unless you are running a large workload, or are working with the Cockroach Labs [support team]({{ link_prefix }}support-resources.html).
We do **not** recommend adjusting these settings unless you are running a large workload, or are working with the Cockroach Labs [support team]({{ link_prefix }}support-resources.html). Thoroughly test different cluster setting configurations before deploying to production.
{{site.data.alerts.end}}
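
Before adjusting any of the settings below, you can check their current values from a SQL shell. The following is a minimal sketch; it assumes a user with privileges to view cluster settings:

~~~ sql
-- Inspect the current values of the settings discussed in this section
-- before making any changes.
SHOW CLUSTER SETTING kv.closed_timestamp.target_duration;
SHOW CLUSTER SETTING kv.rangefeed.closed_timestamp_refresh_interval;
SHOW CLUSTER SETTING kv.closed_timestamp.side_transport_interval;
SHOW CLUSTER SETTING kv.rangefeed.closed_timestamp_smear_interval;
~~~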

#### `kv.closed_timestamp.target_duration`

**Default:** `3s`

This setting controls the frequency of checkpoints for each [range]({% link {{ page.version.version }}/architecture/overview.md %}#range). A changefeed aggregates these checkpoints across all ranges, and once the timestamp on all the ranges advances, the changefeed can then [checkpoint]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). As a result, the higher the value of this setting the longer it can take for a changefeed to checkpoint. It is important to note that a changefeed at default configuration does not checkpoint more often than once every 30 seconds.
{{site.data.alerts.callout_info}}
Adjusting `kv.closed_timestamp.target_duration` could have a detrimental impact on [follower reads]({% link {{ page.version.version }}/follower-reads.md %}). If you are using follower reads, refer to the [`kv.rangefeed.closed_timestamp_refresh_interval`](#kv-rangefeed-closed_timestamp_refresh_interval) cluster setting instead to ease changefeed impact on foreground SQL latency.
{{site.data.alerts.end}}

In clusters running large-scale workloads, increasing this setting can help to lower the potential impact of changefeeds on SQL latency. That is, an increase in the setting could lower the load on the cluster. This is important for workloads with tables in the TB range of data. However, for most workloads, we recommend leaving this setting at the default of `3s`.
`kv.closed_timestamp.target_duration` controls the target [closed timestamp]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#closed-timestamps) lag duration, which determines how far behind the current time CockroachDB will attempt to maintain the closed timestamp. For example, with the default value of `3s`, if the current time is `12:30:00`, then CockroachDB will attempt to keep the closed timestamp at `12:29:57`, possibly retrying or aborting ongoing writes with timestamps below this time.

{{site.data.alerts.callout_danger}}
Thoroughly test any adjustment in cluster settings before deploying the change in production.
{{site.data.alerts.end}}
A changefeed aggregates checkpoints across all ranges, and once the timestamp on all the ranges advances, the changefeed can then [checkpoint]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). In the context of changefeeds, `kv.closed_timestamp.target_duration` affects how old the checkpoints will be, which determines how long a changefeed must wait before it can consider the history of an event complete.
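
For illustration only, an increase to the target duration for a cluster running a large-scale workload might look like the following. The `5s` value is an arbitrary example, not a recommendation:

~~~ sql
-- Illustrative sketch only: raise the closed timestamp target duration above
-- the 3s default. Test the effect on follower reads and changefeed checkpoint
-- latency before deploying any non-default value to production.
SET CLUSTER SETTING kv.closed_timestamp.target_duration = '5s';
~~~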

#### `kv.rangefeed.closed_timestamp_refresh_interval`

**Default:** `0s`

This setting controls the interval at which [closed timestamp]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#closed-timestamps) updates are delivered to [rangefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}#enable-rangefeeds) and in turn emitted as a [changefeed checkpoint]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}).

At this setting's default of `0s`, every range with a rangefeed will emit a checkpoint event at the cadence of the [`kv.closed_timestamp.side_transport_interval`](#kv-closed_timestamp-side_transport_interval) value (default `200ms`). Increasing the value of `kv.rangefeed.closed_timestamp_refresh_interval` will lengthen the delay between checkpoints, which will increase the latency of changefeed checkpoints, but reduce the impact on SQL latency due to [overload]({% link {{ page.version.version }}/admission-control.md %}#use-cases-for-admission-control) on the cluster.

If you are running changefeeds at a large scale and notice foreground SQL latency, we recommend increasing this setting.
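
For example, a sketch of such an increase follows; the `3s` value is illustrative only and should be validated against your workload:

~~~ sql
-- Illustrative sketch only: deliver closed timestamp updates to rangefeeds
-- less frequently so that changefeed checkpointing places less load on the
-- cluster. This increases changefeed checkpoint latency.
SET CLUSTER SETTING kv.rangefeed.closed_timestamp_refresh_interval = '3s';
~~~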

Adjusting `kv.rangefeed.closed_timestamp_refresh_interval` can therefore affect both the latency that changefeeds encounter when emitting events **and** the foreground SQL latency that changefeeds cause. In clusters running large-scale workloads, it may be helpful to:

- **Decrease** the value for a lower changefeed emission latency — that is, how often a client can confirm that all relevant events up to a certain timestamp have been emitted.
- **Increase** the value to reduce the potential impact of changefeeds on SQL latency. This will lower the resource cost of changefeeds, which can be especially important for workloads with tables in the TB range of data.

It is important to note that a changefeed at default configuration does not checkpoint more often than once every 30 seconds. When you create a changefeed with [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}), you can adjust this with the [`min_checkpoint_frequency`]({% link {{ page.version.version }}/create-changefeed.md %}#min-checkpoint-frequency) option.
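
For example, a minimal sketch of setting this option at changefeed creation time; the table name, sink URI, and `10s` value are placeholders:

~~~ sql
-- Illustrative sketch only: allow this changefeed to checkpoint as often as
-- every 10 seconds instead of the 30-second default. The table name and
-- sink URI are placeholders.
CREATE CHANGEFEED FOR TABLE movr.rides
  INTO 'kafka://localhost:9092'
  WITH min_checkpoint_frequency = '10s';
~~~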

#### `kv.closed_timestamp.side_transport_interval`

**Default:** `200ms`

The `kv.closed_timestamp.side_transport_interval` cluster setting controls how often the closed timestamp is updated. Although the closed timestamp is updated every `200ms` by default, CockroachDB will emit an event across the rangefeed containing the closed timestamp value as per the cluster's [`kv.rangefeed.closed_timestamp_refresh_interval`](#kv-rangefeed-closed_timestamp_refresh_interval) setting, which by default is `0s`.
`kv.closed_timestamp.side_transport_interval` is helpful when ranges are inactive. The closed timestamp subsystem usually propagates [closed timestamps via Raft commands]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#closed-timestamps). However, an idle range that does not see any writes does not receive any Raft commands, so its closed timestamp would stall. This setting controls the interval of an efficient side-transport mechanism that broadcasts closed timestamp updates for all idle ranges between nodes.

Adjusting `kv.closed_timestamp.side_transport_interval` will affect both [follower reads]({% link {{ page.version.version }}/follower-reads.md %}) and changefeeds. While you can use `kv.closed_timestamp.side_transport_interval` to tune the checkpointing interval, we recommend adjusting [`kv.rangefeed.closed_timestamp_refresh_interval`](#kv-rangefeed-closed_timestamp_refresh_interval) instead if you are using follower reads.

#### `kv.rangefeed.closed_timestamp_smear_interval`

**Default:** `1ms`

This setting provides a mechanism to pace the [closed timestamp]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#closed-timestamps) notifications to follower replicas. At the default, the closed timestamp smear interval makes rangefeed closed timestamp delivery less spiky, which can reduce its impact on foreground SQL query latency.

For example, if you have a large table, and one of the nodes in the cluster is hosting 6000 ranges from this table. Normally, the rangefeed system will wake up every `kv.closed_timestamp.target_duration` (default `3s`) and every 3 seconds it will publish checkpoints for all 6000 ranges. In this scenario, the `kv.rangefeed.closed_timestamp_smear_interval` setting takes the `3s` frequency and divides it into `1ms` chunks. Instead of publishing checkpoints for all 6000 ranges, it will publish checkpoints for 2 ranges every `1ms`. This produces a more predictable and level load, rather than spiky, large bursts of workload.
For example, suppose you have a large table, and one of the nodes in the cluster is hosting 6000 ranges from this table. Normally, the rangefeed system wakes up once per [`kv.rangefeed.closed_timestamp_refresh_interval`](#kv-rangefeed-closed_timestamp_refresh_interval) and publishes checkpoints for all 6000 ranges at once. The `kv.rangefeed.closed_timestamp_smear_interval` setting divides that interval into `1ms` chunks, so instead of publishing checkpoints for all 6000 ranges at once, the node publishes checkpoints for a proportional share of the ranges every `1ms`. This produces a more predictable, level load rather than spiky, large bursts of work.

{{site.data.alerts.callout_info}}
The default of [`kv.rangefeed.closed_timestamp_refresh_interval`](#kv-rangefeed-closed_timestamp_refresh_interval) is `0s`, so the event emission will fall back to [`kv.closed_timestamp.side_transport_interval`](#kv-closed_timestamp-side_transport_interval), which updates the closed timestamp (default `200ms`).
{{site.data.alerts.end}}
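
To make the smearing arithmetic concrete under the defaults described in the preceding note, the following illustrative calculation reuses the hypothetical node with 6000 ranges from the example above:

~~~ sql
-- Illustrative arithmetic only: checkpoints emitted every 200ms (the side
-- transport default) and smeared in 1ms chunks give 200 chunks per emission
-- window, so roughly 30 of the 6000 ranges are published per 1ms chunk
-- instead of all 6000 at once.
SELECT 6000 / (200 / 1) AS ranges_per_smear_chunk;
~~~
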
### Lagging ranges
2 changes: 1 addition & 1 deletion src/current/v23.1/architecture/transaction-layer.md
@@ -120,7 +120,7 @@ Closed timestamps provide the guarantees that are used to provide support for lo
The timestamp for any transaction, especially a long-running transaction, could be [pushed](#timestamp-cache). An example is when a transaction encounters a key written at a higher timestamp. When this kind of [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#understanding-and-avoiding-transaction-contention) happens between transactions, the closed timestamp **may** provide some mitigation. Increasing the closed timestamp interval may reduce the likelihood that a long-running transaction's timestamp is pushed and must be retried. Thoroughly test any adjustment to the closed timestamp interval before deploying the change in production, because such an adjustment can have an impact on:

- [Follower reads]({% link {{ page.version.version }}/follower-reads.md %}): Latency or throughput may be increased, because reads are served only by the leaseholder.
- [Change data capture]({% link {{ page.version.version }}/change-data-capture-overview.md %}): Latency of changefeed messages may be increased.
- [Change data capture]({% link {{ page.version.version }}/change-data-capture-overview.md %}): Latency of changefeed messages may be increased. For more information on how you can adjust closed timestamp durations, updates, and emission, refer to [Latency in changefeeds]({% link {{ page.version.version }}/advanced-changefeed-configuration.md %}#latency-in-changefeeds).
- [Statistics collection]({% link {{ page.version.version }}/cost-based-optimizer.md %}#table-statistics): Load placed on the leaseholder may increase during collection.

While increasing the closed timestamp may decrease retryable errors, it may also increase lock latencies. Consider an example where: