Skip to content

Commit

Permalink
Technical Advisory 104309
Browse files Browse the repository at this point in the history
  • Loading branch information
mdlinville committed Feb 20, 2024
1 parent fdb32a2 commit 4206971
Showing 1 changed file with 87 additions and 0 deletions.
87 changes: 87 additions & 0 deletions src/current/advisories/a104309.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
---
title: Technical Advisory a104309
advisory: A-a104309
summary: In rare cases, a rangefeed bug may cause a checkpoint to be emitted prematurely, before all writes below the checkpoint timestamp have been emitted.
toc: true
affected_versions: v2.1.11 to v22.2.17, v23.1.0 to v23.1.14, v23.2.0
advisory_date: 2024-02-20
docs_area: releases
---

Publication date: {{ page.advisory_date | date: "%B %e, %Y" }}

## Description

A [rangefeed](https://www.cockroachlabs.com/docs/stable/create-and-configure-changefeeds#enable-rangefeeds) bug may cause a [checkpoint](https://www.cockroachlabs.com/docs/stable/how-does-an-enterprise-changefeed-work) to be emitted prematurely in rare cases, before all writes below the checkpoint timestamp have been emitted. This could occur if nodes are overloaded or if long-running transactions (several seconds) are involved, such that intent resolution only replicates to a follower running a rangefeed more than 10 seconds after the transaction began. If a cluster using changefeeds experiences this bug, changefeeds will omit these write events, and the following error will be logged:

{% include_cached copy-clipboard.html %}
~~~ none
cdc ux violation: detected timestamp ... that is less or equal to the local frontier
~~~

Across a large sample of clusters using changefeeds over a month, 3.8% of clusters had experienced this bug, and a total 0.000064% of events were omitted (one in a million).

The bug is caused by a race condition that is typically triggered by transactions that run for several seconds or by follower replicas that lag significantly behind the leaseholder, such as when a node is overloaded or a network is disrupted. The rangefeed must be running on a follower replica that applies intent resolution long after the original transaction has committed and garbage-collected its intents and transaction record on the leaseholders (at least 10 seconds after the transaction begins).

The specific conditions to trigger the bug are:

1. An explicit or cross-range transaction commits, then asynchronously resolves all of its intents and removes its transaction record on all relevant leaseholders.

1. A follower replica running a rangefeed replicates the transaction's intent writes, but not yet the resolution of the intents. More than 10 seconds must elapse between when the transaction begins and when the intent resolution is replicated to the follower. (The intent resolution must be replicated to the follower after 10 seconds. That means that the sum of `transaction duration + intent resolution duration + transaction record GC + replication to follower` must be greater than 10 seconds to trigger the bug.)

1. The follower replica's closed timestamp has advanced beyond the transaction's write timestamp. More time than the [closed timestamp target duration](https://www.cockroachlabs.com/docs/stable/advanced-changefeed-configuration#kv-closed_timestamp-target_duration) (3 seconds by default) must elapse from the transaction's write timestamp until the intent resolution is replicated to the follower, and all other writes up to the write timestamp must have been replicated to the follower.

1. If the above conditions are satisfied, the follower attempts to push the transaction to advance its resolved timestamp and emit a checkpoint, but the follower does not find a transaction record. The follower operates as though the transaction was aborted, and allows its resolved timestamp to advance above the transaction's write timestamp, emitting a checkpoint even though the intents have not yet been emitted.

1. When the intent resolution is finally replicated to the follower, it emits the write events with timestamps below the previous checkpoint.

1. The changefeed processor detects these events below the previous checkpoint and discards them instead of emitting them.

All CockroachDB versions prior to and including the following are impacted by this bug:

- v2.1.11 - v22.2.17
- v23.1.0 - v23.1.14
- V23.2.0

Versions prior to 22.2 are no longer eligible for [maintenance support](https://www.cockroachlabs.com/docs/releases/release-support-policy), and this bug will not be fixed in those versions.

## Statement

This is resolved in CockroachDB by [PR #117612](https://github.com/cockroachdb/cockroach/pull/117612), which uses a barrier command to ensure that all historical and ongoing range writes have been applied to the local replica and emitted before the resolved timestamp is advanced and a checkpoint is emitted.

The fix has been applied to maintenance releases of CockroachDB:

- [v22.2.18](https://www.cockroachlabs.com/docs/releases/v22.2#v22-2-18)
- [v23.1.15](https://www.cockroachlabs.com/docs/releases/v23.1#v23-1-15)
- [v23.2.1](https://www.cockroachlabs.com/docs/releases/v23.2#v23-2-1)

This public issue is tracked by [Issue #104309](https://github.com/cockroachdb/cockroach/issues/104309).

## Mitigation

Users are encouraged to upgrade to v22.2.18, v23.1.15, v23.2.1, or a later version that includes the fix.

The log message `cdc ux violation: detected timestamp ... that is less or equal to the local frontier` indicates that a cluster has been affected by this bug. One message will be logged per omitted event. The message will include the [job ID](https://www.cockroachlabs.com/docs/v23.2/show-jobs#show-changefeed-jobs) of the affected changefeed (as `job=...`), as well as the MVCC timestamp of the transaction whose writes were omitted (`detected timestamp ...`). If the data is still present in a table, the following query may allow you to retrieve it:

{% include_cached copy-clipboard.html %}
~~~ sql
SELECT * FROM table WHERE crdb_internal_mvcc_timestamp = {timestamp}
~~~

Replace '{timestamp}` with the timestamp expressed in nanoseconds with the logical component after the decimal point. For example, to query for the logged timestamp `1707854079.38430848,2`:

1. Multiply `1707854079` by <code>1 × 10<sup>9</sup></code> (1 billion).
1. Add `38430848` to the result.
1. Append the portion of the timestamp (`2` in this example) after the comma to the right of the the decimal point. In this example, the converted timestamp is `1707854079038430848.2`.

If data is found, it can be re-emitted in two ways:

- By issuing an `UPDATE` command with the same values.
- By restarting the changefeed with `initial_scan` set to `yes` and without a cursor timestamp. This will perform a full export of all current data across the changefeed.
- By restarting the changefeed and providing a cursor timestamp. If a cursor timestamp is provided, an initial scan will not be performed. Instead, updates above the cursor timestamp will be emitted. The cursor timestamp can be initialized below the timestamp of omitted writes (refer to the reference documentation for [CREATE CHANGEFEED](https://www.cockroachlabs.com/docs/stable/create-changefeed)). However, specifying a cursor timestamp farther in the past than the [`gc.ttlseconds`](https://www.cockroachlabs.com/docs/stable/configure-replication-zones#gc-ttlseconds) replication zone variable, which defaults to `14400` seconds (4 hours), results in an error.

## Impact

[Changefeeds](https://www.cockroachlabs.com/docs/stable/change-data-capture-overview) may incorrectly omit events in rare cases and log an error.

Questions about any technical alert can be directed to our [support team](https://support.cockroachlabs.com/).

0 comments on commit 4206971

Please sign in to comment.