Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sync work across VV upgrades #1913

Merged
merged 4 commits into from
Sep 13, 2024
Merged

Make sync work across VV upgrades #1913

merged 4 commits into from
Sep 13, 2024

Conversation

snej
Copy link
Collaborator

@snej snej commented Nov 17, 2023

The transition from tree-based revids to version vectors is tricky for replication, and the prior approach (adding a fake 'legacy' version when creating the vector) didn't turn out to work.

Instead, now:

  1. when a doc is upgraded in memory to a VectorRecord it keeps its tree-based/legacy revids.
  2. When the doc is saved, it creates a version vector for the local Revision but also remembers the prior legacy revid.
  3. Remote Revisions keep their tree-based IDs until they're replaced by newer VV revisions (pulled or pushed).

As a result, revision histories now consist of a VV and/or legacy revids. The version vector, if present, always comes first. The legacy IDs at the end are what allow the recipient of a revision (an active or passive puller) to match up an incoming version vector with a local legacy revid.

(It's also possible for a rev history sent by the replicator to consist only of legacy IDs, even after the upgrade. This happens if a doc was updated locally before the app upgrade but not pushed until after. It can also happen on a pull for the same reason.)

There's an open question of how long we need to keep the legacy revid of the document. It only needs to be used once per remote, but in a P2P scenario you never know if you might have to sync with some new peer that hasn't upgraded that doc yet. In any case the revid isn't very big (~24 bytes) and it only occurs in pre-existing docs, so it may not be worth removing.

@cbl-bot
Copy link

cbl-bot commented Nov 17, 2023

Code Coverage Results:

Type Percentage
branches 65.9
functions 78.75
instantiations 33.18
lines 77.68
regions 73.25

@snej
Copy link
Collaborator Author

snej commented Nov 20, 2023

The test Multiple Collections Incremental Revisions failed in mode PUSH and PULL:

2023-11-17T18:45:57.6745060Z /Users/runner/work/couchbase-lite-core/couchbase-lite-core/Replicator/tests/ReplicatorCollectionTest.cc:590: FAILED:
2023-11-17T18:45:57.6747260Z   CHECK( _statusReceived.progress.documentCount == expectedDocumentCount )
2023-11-17T18:45:57.6748120Z with expansion:
2023-11-17T18:45:57.6748470Z   19 == 20
2023-11-17T18:45:57.6749250Z with message:
2023-11-17T18:45:57.6749710Z   The replicator is forced to stop after timeout

This appears unrelated to what I've changed; it doesn't look like version vectors were involved.

@jianminzhao
Copy link
Contributor

yes, that test is flaky. It seems to happen when the replicator goes to Idle before all revisions get synced.

@jianminzhao
Copy link
Contributor

"when a doc is upgraded in memory to a VectorRecord" -- I noticed that method "upgradeVersioning" is deleted. When actually a doc is upgraded in memory?

@pasin
Copy link
Collaborator

pasin commented Nov 21, 2023

Reviewing the code changes from looking at the diff here is hard for me to understand all of upgrading and handling both types of revisions. I'm still working on it. As of now, I have a few questions in general:

when a doc is upgraded in memory to a VectorRecord it keeps its tree-based/legacy revids.

Do I understand here that if the doc didn't get upgraded to vv, the vv record will be generated when read (but will not be saved)?

(It's also possible for a rev history sent by the replicator to consist only of legacy IDs, even after the upgrade. This happens if a doc was updated locally before the app upgrade but not pushed until after. It can also happen on a pull for the same reason.)

Do I understand correctly that when pulling a new revision that only contains the rev tree, the doc will be upgraded to the version vector (with the old rev tree kept) at that point including the new doc?

@snej
Copy link
Collaborator Author

snej commented Dec 4, 2023

"when a doc is upgraded in memory to a VectorRecord" -- I noticed that method "upgradeVersioning" is deleted. When actually a doc is upgraded in memory?

Any time it's opened by the current version of LiteCore; the DocumentFactory is now VectorDocumentFactory.

@snej
Copy link
Collaborator Author

snej commented Dec 4, 2023

Do I understand here that if the doc didn't get upgraded to vv, the vv record will be generated when read (but will not be saved)?

Yes.

Do I understand correctly that when pulling a new revision that only contains the rev tree, the doc will be upgraded to the version vector (with the old rev tree kept) at that point including the new doc?

The doc gets saved as a VectorRecord in the new format, but no version vector is generated since the new revision has a 'legacy' revid.

The transition from tree-based revids to version vectors is tricky
for replication, and the prior approach (adding a fake 'legacy'
version when creating the vector) didn't turn out to work.

Instead, when a doc is upgraded in memory to a VectorRecord it keeps
its tree-based/legacy revids. When the doc is saved it creates a
version vector for the local Revision but also remembers the prior
legacy revid. Remote Revisions keep their tree-based IDs until VVs
are pulled from or pushed to those remotes.

Revision histories may now include one or more legacy revids at the
end. These are what allow the recipient of a revision (an active or
passive puller) to match up an incoming version vector with a local
legacy revid.

There's still a question of how long we need to keep the legacy revid
of the document. It only needs to be used once per remote, but in a
P2P scenario you don't know if you might have to sync with some new
peer that hasn't upgraded that doc yet. In any case the revid isn't
very big (~24 bytes) so it may not be worth removing.
Some constants added earlier for VV upgrading and dealing with legacy
revids, which have been superseded by newer techniques, were nuked:

- kDocGetUpgraded / kUpgrade
- kLegacyRevSourceID
- VectorRecord::_versioning
c4doc_getRevisionHistory requires doc be loaded with kDocGetAll.
VectorDoc correctly threw an exception on this.
Also, all this is doing is checking the current rev, so no need to
get the history.
@snej
Copy link
Collaborator Author

snej commented Sep 11, 2024

Please review the last commit, which is a change I had to make to get a unit test pass. I think the code in the test was wrong.

@snej snej marked this pull request as ready for review September 11, 2024 21:46
@jianminzhao jianminzhao merged commit 92fc5cc into master Sep 13, 2024
9 checks passed
@jianminzhao jianminzhao deleted the sync-vv-legacy branch September 13, 2024 16:33
jianminzhao added a commit that referenced this pull request Sep 17, 2024
CBL-5616: Update SG version in docker-compose.yml (#2121)
CBL-6239: Fixed c4doc_getRevisionBody in a 2.x db (#2136)
92fc5cc Make sync work across VV upgrades (#1913)
abe9759 Merge in tag 3.2.0
8009302 Statically link to libstdc++ (#2134)
CBL-6169: DataFile::documentKeys() is not thread-safe (#2132)
CBL-6189: Annotation of thread-safety, filling in the missing ones after the first pass. (#2129)
471e5ec Allow a Timer to destruct itself during its callback (#2124)
b090363 Fixed C4CollectionSpec hash and equality (#2122)
cba409e LogEncoder cleanup (#2130)
cf53ccd Rewritten query translator (#1966)
9f75fab Updated Fleece now that its PR is merged
d6c2091 Update Android NDK even more
47cc9ad Update Jenkins configuration to support C++20
1652943 C++20: use std::ranges, contains() methods, std::size()
f27f8dd Build with C++20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants