Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dbnode] Fix the insertion of entries in shard entry list to maintain strictly increasing order of unique index #4293

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

RiyaTyagi
Copy link

What this PR does / why we need it:
This PR fixes the logic to insert the entries in shard entry list to maintain the strictly increasing order of entry's unique index added to the list. This will fix the issue of skipping few series IDs while fetching the block metadata for peer bootstrapping.
During peer bootstrapping the dbnode fetches the series metadata from peers via FetchBlocksMetadataRawV2 call. The pagination logic for this call is based on the assumption that entries are added in a strictly increasing order of entry's unique index to the shard list. While debugging the issue of data loss during peer bootstrapping it was found that some of the entries are not added in this order. This causes some of the series Ids to be skipped during peer bootstrapping which in turn results in data loss if a particular series id is skipped by every peer.

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

@CLAassistant
Copy link

CLAassistant commented Sep 2, 2024

CLA assistant check
All committers have signed the CLA.

Copy link
Collaborator

@linasm linasm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RiyaTyagi could you try seeing whether #4263 would actually solve the pagination issue during peer bootstrapping?

Copy link
Collaborator

@prateek prateek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add tests we talked about

  • unit test from Arnav's internal diff
  • integration test replicating the issue

src/dbnode/storage/shard.go Outdated Show resolved Hide resolved
src/dbnode/storage/shard.go Outdated Show resolved Hide resolved
src/dbnode/storage/shard.go Outdated Show resolved Hide resolved
src/dbnode/storage/shard.go Outdated Show resolved Hide resolved
src/dbnode/storage/shard.go Outdated Show resolved Hide resolved
src/dbnode/storage/shard.go Outdated Show resolved Hide resolved
@RiyaTyagi
Copy link
Author

I have benchmarked the two PRs against the current m3db version on a 3 node cluster and here are the results:
benchmark_result

Current Version Memory usage:
Current_Version

Memory usage for the current PR with sorting logic:
Adding_Sort

Memory usage for the other PR with extra entry field:
Extra_Field

src/dbnode/storage/shard.go Outdated Show resolved Hide resolved
@RiyaTyagi RiyaTyagi marked this pull request as ready for review September 20, 2024 06:00
Copy link
Collaborator

@prateek prateek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@prateek
Copy link
Collaborator

prateek commented Sep 23, 2024

@linasm ta for the suggestion, Riya ended up using the solution from the other PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants