Skip remote-repositories validations for node-joins when RepositoriesService is not in sync with cluster-state #16763
…Service is not in sync with cluster-state Signed-off-by: Pranshu Shukla <[email protected]>
❌ Gradle check result for 5c9d397: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for b884cfc: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 6ea3fd6: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for a0c1abd: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❕ Gradle check result for ed9f5e7: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
             main      #16763    +/-
Coverage     72.05%    72.13%    +0.08%
Complexity   65183     65240     +57
Files        5318      5318
Lines        303993    304004    +11
Branches     43990     43992     +2
Hits         219028    219307    +279
Misses       67046     66710     -336
Partials     17919     17987     +68
Flaky - Test Result (1 failure / +1)
server/src/main/java/org/opensearch/node/remotestore/RemoteStoreNodeService.java
server/src/internalClusterTest/java/org/opensearch/discovery/DiscoveryDisruptionIT.java
❌ Gradle check result for dfb56d8: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for e89ea10: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
The backport to
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-16763-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 da6eda776a0c33f75da3645b04218c35d44d3aa7
# Push it to GitHub
git push --set-upstream origin backport/backport-16763-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x
Then, create a pull request where the
…Service is not in sync with cluster-state (opensearch-project#16763) * Skip remote-repositories validations for node-joins when RepositoriesService is not in sync with cluster-state Signed-off-by: Pranshu Shukla <[email protected]>
…en RepositoriesService is not in sync with cluster-state (#16820) * Skip remote-repositories validations for node-joins when RepositoriesService is not in sync with cluster-state (#16763) Signed-off-by: Pranshu Shukla <[email protected]>
…Service is not in sync with cluster-state (opensearch-project#16763) * Skip remote-repositories validations for node-joins when RepositoriesService is not in sync with cluster-state Signed-off-by: Pranshu Shukla <[email protected]> Signed-off-by: Mingshi Liu <[email protected]>
Description
During node joins, when a node carrying new repository metadata joins the cluster, the cluster-manager attempts to publish an updated cluster state that includes the node and its metadata. If the publish succeeds but the commit then fails for an unrelated reason (such as a network disruption or joining a leader in term), the cluster enters a persistent cycle of NullPointerExceptions that prevents it from becoming stable. The cause is that the publish step updates the last accepted version and cluster state, but because the commit never runs, the cluster-state appliers are not executed. As a result, the repositories service is out of sync with the repositories metadata in the cluster state. Now when the current cluster-manager (leader) steps down and another cluster-manager is elected:
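A minimal sketch of the guard named in the PR title may help make the failure mode concrete. It is not the code from this PR: `JoinValidationSketch`, `LocalRepositories`, `Outcome`, and `validateJoin` are hypothetical names, and repositories are reduced to a name-to-type map. The assumption is that a lookup in the node-local registry returns nothing when the appliers never ran, and that the join validation should skip that repository rather than dereference the missing entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; none of these types are OpenSearch's real classes.
class JoinValidationSketch {

    // Stand-in for the node-local repository registry, which is only
    // updated when cluster-state appliers run after a successful commit.
    static final class LocalRepositories {
        private final Map<String, String> typeByName = new HashMap<>();

        void register(String name, String type) {
            typeByName.put(name, type);
        }

        // Returns null when the applier never ran for this repository.
        String typeOf(String name) {
            return typeByName.get(name);
        }
    }

    enum Outcome { VALIDATED, SKIPPED_NOT_IN_SYNC, REJECTED }

    // acceptedRepos: repository name -> type from the last accepted cluster state.
    // joiningNodeRepos: repository name -> type advertised by the joining node.
    static Outcome validateJoin(
            Map<String, String> acceptedRepos,
            Map<String, String> joiningNodeRepos,
            LocalRepositories local) {
        boolean skipped = false;
        for (Map.Entry<String, String> repo : joiningNodeRepos.entrySet()) {
            if (!acceptedRepos.containsKey(repo.getKey())) {
                continue; // repository is new to the cluster, nothing to compare against
            }
            String localType = local.typeOf(repo.getKey());
            if (localType == null) {
                // Metadata was accepted during publish, but the commit (and so the
                // applier) never ran: skip instead of dereferencing a missing entry.
                skipped = true;
                continue;
            }
            if (!localType.equals(repo.getValue())) {
                return Outcome.REJECTED;
            }
        }
        return skipped ? Outcome.SKIPPED_NOT_IN_SYNC : Outcome.VALIDATED;
    }

    public static void main(String[] args) {
        LocalRepositories local = new LocalRepositories();
        Map<String, String> accepted = Map.of("remote-segments", "s3");
        Map<String, String> joining = Map.of("remote-segments", "s3");

        // The registry is empty because the applier never ran.
        System.out.println(validateJoin(accepted, joining, local));

        // Once the applier has run, the normal comparison takes place.
        local.register("remote-segments", "s3");
        System.out.println(validateJoin(accepted, joining, local));
    }
}
```

In this sketch the out-of-sync case is reported as its own outcome instead of being folded into success, so a caller could log it; whether the actual change surfaces it that way is not stated in the description.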
Related Issues
Resolves #16762
Check List
[ ] API changes companion pull request created, if applicable.
[ ] Public documentation issue/PR created, if applicable.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.