Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue that deprecated setting 'cluster.initial_master_nodes' is not identified in node bootstrap check #2779

Merged

Conversation

tlfeng
Copy link
Collaborator

@tlfeng tlfeng commented Apr 5, 2022

Description

Problem:
The deprecated setting cluster.initial_master_nodes is not identified in the node bootstrap check.
Although the setting is configured in opensearch.yml, the node bootstrap check still fails with message the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_cluster_manager_nodes / cluster.initial_master_nodes] must be configured.

Change:

  • Add INITIAL_MASTER_NODES_SETTING back to accepted discovery settings in discoveryIsConfigured(), which used in node bootstrap check.
  • Add unit tests back about the setting cluster.initial_master_nodes to test it's an accepted discovery setting.
  • Make an exception message more accurate to show the configured setting name of the 2 setting.

Reason:
The PR #2463 deprecated the setting cluster.initial_master_nodes, and added alternative setting cluster.initial_cluster_manager_nodes.
The bug is caused by the the setting INITIAL_MASTER_NODES_SETTING was replaced by INITIAL_CLUSTER_MANAGER_NODES_SETTING directly in the code, while they should both exist. The method discoveryIsConfigured() is used in this place.

Background:
Bootstrapping a cluster is required when an OpenSearch cluster starts up for the very first time. In development mode (single node without joining a cluster), with no discovery settings configured, this is automatically performed by the nodes themselves.

Issues Resolved

Resolve #2769

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ot identified during node bootstrap

Signed-off-by: Tianli Feng <[email protected]>
@tlfeng tlfeng requested a review from a team as a code owner April 5, 2022 23:45
@tlfeng tlfeng added bug Something isn't working v2.0.0 Version 2.0.0 distributed framework v3.0.0 Issues and PRs related to version 3.0.0 labels Apr 5, 2022
@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 0da1101
Log 4224

Reports 4224

@tlfeng tlfeng added backport 2.x Backport to 2.x branch backport 2.0 Backport to 2.0 branch labels Apr 6, 2022
Signed-off-by: Tianli Feng <[email protected]>
@tlfeng tlfeng force-pushed the initial-master-setting-node-bootstrap branch from 1a54ea3 to 7c49ddf Compare April 6, 2022 00:26
@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 1a54ea37f3d9a2ac2130c134fb1398628efc41b3
Log 4226

Reports 4226

@tlfeng tlfeng changed the title Fix issue that deprecated setting 'cluster.initial_master_nodes' is not identified during node bootstrap Fix issue that deprecated setting 'cluster.initial_master_nodes' is not identified in node bootstrap check Apr 6, 2022
@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 7c49ddf
Log 4227

Reports 4227

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 7fe1d4e
Log 4229

Reports 4229

@tlfeng
Copy link
Collaborator Author

tlfeng commented Apr 6, 2022

In log 4226:

REPRODUCE WITH: ./gradlew ':plugins:repository-azure:test' --tests "org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadRangeBlobWithRetries" -Dtests.seed=F9C872F637DEBD3E -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=hi-IN -Dtests.timezone=Europe/Busingen -Druntime.java=17

WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.gradle.api.internal.tasks.testing.worker.TestWorker (file:/home/ubuntu/.gradle/wrapper/dists/gradle-7.4.2-all/9uukhhbclvbegdvsww0j0cr3p/gradle-7.4.2/lib/plugins/gradle-testing-base-7.4.2.jar)
WARNING: Please consider reporting this to the maintainers of org.gradle.api.internal.tasks.testing.worker.TestWorker
WARNING: System::setSecurityManager will be removed in a future release
org.opensearch.repositories.azure.AzureBlobContainerRetriesTests > testReadRangeBlobWithRetries FAILED
    java.lang.NumberFormatException: For input string: "igl"
        at __randomizedtesting.SeedInfo.seed([F9C872F637DEBD3E:2CB29CABB86C50F1]:0)
        at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
        at java.base/java.lang.Long.parseLong(Long.java:711)
        at java.base/java.lang.Long.parseLong(Long.java:836)
        at com.azure.storage.blob.implementation.util.ChunkedDownloadUtils.extractTotalBlobLength(ChunkedDownloadUtils.java:138)
        at com.azure.storage.blob.implementation.util.ChunkedDownloadUtils.lambda$downloadFirstChunk$0(ChunkedDownloadUtils.java:57)
REPRODUCE WITH: ./gradlew ':plugins:repository-azure:test' --tests "org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadBlobWithRetries" -Dtests.seed=F9C872F637DEBD3E -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=hi-IN -Dtests.timezone=Europe/Busingen -Druntime.java=17

org.opensearch.repositories.azure.AzureBlobContainerRetriesTests > testReadBlobWithRetries FAILED
    java.lang.NumberFormatException: For input string: "jlk"
        at __randomizedtesting.SeedInfo.seed([F9C872F637DEBD3E:FBFDA3C25A2AF592]:0)
        at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
        at java.base/java.lang.Long.parseLong(Long.java:711)
        at java.base/java.lang.Long.parseLong(Long.java:836)
        at com.azure.storage.blob.implementation.util.ChunkedDownloadUtils.extractTotalBlobLength(ChunkedDownloadUtils.java:138)
        at com.azure.storage.blob.implementation.util.ChunkedDownloadUtils.lambda$downloadFirstChunk$0(ChunkedDownloadUtils.java:57)
REPRODUCE WITH: ./gradlew ':plugins:repository-azure:test' --tests "org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testWriteLargeBlob" -Dtests.seed=F9C872F637DEBD3E -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=hi-IN -Dtests.timezone=Europe/Busingen -Druntime.java=17

org.opensearch.repositories.azure.AzureBlobContainerRetriesTests > testWriteLargeBlob FAILED
    java.lang.AssertionError: 
    Expected: an empty collection
         but: <[LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.

In log 4229:

> Task :plugins:repository-azure:internalClusterTest

REPRODUCE WITH: ./gradlew ':plugins:repository-azure:internalClusterTest' --tests "org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testSnapshotWithLargeSegmentFiles" -Dtests.seed=3F29D4B383D73056 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=el -Dtests.timezone=Europe/Skopje -Druntime.java=17

org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests > testSnapshotWithLargeSegmentFiles FAILED
    java.lang.AssertionError: 
    Expected: a value greater than <0>
         but: <0> was equal to <0>
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:964)
        at org.junit.Assert.assertThat(Assert.java:930)
        at org.opensearch.repositories.blobstore.OpenSearchBlobStoreRepositoryIntegTestCase.assertSuccessfulRestore(OpenSearchBlobStoreRepositoryIntegTestCase.java:532)
        at org.opensearch.repositories.blobstore.OpenSearchBlobStoreRepositoryIntegTestCase.assertSuccessfulRestore(OpenSearchBlobStoreRepositoryIntegTestCase.java:528)
> Task :plugins:repository-azure:internalClusterTest

REPRODUCE WITH: ./gradlew ':plugins:repository-azure:internalClusterTest' --tests "org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testRequestStats" -Dtests.seed=3F29D4B383D73056 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=el -Dtests.timezone=Europe/Skopje -Druntime.java=17

org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests > testRequestStats FAILED
    java.lang.AssertionError: 
    Expected: a value greater than <0>
         but: <0> was equal to <0>
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:964)
        at org.junit.Assert.assertThat(Assert.java:930)
        at org.opensearch.repositories.blobstore.OpenSearchBlobStoreRepositoryIntegTestCase.assertSuccessfulRestore(OpenSearchBlobStoreRepositoryIntegTestCase.java:532)
        at org.opensearch.repositories.blobstore.OpenSearchBlobStoreRepositoryIntegTestCase.assertSuccessfulRestore(OpenSearchBlobStoreRepositoryIntegTestCase.java:528)

Opened new issue #2782 to track.

@tlfeng
Copy link
Collaborator Author

tlfeng commented Apr 6, 2022

start gradle check

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 7fe1d4e
Log 4234

Reports 4234

@tlfeng
Copy link
Collaborator Author

tlfeng commented Apr 6, 2022

In log 4234:
Same as above

> Task :plugins:repository-azure:internalClusterTest

REPRODUCE WITH: ./gradlew ':plugins:repository-azure:internalClusterTest' --tests "org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testSnapshotWithLargeSegmentFiles" -Dtests.seed=822244C1A3DDD3CD -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ar-IQ -Dtests.timezone=Europe/Monaco -Druntime.java=17
REPRODUCE WITH: ./gradlew ':plugins:repository-azure:internalClusterTest' --tests "org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testRequestStats" -Dtests.seed=822244C1A3DDD3CD -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ar-IQ -Dtests.timezone=Europe/Monaco -Druntime.java=17

start gradle check

@opensearch-ci-bot
Copy link
Collaborator

❌   Gradle Check failure 7fe1d4e
Log 4238

Reports 4238

@tlfeng
Copy link
Collaborator Author

tlfeng commented Apr 6, 2022

start gradle check

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success 7fe1d4e
Log 4242

Reports 4242

Copy link
Member

@peternied peternied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this fix, @tlfeng can you find other reviews to help merge this blocking change promptly?

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good enough for now, but I worry that this "setting exists?" may not work correctly in a distributed scenario. Thus this whole business of "what's the name of the actual setting, the new INITIAL_CLUSTER_MANAGER_NODES_SETTING vs. the deprecated INITIAL_MASTER_NODES_SETTING should be implemented as a method on Setting itself, e.g. setting.getActualNameOfSettingInConfigurationOrSomethingLikeThat(), and return the name of the setting used as fallback in the case that it has a fallback, and its own name otherwise. Possible?

@dblock dblock merged commit dd24e17 into opensearch-project:main Apr 6, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 6, 2022
…ot identified in node bootstrap check (#2779)

* Fix issue that deprecated setting 'cluster.initial_master_nodes' is not identified during node bootstrap

Signed-off-by: Tianli Feng <[email protected]>

* Restore a variable name

Signed-off-by: Tianli Feng <[email protected]>
(cherry picked from commit dd24e17)
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 6, 2022
…ot identified in node bootstrap check (#2779)

* Fix issue that deprecated setting 'cluster.initial_master_nodes' is not identified during node bootstrap

Signed-off-by: Tianli Feng <[email protected]>

* Restore a variable name

Signed-off-by: Tianli Feng <[email protected]>
(cherry picked from commit dd24e17)
peternied added a commit to peternied/security that referenced this pull request Apr 6, 2022
Previously the setting was `cluster.initial_master_nodes` and has since
changed to `cluster.initial_cluster_manager_nodes`.  This was causing
issues due to a bug in the fallback method that has been fixed in
OpenSearch, but getting ahead of this change before its needed.

See related opensearch-project/OpenSearch#2779

Signed-off-by: Peter Nied <[email protected]>
peternied added a commit to peternied/security that referenced this pull request Apr 6, 2022
Previously the setting was `cluster.initial_master_nodes` and has since
changed to `cluster.initial_cluster_manager_nodes`.  This was causing
issues due to a bug in the fallback method that has been fixed in
OpenSearch, but getting ahead of this change before its needed.

See related opensearch-project/OpenSearch#2779

Signed-off-by: Peter Nied <[email protected]>
dblock pushed a commit that referenced this pull request Apr 6, 2022
…ot identified in node bootstrap check (#2779) (#2794)

* Fix issue that deprecated setting 'cluster.initial_master_nodes' is not identified during node bootstrap

Signed-off-by: Tianli Feng <[email protected]>

* Restore a variable name

Signed-off-by: Tianli Feng <[email protected]>
(cherry picked from commit dd24e17)

Co-authored-by: Tianli Feng <[email protected]>
dblock pushed a commit that referenced this pull request Apr 6, 2022
…ot identified in node bootstrap check (#2779) (#2793)

* Fix issue that deprecated setting 'cluster.initial_master_nodes' is not identified during node bootstrap

Signed-off-by: Tianli Feng <[email protected]>

* Restore a variable name

Signed-off-by: Tianli Feng <[email protected]>
(cherry picked from commit dd24e17)

Co-authored-by: Tianli Feng <[email protected]>
@tlfeng
Copy link
Collaborator Author

tlfeng commented Apr 6, 2022

Hi @dblock Thanks for your suggestion! I will take it into account, and will discuss with the team to find a solution to prevent such issue happens in other recent deprecated settings.

@tlfeng tlfeng deleted the initial-master-setting-node-bootstrap branch April 6, 2022 21:09
@dblock
Copy link
Member

dblock commented Apr 6, 2022

Hi @dblock Thanks for your suggestion! I will take it into account, and will discuss with the team to find a solution to prevent such issue happens in other recent deprecated settings.

To be clear, I am suggesting the following:

current

String initialClusterManagerSettingName = INITIAL_CLUSTER_MANAGER_NODES_SETTING.exists(settings)
            ? INITIAL_CLUSTER_MANAGER_NODES_SETTING.getKey()
            : INITIAL_MASTER_NODES_SETTING.getKey();

new

String initialClusterManagerSettingName = INITIAL_CLUSTER_MANAGER_NODES_SETTING.getName(settings)

peternied added a commit to peternied/security that referenced this pull request Apr 8, 2022
Previously the setting was `cluster.initial_master_nodes` and has since
changed to `cluster.initial_cluster_manager_nodes`.  This was causing
issues due to a bug in the fallback method that has been fixed in
OpenSearch, but getting ahead of this change before its needed.

See related opensearch-project/OpenSearch#2779

Signed-off-by: Peter Nied <[email protected]>
peternied added a commit to opensearch-project/security that referenced this pull request Apr 25, 2022
Previously the setting was `cluster.initial_master_nodes` and has since
changed to `cluster.initial_cluster_manager_nodes`.  This was causing
issues due to a bug in the fallback method that has been fixed in
OpenSearch, but getting ahead of this change before its needed.

See related opensearch-project/OpenSearch#2779

Signed-off-by: Peter Nied <[email protected]>
wuychn pushed a commit to ochprince/security that referenced this pull request Mar 16, 2023
…ect#1746)

Previously the setting was `cluster.initial_master_nodes` and has since
changed to `cluster.initial_cluster_manager_nodes`.  This was causing
issues due to a bug in the fallback method that has been fixed in
OpenSearch, but getting ahead of this change before its needed.

See related opensearch-project/OpenSearch#2779

Signed-off-by: Peter Nied <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch backport 2.0 Backport to 2.0 branch bug Something isn't working distributed framework v2.0.0 Version 2.0.0 v3.0.0 Issues and PRs related to version 3.0.0
Projects
None yet
4 participants