[CI] flaky test failure in AzureBlobContainerRetriesTests and AzureBlobStoreRepositoryTests #2782

Closed
tlfeng opened this issue Apr 6, 2022 · 6 comments · Fixed by #2795
Labels
bug Something isn't working CI CI related flaky-test Random test failure that succeeds on second run v3.0.0 Issues and PRs related to version 3.0.0

Comments

@tlfeng
Collaborator

tlfeng commented Apr 6, 2022

Describe the bug
Comes from #2779 (comment). Not reproducible locally.

Log 4226
Reports 4226

REPRODUCE WITH: ./gradlew ':plugins:repository-azure:test' --tests "org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadRangeBlobWithRetries" -Dtests.seed=F9C872F637DEBD3E -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=hi-IN -Dtests.timezone=Europe/Busingen -Druntime.java=17

WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.gradle.api.internal.tasks.testing.worker.TestWorker (file:/home/ubuntu/.gradle/wrapper/dists/gradle-7.4.2-all/9uukhhbclvbegdvsww0j0cr3p/gradle-7.4.2/lib/plugins/gradle-testing-base-7.4.2.jar)
WARNING: Please consider reporting this to the maintainers of org.gradle.api.internal.tasks.testing.worker.TestWorker
WARNING: System::setSecurityManager will be removed in a future release
org.opensearch.repositories.azure.AzureBlobContainerRetriesTests > testReadRangeBlobWithRetries FAILED
    java.lang.NumberFormatException: For input string: "igl"
        at __randomizedtesting.SeedInfo.seed([F9C872F637DEBD3E:2CB29CABB86C50F1]:0)
        at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
        at java.base/java.lang.Long.parseLong(Long.java:711)
        at java.base/java.lang.Long.parseLong(Long.java:836)
        at com.azure.storage.blob.implementation.util.ChunkedDownloadUtils.extractTotalBlobLength(ChunkedDownloadUtils.java:138)
        at com.azure.storage.blob.implementation.util.ChunkedDownloadUtils.lambda$downloadFirstChunk$0(ChunkedDownloadUtils.java:57)
REPRODUCE WITH: ./gradlew ':plugins:repository-azure:test' --tests "org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadBlobWithRetries" -Dtests.seed=F9C872F637DEBD3E -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=hi-IN -Dtests.timezone=Europe/Busingen -Druntime.java=17

org.opensearch.repositories.azure.AzureBlobContainerRetriesTests > testReadBlobWithRetries FAILED
    java.lang.NumberFormatException: For input string: "jlk"
        at __randomizedtesting.SeedInfo.seed([F9C872F637DEBD3E:FBFDA3C25A2AF592]:0)
        at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
        at java.base/java.lang.Long.parseLong(Long.java:711)
        at java.base/java.lang.Long.parseLong(Long.java:836)
        at com.azure.storage.blob.implementation.util.ChunkedDownloadUtils.extractTotalBlobLength(ChunkedDownloadUtils.java:138)
        at com.azure.storage.blob.implementation.util.ChunkedDownloadUtils.lambda$downloadFirstChunk$0(ChunkedDownloadUtils.java:57)
REPRODUCE WITH: ./gradlew ':plugins:repository-azure:test' --tests "org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testWriteLargeBlob" -Dtests.seed=F9C872F637DEBD3E -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=hi-IN -Dtests.timezone=Europe/Busingen -Druntime.java=17

org.opensearch.repositories.azure.AzureBlobContainerRetriesTests > testWriteLargeBlob FAILED
    java.lang.AssertionError: 
    Expected: an empty collection
         but: <[LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
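
For context on the LEAK message above: Netty's ByteBuf is reference counted, and the leak detector complains when a buffer is garbage-collected while its count is still above zero. The sketch below illustrates the usual release-in-finally pattern with a hypothetical CountedBuffer class written only for this example. It is not Netty's ByteBuf and not the plugin's code, and it does not claim to show where the leak in testWriteLargeBlob actually comes from.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RefCountSketch {

    /** Hypothetical stand-in for a reference-counted buffer (not Netty's ByteBuf). */
    static final class CountedBuffer {
        private final AtomicInteger refCnt = new AtomicInteger(1); // starts owned by the creator
        private final byte[] data;

        CountedBuffer(int size) {
            this.data = new byte[size];
        }

        int refCnt() {
            return refCnt.get();
        }

        int capacity() {
            return data.length;
        }

        /** Decrements the count; returns true once the count reaches zero. */
        boolean release() {
            int remaining = refCnt.decrementAndGet();
            if (remaining < 0) {
                throw new IllegalStateException("released more often than retained");
            }
            return remaining == 0;
        }
    }

    /** Consumes the buffer and always releases it, even if the read throws. */
    static int consume(CountedBuffer buf) {
        try {
            return buf.capacity();
        } finally {
            buf.release();
        }
    }

    public static void main(String[] args) {
        CountedBuffer buf = new CountedBuffer(16);
        System.out.println("read " + consume(buf) + " bytes, refCnt=" + buf.refCnt());
    }
}
```

With real Netty buffers the same shape applies, using ByteBuf.release() or ReferenceCountUtil.release(msg) in the finally block.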

Log 4229
Reports 4229

> Task :plugins:repository-azure:internalClusterTest

REPRODUCE WITH: ./gradlew ':plugins:repository-azure:internalClusterTest' --tests "org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testSnapshotWithLargeSegmentFiles" -Dtests.seed=3F29D4B383D73056 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=el -Dtests.timezone=Europe/Skopje -Druntime.java=17

org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests > testSnapshotWithLargeSegmentFiles FAILED
    java.lang.AssertionError: 
    Expected: a value greater than <0>
         but: <0> was equal to <0>
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:964)
        at org.junit.Assert.assertThat(Assert.java:930)
        at org.opensearch.repositories.blobstore.OpenSearchBlobStoreRepositoryIntegTestCase.assertSuccessfulRestore(OpenSearchBlobStoreRepositoryIntegTestCase.java:532)
        at org.opensearch.repositories.blobstore.OpenSearchBlobStoreRepositoryIntegTestCase.assertSuccessfulRestore(OpenSearchBlobStoreRepositoryIntegTestCase.java:528)
> Task :plugins:repository-azure:internalClusterTest

REPRODUCE WITH: ./gradlew ':plugins:repository-azure:internalClusterTest' --tests "org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testRequestStats" -Dtests.seed=3F29D4B383D73056 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=el -Dtests.timezone=Europe/Skopje -Druntime.java=17

org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests > testRequestStats FAILED
    java.lang.AssertionError: 
    Expected: a value greater than <0>
         but: <0> was equal to <0>
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:964)
        at org.junit.Assert.assertThat(Assert.java:930)
        at org.opensearch.repositories.blobstore.OpenSearchBlobStoreRepositoryIntegTestCase.assertSuccessfulRestore(OpenSearchBlobStoreRepositoryIntegTestCase.java:532)
        at org.opensearch.repositories.blobstore.OpenSearchBlobStoreRepositoryIntegTestCase.assertSuccessfulRestore(OpenSearchBlobStoreRepositoryIntegTestCase.java:528)

The failures of "org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testSnapshotWithLargeSegmentFiles" and "org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testRequestStats" under ./gradlew ':plugins:repository-azure:internalClusterTest' are the more common ones, and they also occurred in #2777 (comment).

To Reproduce
See above

Expected behavior
Tests pass.

Plugins
n/a

Screenshots
n/a

Host/Environment (please complete the following information):
n/a

Additional context
Probably related to the azure-storage-blob version upgrade #2526
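
As a triage aid for the NumberFormatException in the traces above: the frames show ChunkedDownloadUtils.extractTotalBlobLength calling Long.parseLong on a non-numeric string ("igl", "jlk"). The sketch below only reproduces that failure mode and shows a defensive variant. It assumes the value is the part after '/' in a Content-Range-style header such as "bytes 0-9/100", which the trace itself does not confirm. The class and method names are hypothetical, and this is not the Azure SDK's implementation.

```java
import java.util.Optional;

class ContentRangeDemo {

    /**
     * Extracts the total length from a value such as "bytes 0-9/100".
     * Returns empty when the part after '/' is missing or not a decimal number.
     * (Assumed header shape; hypothetical helper for illustration only.)
     */
    static Optional<Long> totalLength(String contentRange) {
        if (contentRange == null) {
            return Optional.empty();
        }
        int slash = contentRange.lastIndexOf('/');
        if (slash < 0 || slash == contentRange.length() - 1) {
            return Optional.empty();
        }
        try {
            return Optional.of(Long.parseLong(contentRange.substring(slash + 1).trim()));
        } catch (NumberFormatException e) {
            // Non-numeric total, e.g. "igl": report "unknown" instead of failing the download.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(totalLength("bytes 0-9/100")); // Optional[100]
        System.out.println(totalLength("bytes 0-9/igl")); // Optional.empty

        // The unguarded call fails the same way as in the CI log.
        try {
            Long.parseLong("igl");
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // For input string: "igl"
        }
    }
}
```

If the non-numeric value turns out to come from the test's mock HTTP handler rather than from the SDK, the fix belongs in the test fixture and a guard like this would only mask it.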

@tlfeng tlfeng added bug Something isn't working CI CI related flaky-test Random test failure that succeeds on second run v3.0.0 Issues and PRs related to version 3.0.0 labels Apr 6, 2022
@tlfeng tlfeng changed the title [CI] flaky test failure [CI] flaky test failure in AzureBlobContainerRetriesTests and AzureBlobStoreRepositoryTests Apr 6, 2022
@reta
Collaborator

reta commented Apr 6, 2022

@tlfeng thanks for the issue, looking at it now

@tlfeng
Collaborator Author

tlfeng commented Apr 6, 2022

> @tlfeng thanks for the issue, looking at it now

No problem! So far it has only occurred in CI workflow runs and can't be reproduced on a local machine, so it can be confusing.

@reta
Collaborator

reta commented Apr 6, 2022

I found the cause of the test failures, but the internalClusterTest failures are still very unclear. I would suggest reverting the change #2792. My apologies for that.

@tlfeng
Collaborator Author

tlfeng commented Apr 6, 2022

@reta 😳 It's great that you found the cause of the failure in test, although it has only occurred once so far.
Can the internalClusterTest failure be reproduced? Has it been confirmed to be really unstable before reverting the upgrade?

@reta
Collaborator

reta commented Apr 6, 2022

> Can the internalClusterTest failure be reproduced? Has it been confirmed to be really unstable before reverting the upgrade?

Yes, it is reproducible locally (after many attempts). If you don't mind, let me spend another few hours on it (🤞 I will find the cause). If not, we could go ahead with the revert, wdyt?

@tlfeng
Collaborator Author

tlfeng commented Apr 6, 2022

Ah, good to know the failure occurs locally. I will follow the advice of @dblock or other experts.
(Frankly speaking, I've already started creating a new PR based on a custom branch without the commit "Update azure-storage-blob to 12.15.0". 😂)
