Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DT-937 - Remove nested threadpool #1852

Merged
merged 4 commits into from
Nov 13, 2024
Merged

Conversation

snf2ye
Copy link
Contributor

@snf2ye snf2ye commented Nov 13, 2024

Jira ticket: https://broadworkbench.atlassian.net/browse/DT-937

Addresses

We were continuously running into instances where Azure snapshot create flights would get stuck. After adding some logging in previous PRs, we were able to pinpoint the issue.

For large requests, we are running out of threads in a nested, shared threadpool. The parent call would hand out all of the threads and then the nested call wouldn't have any more threads to hand out. It was stuck forever waiting for a new thread.

    List<Future<Void>> futures = new ArrayList<>();
    for (List<String> fileIdsBatch :
        ListUtils.partition(List.copyOf(fileIds), MAX_FILTER_CLAUSES)) {
      futures.add(
          azureTableThreadpool.submit(
              () -> {
                // Do stuff
                /// ... 
                // Then, in batchStoreDirectoryEntry we try to pull from the azureTableTreadpool again, 
                // but all of the threads have already been handed out 
                // So, we just wait forever for more threads to become available 
                batchStoreDirectoryEntry(snapshotTableServiceClient, snapshotId, snapshotEntries);
               
                return null;
              }));
    }
    FutureUtils.waitFor(futures);

Summary of changes

Remove nested thread pool. Note: I expect that this means we'll see Azure Snapshot Create Jobs take longer.

Testing Strategy

  • Manually decrease the azure.threading.numTableThreads setting locally and reproduce issue with the bio.terra.service.filedata.azure.tables.TableDaoConnectedTest#testComputeSnapshot connected test.
  • With changes, this connected test passes with limited threads and with the normal, higher setting.

@snf2ye snf2ye requested a review from a team as a code owner November 13, 2024 16:52
@snf2ye snf2ye requested review from rushtong and fboulnois and removed request for a team November 13, 2024 16:52
Copy link
Contributor

@rushtong rushtong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable 👍🏽

Copy link
Contributor

@fboulnois fboulnois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Should we unit test the storeDirectoryEntries method?

@snf2ye snf2ye enabled auto-merge (squash) November 13, 2024 20:03
@snf2ye snf2ye merged commit e8e8f19 into develop Nov 13, 2024
13 checks passed
@snf2ye snf2ye deleted the sh/DT-937-remove-nested-thread-pool branch November 13, 2024 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants