tbb::global_control does not stop spinning threads #1417

Closed
Sedeniono opened this issue Jun 24, 2024 · 1 comment · Fixed by #1418
Sedeniono commented Jun 24, 2024

Summary

Reducing the maximum allowed parallelism via tbb::global_control no longer "stops" worker threads. They keep spinning in thread_dispatcher::process() and consume CPU time, which slows down execution.

The problem is triggered by first executing e.g. tbb::parallel_for_each (to "fire up" the threads), then reducing the number of threads to 1 via tbb::global_control, and then executing a second tbb::parallel_for_each. While the second tbb::parallel_for_each runs, only a single core should be active. In the debugger I can confirm that the business logic is executed by only a single thread. However, the task manager shows that all cores are busy. The unnecessary threads cause the code to run slower by ~50% (either because of oversubscription or because the CPU frequency gets reduced; I am not sure which).

Version

Version 2021.10.0 was ok. Version 2021.11.0 is broken. The current master (55bf2b3) is still broken.

I was able to bisect the problem to commit c456844 (pull request: #758).

Environment

  • Windows 11
  • Intel Core i9-13900
  • Microsoft Compiler version 19.40 (_MSC_FULL_VER=194033811)

Observed Behavior

100% CPU load (all cores are busy) even though tbb::global_control was used to reduce the max. allowed parallelism to 1.

Expected Behavior

After setting the maximum allowed parallelism to 1 via tbb::global_control, only a single core should be busy (this corresponds to ~3% CPU load in the task manager, because my Intel i9-13900 has 32 virtual cores).

Steps To Reproduce

  • Code:
#include <chrono>
#include <cmath> // std::sin
#include <iostream>
#include <numeric>
#include <tbb/global_control.h>
#include <tbb/parallel_for_each.h>
#include <tbb/version.h>
#include <vector>

int main()
{
  std::cout << "Start. TBB: " << TBB_VERSION_STRING << ", TBB_runtime_version=" << TBB_runtime_version()
            << ", TBB_runtime_interface_version=" << TBB_runtime_interface_version() << ", MSVC: " << _MSC_FULL_VER
            << std::endl;

  //------------------------------------------------
  // First call of tbb::parallel_for_each() to 'create' threads
  static constexpr bool TRIGGER_BUG = true;
  if (TRIGGER_BUG) {
    std::cout << "Warmup to trigger bug" << std::endl;
    std::vector<double> args(1024, 42.0);
    tbb::parallel_for_each(args, [](double & arg) {});
    std::cout << "Warmup finished." << std::endl;
  }

  //------------------------------------------------
  // Reduce number of threads
  std::cout << "Running test" << std::endl;
  static constexpr size_t NUM_CORES = 1;
  tbb::global_control tbbControl(tbb::global_control::max_allowed_parallelism, NUM_CORES);

  //------------------------------------------------
  // Second call of tbb::parallel_for_each()
  std::vector<double> args(1024, 42.0);
  auto const startTime = std::chrono::high_resolution_clock::now();
  tbb::parallel_for_each(args, [](double & arg) {
    for (size_t i = 0; i < 1000000; ++i) {
      arg += std::sin(arg);
    }
  });

  double const elapsed
      = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - startTime)
            .count()
        / 1000.0;

  double const result = std::accumulate(args.begin(), args.end(), 0.0);
  std::cout << "Used " << NUM_CORES << " cores. Finished in " << elapsed << "s: " << result << std::endl;
}
  • Built using cmake:
cmake -DCMAKE_INSTALL_PREFIX="path\to\install\dir" -DTBB_TEST=OFF ..
cmake --build .
cmake --install .
  • Execute and observe in the task manager that all cores are busy. (That is the problem). Also take note of the execution time.
  • Then either revert to an older TBB version, or set TRIGGER_BUG=false. Then run again. Result: Only a single core is busy, and the execution time dropped by ~50%.
@Sedeniono Sedeniono added the bug label Jun 24, 2024
pavelkumbrasev (Contributor) commented

Hi @Sedeniono, thank you for the report. I'm actually surprised this bug was not reported sooner :)
