tbb::task_group thread scaling #313
Comments
I'd slightly refactor the approach with
Also you do not need It uses one |
@alexey-katranov thank you for taking the time to look at this. Unfortunately, although my example properly shows the performance characteristics of our actual application, it does not exhibit the full range of capabilities. In the full application, the equivalent of the |
Thank you for the clarification. We will think about how we can improve the tasking interfaces to cover such cases. Notify: @aleksei-fedotov
@Dr15Jones it took a while but
As part of transitioning from the deprecated tbb::task API to tbb::task_group, I have been measuring the performance of our applications. I have found that when using a single tbb::task_group we get sharply diminished thread scaling. To illustrate the problem, I created four highly simplified versions of the main processing loop of our applications. The code for these simple applications can be found here: https://github.com/Dr15Jones/tbb_group_scaling. Each application does the same processing but uses TBB in a different way. The differences are
When testing on either an Intel or AMD CPU, the single tbb::task_group was found to either not scale as the number of threads increased or to have extremely weak scaling compared to the other options. The tbb::task using allocate_additional_child_of had the best performance followed closely by the N tbb::task_groups case.
My question is: are there plans to improve the performance when using a single tbb::task_group? If not, is using multiple tbb::task_groups that work together to share the load of creating tasks a supported use case? Alternatively, could a new API for creating a performant hierarchy of task_groups be developed, in order to avoid doing a 'spin' loop over the task_group::wait calls?