Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak for static_partitioner #1404

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

pavelkumbrasev
Copy link
Contributor

Description

global_control we limit concurrency that all the internal arenas can share between each other. Therefore, each particular arena doesn't not know its actual concurrency limit but only one you explicitly set during construction or for default arena (one that will be used for simple parallel_for call for example) the concurrency will be a whole machine.
When you start parallel_for with static_partitioner it will create as many internal proxy tasks as normal tasks (proxy tasks are used to assign tasks to specific threads). Proxy tasks have a property they should be executed twice. When execute is called for the proxy task for the first time it will return actual task that it propagated. When execute is called for the second time proxy task can be deleted.
In test case each time we call parallel_for with static_partitioner it will create hardware_concurrency proxy tasks but because global_control is present the concurrency of the default arena will not be fully satisfied and some of the proxy tasks will be called only once so they never be destroyed.

This fix will not help if global_control is set concurrently with parallel algorithm execution. Perhaps, this scenario is less probable.

Fixes # - issue number(s) if exists
#1403

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

@lennoxho

Other information

Signed-off-by: pavelkumbrasev <[email protected]>
@pavelkumbrasev
Copy link
Contributor Author

Potentially, idle workers can drain tasks from free slots.

Comment on lines +95 to +96
unsigned max_threads_in_arena = unsigned(std::min(static_cast<std::size_t>(max_concurrency()),
tbb::global_control::active_value(tbb::global_control::max_allowed_parallelism)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a little bit clumsy, don't you think? Does it make sense to move calculation of maximum_arena_concurrency into some dedicated function that could be reused in get_initial_auto_partitioner_divisor as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. if need to introduce new method for these 2 lines. (we check not arena concurrency min(arena_concurrency, allowed_concurrency) but rather available amount of workers).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to move it into something like get_num_possible_workers() to highlight that the solution is based on the immediate value of workers as std::min(...) does not mean much to reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove from affinity

@pavelkumbrasev
Copy link
Contributor Author

Potentially, idle workers can drain tasks from free slots.

This will require additional synchronization on mailbox due to concurrent access to it. So while this solution will solve all potential problems it also brings potential performance problems

Copy link
Contributor

@aleksei-fedotov aleksei-fedotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions:

  1. How does the patch fix memory leaks in case global_control is instantiated in the middle of a parallel algorithm's work?
  2. Write such a test?

Comment on lines +847 to +867
tbb::global_control gbl_ctrl{ tbb::global_control::max_allowed_parallelism, std::size_t(tbb::this_task_arena::max_concurrency() / 2) };

size_t current_memory_usage = 0, previous_memory_usage = 0, stability_counter = 0;
bool no_memory_leak = false;
std::size_t num_iterations = 100;
for (std::size_t i = 0; i < num_iterations; ++i) {
for (std::size_t j = 0; j < 100; ++j) {
tbb::parallel_for(0, 1000, [] (int) {}, tbb::static_partitioner{});
}

current_memory_usage = utils::GetMemoryUsage();
stability_counter = current_memory_usage==previous_memory_usage ? stability_counter + 1 : 0;
// If the amount of used memory has not changed during 5% of executions,
// then we can assume that the check was successful
if (stability_counter > num_iterations / 20) {
no_memory_leak = true;
break;
}
previous_memory_usage = current_memory_usage;
}
REQUIRE_MESSAGE(no_memory_leak, "Seems we get memory leak here.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap it into the loop with gradual decrease of the global_control's limit? E.g.,

std::size_t current_limit = std::size_t(tbb::this_task_arena::max_concurrency());
while (current_limit /= 2) {
    tbb::global_control gc{ tbb::global_control::max_allowed_parallelism, current_limit };

    // iterations loop goes here {
        // repetitions loop goes here {
        // }
    // }
}

Comment on lines +95 to +96
unsigned max_threads_in_arena = unsigned(std::min(static_cast<std::size_t>(max_concurrency()),
tbb::global_control::active_value(tbb::global_control::max_allowed_parallelism)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to move it into something like get_num_possible_workers() to highlight that the solution is based on the immediate value of workers as std::min(...) does not mean much to reader.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

static_partitioner + global_control triggers an unbounded memory leak
3 participants