Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race in concurrent_vector::grow_by() #1532

Merged
merged 4 commits into from
Nov 4, 2024

Conversation

aleksei-fedotov
Copy link
Contributor

Description

When thread tries to grow concurrent_vector, it first creates locally a new extended segmentation table, with which it then tries to update the global one. However, before this patch, the thread simply updated the global table not taking into account that there might be the other threads doing the same thing. This patch makes the thread aware about its possibly concurrent environment.

Fixes #1531

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • updated
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

@npotravkin

Other information

include/oneapi/tbb/concurrent_vector.h Outdated Show resolved Hide resolved
test/tbb/test_concurrent_vector.cpp Outdated Show resolved Hide resolved
Signed-off-by: Fedotov, Aleksei <[email protected]>
Signed-off-by: Fedotov, Aleksei <[email protected]>
Copy link
Contributor

@kboyarinov kboyarinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than this, the patch looks good for me.
I would still wait for feedback from @pavelkumbrasev to ensure we did not miss something.

include/oneapi/tbb/detail/_segment_table.h Show resolved Hide resolved
} else if (new_table) {
// Other thread was the first to replace the segment table. Current thread's
// table is not needed anymore, so destroying it.
destroy_and_deallocate_table(new_table, pointers_per_long_table);
}
}).on_exception([&] {
my_segment_table_allocation_failed.store(true, std::memory_order_relaxed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this logic work correct while having one thread which allocates the long table and successfully stores it into the my_segment_table and another which allocates the long table and receives a bad_alloc while trying to allocate?
It seems like we need to double check this here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on what logic we think is correct. Perhaps, not reporting allocation error in case it succeeded by some other thread makes sense, and double checking my_segment_table for updated state first can improve algorithm robustness in some cases. Although, I am not sure how frequent these cases can be. Also, it is not full cover anyway, as successful allocation can still sneak in after that double check. Full cover requires recording the failure state into single source of information such as my_segment_table or some kind of non-trivial synchronization between multiple information sources, but we don't have that kind of logic through the whole vector right now I guess. So, at most we can write something like the following instead:

Suggested change
my_segment_table_allocation_failed.store(true, std::memory_order_relaxed);
// Last chance to overcome the failure, hoping that other thread has succeeded
// extending the table
table = get_table();
if (table == my_embedded_table) {
my_segment_table_allocation_failed.store(true, std::memory_order_relaxed);
}

Does it suffice though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep it as-is for now since I believe this happens once in a blue moon and the ideal solution will require a lot of resources now.

Copy link
Contributor

@kboyarinov kboyarinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aleksei-fedotov aleksei-fedotov merged commit 63a2a4d into master Nov 4, 2024
23 of 25 checks passed
@aleksei-fedotov aleksei-fedotov deleted the fix-grow_by-in-concurrent-vector branch November 4, 2024 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deadlock in tbb::concurrent_vector
3 participants