What does task_group_context do? #1278
Comments
Hi @blonded04,
I get segfaults on that line :(
Probably something wrong with
It is hard to say what went wrong. Could you provide a reproducer?
Sorry, it took a while; however, here is the smallest reproducer I had problems with. My hypothesis is that TBB is not a friend of creating a context inside main.cpp:
#include <atomic>
#include <iostream>
// shared state between tbb and main.cpp
#include <oneapi/tbb/problems.h>
#include <tbb/parallel_for.h>

constexpr unsigned nthread = 12u;

int main() {
    // force nthread threads to join the arena (they won't leave because the wait limit in the task dispatcher is increased)
    std::atomic<unsigned> spin_barrier(nthread);
    tbb::parallel_for(tbb::blocked_range<int>(0, nthread), [&spin_barrier](tbb::blocked_range<int>) {
        spin_barrier.fetch_sub(1, std::memory_order_release);
        while (spin_barrier.load(std::memory_order_acquire)) {
            asm volatile ("pause\npause\npause\npause");
        }
    });
    std::cout << "parallel_for_finished" << std::endl;

    // enable the problematic behaviour
    tbb::set_flag();
    // some time later we will face a SEGFAULT
    while (tbb::get_counter() < 1000000u) {
        std::cout << "\t" << tbb::get_counter() << std::endl;
    }
    // we won't even get to this line
    tbb::set_flag(false);
}
GDB backtrace for segfault:
Diff in TBB:
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 47872941..eaa81b12 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -66,7 +66,7 @@ include(CMakeDependentOption)
# Handle C++ standard version.
if (NOT MSVC) # no need to cover MSVC as it uses C++14 by default.
if (NOT CMAKE_CXX_STANDARD)
- set(CMAKE_CXX_STANDARD 11)
+ set(CMAKE_CXX_STANDARD 20)
endif()
if (CMAKE_CXX${CMAKE_CXX_STANDARD}_STANDARD_COMPILE_OPTION) # if standard option was detected by CMake
@@ -108,7 +108,7 @@ option(TBB_DISABLE_HWLOC_AUTOMATIC_SEARCH "Disable HWLOC automatic search by pkg
option(TBB_ENABLE_IPO "Enable Interprocedural Optimization (IPO) during the compilation" ON)
if (NOT DEFINED BUILD_SHARED_LIBS)
- set(BUILD_SHARED_LIBS ON)
+ set(BUILD_SHARED_LIBS OFF)
endif()
if (NOT BUILD_SHARED_LIBS)
diff --git a/include/oneapi/tbb/parallel_for.h b/include/oneapi/tbb/parallel_for.h
index 91c7c44c..fd3f6aee 100644
--- a/include/oneapi/tbb/parallel_for.h
+++ b/include/oneapi/tbb/parallel_for.h
@@ -29,6 +29,7 @@
#include "task_group.h"
#include <cstddef>
+#include <functional>
#include <new>
namespace tbb {
@@ -109,12 +110,12 @@ struct start_for : public task {
// defer creation of the wait node until task allocation succeeds
wait_node wn;
for_task.my_parent = &wn;
- execute_and_wait(for_task, context, wn.m_wait, context);
+ d1::execute_and_wait(for_task, context, wn.m_wait, context);
}
}
//! Run body for range, serves as callback for partitioner
void run_body( Range &r ) {
- tbb::detail::invoke(my_body, r);
+ my_body(r);
}
//! spawn right task, serves as callback for partitioner
diff --git a/include/oneapi/tbb/problems.h b/include/oneapi/tbb/problems.h
new file mode 100644
index 00000000..fa380b53
--- /dev/null
+++ b/include/oneapi/tbb/problems.h
@@ -0,0 +1,37 @@
+#pragma once
+
+#include <atomic>
+
+namespace tbb {
+
+namespace internal {
+
+inline std::atomic<bool>& get_flag_impl() {
+ static std::atomic<bool> flag(false);
+ return flag;
+}
+
+inline std::atomic<unsigned>& get_counter_impl() {
+ static std::atomic<unsigned> counter(0u);
+ return counter;
+}
+
+} // namespace internal
+
+inline void set_flag(bool value=true) {
+ internal::get_flag_impl().store(value, std::memory_order_release);
+}
+
+inline bool get_flag() {
+ return internal::get_flag_impl().load(std::memory_order_acquire);
+}
+
+inline unsigned get_counter() {
+ return internal::get_counter_impl().load(std::memory_order_acquire);
+}
+
+inline void increment_counter() {
+ internal::get_counter_impl().fetch_add(1, std::memory_order_release);
+}
+
+} // namespace tbb
\ No newline at end of file
diff --git a/src/tbb/scheduler_common.h b/src/tbb/scheduler_common.h
index 9e103657..ddf082aa 100644
--- a/src/tbb/scheduler_common.h
+++ b/src/tbb/scheduler_common.h
@@ -254,7 +254,7 @@ public:
// threshold value tuned separately for macOS due to high cost of sched_yield there
, my_yield_threshold{10 * yields_multiplier}
#else
- , my_yield_threshold{100 * yields_multiplier}
+ , my_yield_threshold{10000000 * yields_multiplier}
#endif
, my_pause_count{}
, my_yield_count{}
diff --git a/src/tbb/task_dispatcher.h b/src/tbb/task_dispatcher.h
index f6ff3f17..aa16bf57 100644
--- a/src/tbb/task_dispatcher.h
+++ b/src/tbb/task_dispatcher.h
@@ -30,6 +30,10 @@
#include "itt_notify.h"
#include "concurrent_monitor.h"
+#include "oneapi/tbb/task_group.h"
+#include "oneapi/tbb/parallel_for.h"
+#include "oneapi/tbb/problems.h"
+
#include <atomic>
#if !__TBB_CPU_CTL_ENV_PRESENT
@@ -229,6 +233,16 @@ d1::task* task_dispatcher::receive_or_steal_task(
}
// Nothing to do, pause a little.
waiter.pause(slot);
+
+ if (get_flag()) {
+ tbb::parallel_for(
+ tbb::blocked_range<unsigned>{0u, arena_index + 1u},
+ [] (tbb::blocked_range<unsigned> range) {
+ for (unsigned idx = range.begin(); idx < range.end(); idx++) {
+ increment_counter();
+ }
+ });
+ }
} // end of nonlocal task retrieval loop
__TBB_ASSERT(is_alive(a.my_guard), nullptr);
What happens internally is that I end up calling
Are there any assumptions that forbid this kind of behavior, and are there any workarounds that respect the existing invariants?
Hi @blonded04, it seems you are trying to run a parallel_for inside the steal loop, which is something we have never tried. Your gdb output is missing debug information (possibly because optimizations are turned on), so I would like to ask you to run it in debug mode so that the assertions can provide some context on what is going on. But I would like to clarify that running parallel_for inside the steal loop is something we would never recommend.
Hi @blonded04, can we close this issue?
Yes! Thank you very much for the help.
While modifying TBB sources, I was trying really hard to understand what is happening in task_group_context and task_group_context_impl, but I failed miserably.
I'm trying to construct a task_group_context and then use it to execute a task in multiple separate threads in the local_wait_for_all loop via the execute_and_wait function. However, no matter what I do, it just ends up being something like this (gdb output):
I tried:
execute_and_wait calls
Also, I tried constructing task_group_context with 2 different argument sets (total 2 * 2 = 4 configurations):
PARALLEL_FOR
tbb::task_group_context::bound, tbb::task_group_context::default_traits | tbb::task_group_context::concurrent_wait
Yet I still get the same error :(
Thank you very much for all your previous answers btw.
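For readers who want to try the spin barrier from the reproducer's main.cpp in isolation, here is a minimal sketch that does not depend on TBB at all. It is an assumption-laden stand-in, not the code from this issue: it uses std::thread instead of tbb::parallel_for, the function name run_spin_barrier and the passed counter are hypothetical, and std::this_thread::yield() replaces the x86-specific pause instruction. Because each participant is its own std::thread, every one of them is guaranteed to arrive, which is not something the sketch can say about how TBB splits a blocked_range.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not part of TBB or of the reproducer): runs `nthread`
// std::thread workers through a one-shot spin barrier and returns how many
// of them got past it.
unsigned run_spin_barrier(unsigned nthread) {
    std::atomic<unsigned> remaining(nthread);  // arrivals still missing
    std::atomic<unsigned> passed(0u);          // workers that left the barrier

    std::vector<std::thread> workers;
    workers.reserve(nthread);
    for (unsigned i = 0; i < nthread; ++i) {
        workers.emplace_back([&remaining, &passed] {
            // Announce arrival, then spin until every worker has arrived.
            remaining.fetch_sub(1, std::memory_order_release);
            while (remaining.load(std::memory_order_acquire) != 0u) {
                std::this_thread::yield();  // portable stand-in for `pause`
            }
            passed.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }

    // After all joins, every worker must have decremented the barrier.
    assert(remaining.load(std::memory_order_acquire) == 0u);
    return passed.load(std::memory_order_relaxed);
}
```

Calling run_spin_barrier(12u) mirrors the nthread = 12u setting of the reproducer; no worker leaves the loop before all twelve have arrived.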