Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tsan test bool #1940

Open
wants to merge 23 commits into
base: latest
Choose a base branch
from
Open

Tsan test bool #1940

wants to merge 23 commits into from

Conversation

galabovaa
Copy link
Contributor

@galabovaa galabovaa commented Sep 20, 2024

Address hanging issue. Fix by @mathgeekcoder with minor modifications

Add sanitizer tests: WIP

Add workaround for cmake regex error

mathgeekcoder and others added 23 commits September 12, 2024 17:40
Removes synchronization with worker threads on shutdown. Also removes the "search" for the main executor in the worker threads.

Instead we simply pass the main executor to the thread as a parameter.  We also pass the underlying shared_ptr to avoid potential edge cases where reference count drops to zero before some threads initialize.

I made the run_worker static to avoid any confusion about "this" vs "executor->ptr", and so it uses the shared_ptr to reference the shared memory.

The last worker thread will delete the shared memory, via the shared_ptr reference count.
…ensure shared memory is kept alive until all threads have finished.

Changed identification of main thread to use std::thread::id rather than thread_local memory pointer.

Added vector of workerThreads which can be detached or joined on shutdown.

Ensured that shutdown can only block if called on main thread, otherwise it might be possible to deadlock.

Manually using cache_aligned memory allocation, it was used previously with shared_ptr and I wanted to keep it just in case.

Note: I had some weird issues when compiling with /MD flag with mvsc.  It would run but often crash.  /MT flag works consistently for me.
Copy link

@mathgeekcoder mathgeekcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I've made some suggestions for changes, they're important but fairly small.

cache_aligned::shared_ptr<HighsTaskExecutor> ptr{nullptr};
HighsTaskExecutor* ptr{nullptr};
bool isMain{false};
bool isStopped{false};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move isStopped into HighsTaskExecutor and wrap it in an std::atomic<bool>.


// check if main thread has shutdown before thread has started
// if (ptr->mainWorkerId.load() != std::thread::id()) {
if (!(executorHandle.isMain)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a check to the shared isStopped, i.e., have we stopped before the thread has started. The isMain check is not needed.

I was using the mainWorkerId == thread::id to implicitly check if we had stopped - the explicit isStopped is much better!

}

threadLocalExecutorHandle().dispose();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have the executorHandle above, it might be nicer to use it instead, i.e., executorHandle.dispose();

// auto id = mainWorkerId.exchange(std::thread::id());
// if (id == std::thread::id()) return; // already been called
auto& executorHandle = threadLocalExecutorHandle();
if (executorHandle.isStopped) return;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to "safely" only call stop once. i.e.,

if (executorHandle.ptr == nullptr || executorHandle.ptr->isStopped.exchange(true)) 
   return;

This will set isStopped to true and return the previous result. If it had already been set to true we will exit, otherwise we'll proceed to shut things down. This should be guaranteed to avoid race conditions and only shut down once.

// only block if called on main thread, otherwise deadlock may occur
// if (blocking && std::this_thread::get_id() == id) {
// threadLocalExecutorHandle();
if (blocking && executorHandle.isStopped) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check isMain instead of isStopped. If we are within a worker thread the call to .join() doesn't make sense, i.e., we are joining on ourselves?

According to docs it would throw resource_deadlock_would_occur if workerThread.get_id() == std::this_thread::get_id() (deadlock detected).

}
}

if (executorHandle.isMain) executorHandle.isStopped = true;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is no longer needed if the other isStopped changes are made.

if (executorHandle.ptr == nullptr) {
executorHandle.ptr = new (cache_aligned::alloc(sizeof(HighsTaskExecutor)))
HighsTaskExecutor(numThreads);
executorHandle.isMain = true;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is absolutely fine, but I'd feel more comfortable if isMain = true was set before ptr was constructed.

void HighsTaskExecutor::ExecutorHandle::dispose() {
if (ptr != nullptr) {
// check to see if we are the main worker and if so, shut down the executor
// if (std::this_thread::get_id() == ptr->mainWorkerId.load()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to remove my commented code (here and elsewhere)

@@ -52,9 +55,11 @@ class HighsTaskExecutor {
}
#endif

std::vector<cache_aligned::unique_ptr<HighsSplitDeque>> workerDeques;
std::atomic<int> referenceCount;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

insert: std::atomic<bool> isStopped{false}; or rename it to hasStopped?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants