Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] TSAN error logs #2754

Closed
rwgk opened this issue Dec 29, 2020 · 21 comments · Fixed by #4216
Closed

[BUG] TSAN error logs #2754

rwgk opened this issue Dec 29, 2020 · 21 comments · Fixed by #4216

Comments

@rwgk
Copy link
Collaborator

rwgk commented Dec 29, 2020

TSAN logs as requested by @YannickJadoul here:

#2746 (comment)

Using master b7dfe5c
+ #2409
+ #2746 (YannickJadoul@727b3ae)
+ 2 additional minor local patches.

test -c dbg --config=tsan (Google-internal toolchain)

Failing tests:

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 29, 2020

@rwgk
Copy link
Collaborator Author

rwgk commented Dec 29, 2020

@YannickJadoul
Copy link
Collaborator

To link up things:

@YannickJadoul
Copy link
Collaborator

I knew I'd seen the other one as well:

@bstaletic
Copy link
Collaborator

I know this is the TSAN thread, but TSAN seems to be a major PITA to set up. I'm hoping no one would mind me submitting helgrind's analysis.

https://gist.github.com/bstaletic/98441d24b5e0010118051c31036ef2f8

Note: debug symbols for all dependency are necessary to get useful tracebacks.

Observation: test_gil_scoped.py doesn't seem to touch pybind11 as far as threading errors are concerned, but test_iostream.py does.

@YannickJadoul
Copy link
Collaborator

The main issue with Helgrind (I've been playing around with it, a few days ago), is that it doesn't recognize C++'s atomic, etc?

Btw, the main issue (or one of the main issues) seems to lie in this line:

std::cout << "x" << std::flush;

Apparently, std::cout is supposed to be thread-safe though (guaranteed by the standard, thread-safe; not garbled-output safe, but let's not care about that for now) ? However, I don't know if that's still the case for std::cout with the custom streambuf?

@rwgk
Copy link
Collaborator Author

rwgk commented Jan 1, 2021

@henryiii did someone already try out this suggestion in the meantime?
https://github.com/pybind/pybind11/blame/master/include/pybind11/iostream.h#L41

@henryiii
Copy link
Collaborator

henryiii commented Jan 1, 2021

The current implementation is exactly what is recommended. The "fully qualified virtual call" is at best the same, and possibly not really functional in forcing the correct call. Since it didn't fix the problem, I don't think there's any harm in moving to the fully qualified call, but there should not be a benefit.

@henryiii
Copy link
Collaborator

Do you think we could add a mutex here, perhaps?

@rwgk
Copy link
Collaborator Author

rwgk commented Jan 26, 2021

I really don't know, not knowing much about threading, but I'd be happy to try out ideas for you. If you point me to a branch with experimental changes it'll take me only a few minutes to try them out.

@YannickJadoul
Copy link
Collaborator

@henryiii, @rwgk and I discussed this in a private chat, but we couldn't figure out who's job it is to have the mutex: the custom streambuf or the part of the program that's writing to std::cout?

According to https://berthub.eu/articles/posts/iostreams-unexpected/ (quoting the standard), writing to std::cout from different threads can garble up the output, but shouldn't be UB. So that kind of makes me think the synchronization needs to happen inside the custom streambuf. But at the same time, it seems a bit weird that a class that could perfectly be used in a singled-threaded environment needs to always take care of synchronization?

@rwgk
Copy link
Collaborator Author

rwgk commented Jan 26, 2021

Coincidentally I saw we're using the WITH_THREAD macro already.
(I don't know when and where it's defined.)

@henryiii
Copy link
Collaborator

It's defined if Python was compiled with thread support. We could possibly add a mutex only if WITH_THREAD is defined? I tried to setup TSan, but getting it working on Python on macOS or in docker seems to be painful.

@YannickJadoul
Copy link
Collaborator

It's defined if Python was compiled with thread support. We could possibly add a mutex only if WITH_THREAD is defined? I tried to setup TSan, but getting it working on Python on macOS or in docker seems to be painful.

In principle, even if Python doesn't have threads, a user could still start threads in C++-only code? So that feels tricky

@rwgk
Copy link
Collaborator Author

rwgk commented Jan 26, 2021

I tried to setup TSan, but getting it working on Python on macOS or in docker seems to be painful.

I can give you very quick feedback (most of the time), using our internal tools. I know it's not ideal to have me in the loop, but probably most practical to get rid of that annoying flake.

In principle, even if Python doesn't have threads, a user could still start threads in C++-only code? So that feels tricky

I'd say a problem for later. One at a time.

@YannickJadoul
Copy link
Collaborator

In principle, even if Python doesn't have threads, a user could still start threads in C++-only code? So that feels tricky

I'd say a problem for later. One at a time.

This is part of figuring out the fix for our problem though. The actual issue in the tests is easy to fix: put a mutex around std::cout << foo in that multithreaded test. But we need to figure out whose responsibility it is to do that.
And if we do it only in some cases (e.g., WITH_THREAD) that not the correct scope, you're to some degree only making things worse and making future issues less consistent & harder to reproduce ;-)

@henryiii
Copy link
Collaborator

We already have a mutex of sorts with the gil_scoped_aquire? Are we really worried about performance here? Could we add a lock_guard there as well, possibly / probably including the setp, and maybe later improve performance?

I haven't noticed the empty output error in a long while, roughly since the linked PR went in; this seems to have replaced it. :/

@henryiii
Copy link
Collaborator

This is the same bug as in older versions, but instead of printing empty strings, it now hangs up...

@YannickJadoul
Copy link
Collaborator

I believed I tried abusing the GIL as ad-hoc mutex, and that worked

@rwgk
Copy link
Collaborator Author

rwgk commented Jan 27, 2021

@jbms provided this information to me, posting here with his permission:

In general iostream and streambuf objects are not thread-safe:
https://eel.is/c++draft/iostreams.requirements#iostreams.threadsafety

However, there is an exception for the standard streams:
https://eel.is/c++draft/iostream.objects#overview-7

if they are in "synchronized" mode:
https://eel.is/c++draft/ios.members.static

This is a bit of a weird guarantee, but that is what there is...

There doesn't seem to be any explicit language about thread safety in the case that a different streambuf has been associated with std::cout, but given the design of c++ iostreams, I think all the customization happens via virtual methods of streambuf --- ostream itself does not have any virtual methods. Therefore it stands to reason that the synchronization logic happens in the default streambuf associated with std::cout, and if you change that, you also lose the synchronization.

Indeed, looking at libc++, here is the definition of the streambuf backing std::cout:

https://github.com/llvm/llvm-project/blob/90407b16b1d3e38f1360b6a24ceab801ab9cefc1/libcxx/include/__std_stream#L222

It appears that the way it achieves thread safety is by NOT setting a buffer on the streambuf. Since streambuf manipulates its internal buffer without locking, that isn't thread safe. But by not setting a buffer, all writes are immediately sent to the overflow and xsputn virtual methods, which can then handle the write in a thread-safe manner. The sync method of __stdoutbuf looks questionable to me as far as thread safety, since it does access the shared *__st value, but I think that may only be used for wide streams, and the thread-safety guarantee may not apply to those.

The pybind11 output redirect relies on this streambuf type:

class pythonbuf : public std::streambuf {

It does set a buffer, which leads to streambuf manipulating the buffer in a thread-unsafe manner.

I think it might be possible to make that thread safe by eliminating the buffer, similar to __stdoutbuf. I think the C++ standard still wouldn't guarantee thread safety in this case, but in practice it seems likely to work on all implementations --- I guess you could check what MSVC's standard library and libstdc++ do. Eliminating the buffering may hurt performance. Potentially there could be both a thread-safe and thread-unsafe pythonbuf implementation.

@rwgk
Copy link
Collaborator Author

rwgk commented Mar 23, 2021

Coincidentally, while UBSAN testing PR #2917 locally (against the smart_holder branch), with PYBIND11_USE_SMART_HOLDER_AS_DEFAULT defined, test_iostream.py failed with:

libc++abi: terminating with uncaught exception of type std::runtime_error: Could not allocate string object!

NOTE: It is NOT certain that this UBSAN error is meaningful or related to the TSAN error, but I'm attaching the traceback JIC it's giving someone a useful clue.

ubsan.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants