Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding iostream.h thread-safety documentation. #2995

Merged
merged 4 commits into from
Jul 12, 2021

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented May 6, 2021

Description

For full background see the discussions under PR #2982 and this PR.

This PR resolves the clang ThreadSanitizer (tsan) error reported under #2754, but only in stop-gap fashion.

#HelpAppreciated: Work on iostream.h thread safety.

Suggested changelog entry:

An important warning about thread safety was added to the iostream.h documentation.

@rwgk rwgk requested a review from henryiii May 6, 2021 21:03
@rwgk
Copy link
Collaborator Author

rwgk commented May 6, 2021

Hi @tttapa, could you please help with a full review of this PR? The added documentation is mostly distilled from your comments on #2982.

@YannickJadoul
Copy link
Collaborator

Based on the discussion under PR #2982, the TestThread code is removed completely. In the very unlikely event that someone wants to contribute a thread-safe iostream.h implementation, it can easily be restored from the git history.

Remove a test because it triggers a failure? Why not just follow the docs you are adding in literally the same PR and add a lock to the test? At least that would illustrate and test the docs you've just written.

@rwgk
Copy link
Collaborator Author

rwgk commented May 6, 2021

Based on the discussion under PR #2982, the TestThread code is removed completely. In the very unlikely event that someone wants to contribute a thread-safe iostream.h implementation, it can easily be restored from the git history.

Remove a test because it triggers a failure? Why not just follow the docs you are adding in literally the same PR and add a lock to the test? At least that would illustrate and test the docs you've just written.

That's what I tried first. Pieter replied with a "defeats the point of the test" comment here: #2982 (comment)

Basically, it would make that part of the unit test just a demo for how to do locking. I think it's better removed.

@henryiii
Copy link
Collaborator

henryiii commented May 7, 2021

Can we just disable the test for Windows? We've never seen it flake on anything but MSVC.

@rwgk
Copy link
Collaborator Author

rwgk commented May 7, 2021

Can we just disable the test for Windows? We've never seen it flake on anything but MSVC.

Thanks Henry, we could, but

  • If I understand @tttapa's explanations under Fix Unicode support for ostream redirects #2982 correctly, it's undefined behavior. We're likely to lead people down bad paths. Chances are it'll not work correctly somewhere sooner or later, and even if it does today, the moment they want to port to Windows they will hit on this for sure.

  • I cannot routinely run TSAN on my end without unusual tricks to skip the test.

Two-and-a-half ideas:

  • -DPYBIND11_TEST_IOSTREAM_H_WITHOUT_LOCKING`, so that it becomes an opt-in that we enable only for certain platforms.
  • I could add back the mutex I experimented with earlier in connection with Fix Unicode support for ostream redirects #2982. But note that TestThread then requires three additional includes (<atomic>, <mutex>, <thread>) that are unrelated to what is being unit-tested. To mitigate that I could split out TestThread into a separate test.

Please let me know what you prefer and I'll change this PR accordingly.

@henryiii
Copy link
Collaborator

henryiii commented May 8, 2021

My preference would be for idea 2 (test with mutexes) - I'm not that worried about stdlib includes in a test file, and it makes it a really nice example (if it works). But other than that, given the two points you listed, the current form of the PR is acceptable if not quite as ideal as that would be.

@rwgk
Copy link
Collaborator Author

rwgk commented May 8, 2021

My preference would be for idea 2 (test with mutexes) - I'm not that worried about stdlib includes in a test file, and it makes it a really nice example (if it works). But other than that, given the two points you listed, the current form of the PR is acceptable if not quite as ideal as that would be.

Thanks Henry, I'll restore TestThread with added mutex and comments to pull in more from Pieter's explanations. (With a little luck I'll get to it very soon.)

@tttapa
Copy link
Contributor

tttapa commented May 8, 2021

The changes to the documentation look good to me. Maybe I would be more specific about this sentence:

Therefore it is a requirement that all (possibly) concurrent redirected ostream writes are locked.

Perhaps “protected using a mutex” is clearer than “locked”?

I agree that it might be a good idea to keep a multithreaded test as an example on how to do the locking correctly. In that case I would try to isolate it from the other iostream tests, i.e. don't add locks to the existing functions noisy_function, raw_output etc., but create new functions with locks for the threaded test.


I've thought about it some more, and locking in user code could prove problematic:

--- Thread A ---                --- Thread B ---

user acq GIL                    user acq cout lock
do some work under GIL ...      user write to or flush cout
user acq cout lock                  pythonbuf::_sync acq GIL, rel GIL
user write to cout              user rel cout lock
user rel cout lock
user rel GIL

The following sequence now causes deadlock:

A acq GIL
B acq cout lock
B acq GIL (fails)
A acq cout lock (fails)

Keeping this in mind, adding the locking to pythonbuf itself no longer looks like such a terrible idea, as it greatly simplifies things for the user ...
Alternatively, you could tell users not to print to redirected ostreams while holding the GIL, but in my opinion that's a far less reasonable requirement than just telling them to protect concurrent writes to the ostream.

@rwgk
Copy link
Collaborator Author

rwgk commented May 11, 2021

Hi All, please take another look.

The TestThread code is back, with an added mutex and a comment to explain that more work is needed. I updated the other two comments accordingly.

When I started this PR, my main interest was to get rid of the test_iostream Windows flakiness and the TSAN error. I don't have a vested interest beyond those two points. Currently, iostream.h is not used at Google. The TSAN error is definitely gone, I ran the test 1000 times. I assume the Windows flakiness is also gone, although we may have to wait and see to be sure.

@rwgk
Copy link
Collaborator Author

rwgk commented Jun 4, 2021

Small update, the Windows flake still happens, although much more rarely after #2982 was merged. Here is one example I just observed: https://github.com/pybind/pybind11/pull/3023/checks?check_run_id=2743130279

@rwgk
Copy link
Collaborator Author

rwgk commented Jun 6, 2021

Here is another one, different Python version (3.8 vs 2.7), different C++ standard (11 vs. 17), but same symptom:

https://github.com/pybind/pybind11/pull/3030/checks?check_run_id=2758466712

@rwgk
Copy link
Collaborator Author

rwgk commented Jun 16, 2021

I just hit on this again, after not seeing these errors for quite a long time:

libc++abi: terminating with uncaught exception of type std::runtime_error: Could not allocate string object!
*** SIGABRT received by PID 9984 (TID 9984) on cpu 4 from PID 9984; ***         
Fatal Python error: Aborted                                                     

This was with ASAN (not TSAN).
Simply running again it's fine.

@henryiii I feel we really need to warn users prominently.

@rwgk rwgk force-pushed the iostream_h_thread_safety_documentation branch from e533a17 to 307f8a8 Compare July 3, 2021 18:05
@rwgk rwgk mentioned this pull request Jul 12, 2021
@rwgk
Copy link
Collaborator Author

rwgk commented Jul 12, 2021

Thanks Henry!

@rwgk rwgk merged commit 7472d37 into pybind:master Jul 12, 2021
@rwgk rwgk deleted the iostream_h_thread_safety_documentation branch July 12, 2021 20:39
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Jul 12, 2021
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Jul 12, 2021
…ell check was added after the CI for PR pybind#2995 last ran.
@rwgk rwgk mentioned this pull request Jul 12, 2021
henryiii pushed a commit that referenced this pull request Jul 12, 2021
…ell check was added after the CI for PR #2995 last ran. (#3103)
@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Jul 13, 2021
@rwgk rwgk mentioned this pull request Dec 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants