This repository has been archived by the owner on Nov 10, 2023. It is now read-only.
make allocators and sanitizers work for processes created with multiprocessing's spawn method in dev mode #2657
Open
yifuwang wants to merge 1 commit into facebook:dev from yifuwang:export-D30802446-to-dev
This pull request was exported from Phabricator. Differential Revision: D30802446
yifuwang added a commit to yifuwang/buck that referenced this pull request on Sep 14, 2021
…rocessing's spawn method in dev mode (facebook#2657)

Summary:
Pull Request resolved: facebook#2657

#### Problem

Currently, the entrypoint for in-place Python binaries (i.e. built with dev mode) executes the following steps to load system native dependencies (e.g. sanitizers and allocators):

- Back up the `LD_PRELOAD` set by the caller
- Append system native dependencies to `LD_PRELOAD`
- Inject a prologue in user code which restores the `LD_PRELOAD` set by the caller
- `execv` the Python interpreter

These steps work as intended for single-process Python programs. However, when a Python program spawns child processes, the child processes do not load the system native dependencies, since they simply `execv` the vanilla Python interpreter. A few examples of why this is problematic:

- The ASAN runtime library is a system native dependency. Without loading it, a child process that loads user native dependencies compiled with ASAN will crash during static initialization because it can't find `_asan_init`.
- `jemalloc` is also a system native dependency.

Many if not most ML use cases "ban" dev mode because of these problems. This is very unfortunate considering the developer efficiency dev mode provides. In addition, a huge number of unit tests have to run in a more expensive build mode because of these problems.

For an earlier discussion, see [this post](https://fb.workplace.com/groups/fbpython/permalink/2897630276944987/).

#### Solution

Move the system native dependencies loading logic out of the Python binary entrypoint into an interpreter wrapper, and set the interpreter wrapper as `sys.executable` in the injected prologue:

- The Python binary entrypoint now uses the interpreter wrapper, which has the same command line interface as the Python interpreter, to run the main module.
- `multiprocessing`'s `spawn` method now uses the interpreter wrapper to create child processes, ensuring system native dependencies get loaded correctly.

#### Alternative Considered

One alternative considered is to simply not remove the system native dependencies from `LD_PRELOAD`, so that they are present in the spawned processes. However, this causes some linking issues, which were perhaps the reason `LD_PRELOAD` was restored in the first place: in-place Python binaries have access to binaries installed on devservers that are not built for the target platform (e.g. `/bin/sh`, which is used by some Python standard libraries). These binaries do not link properly with the system native dependencies.

#### References

An old RFC for this change: D16210828
The counterpart for opt mode: D16350169

fbshipit-source-id: 118d3a4657ba397b1c98b95d62f85ad01e234422
yifuwang force-pushed the export-D30802446-to-dev branch from 63d7d1b to a54cc5f on September 14, 2021 22:43
…rocessing's spawn method in dev mode (facebook#2657)

Summary:
Pull Request resolved: facebook#2657

Reviewed By: fried, bobyangyf, Reubend

fbshipit-source-id: 8c13de3517155cf3a8d69a212e30565c5c7277e0
yifuwang force-pushed the export-D30802446-to-dev branch from a54cc5f to ba64e24 on September 16, 2021 19:38
facebook-github-bot pushed a commit that referenced this pull request on Sep 16, 2021
…rocessing's spawn method in dev mode (#2657)

Summary:
Pull Request resolved: #2657

Reviewed By: fried, bobyangyf, Reubend

fbshipit-source-id: e17696f5c6f31138d9ea7f5e56408097eb282859
Summary:

#### Problem

Currently, the entrypoint for in-place Python binaries (i.e. built with dev mode) executes the following steps to load system native dependencies (e.g. sanitizers and allocators):

- Back up the `LD_PRELOAD` set by the caller
- Append system native dependencies to `LD_PRELOAD`
- Inject a prologue in user code which restores the `LD_PRELOAD` set by the caller
- `execv` the Python interpreter

These steps work as intended for single-process Python programs. However, when a Python program spawns child processes, the child processes do not load the system native dependencies, since they simply `execv` the vanilla Python interpreter. A few examples of why this is problematic:

- The ASAN runtime library is a system native dependency. Without loading it, a child process that loads user native dependencies compiled with ASAN will crash during static initialization because it can't find `_asan_init`.
- `jemalloc` is also a system native dependency.

Many if not most ML use cases "ban" dev mode because of these problems. This is very unfortunate considering the developer efficiency dev mode provides. In addition, a huge number of unit tests have to run in a more expensive build mode because of these problems.

For an earlier discussion, see [this post](https://fb.workplace.com/groups/fbpython/permalink/2897630276944987/).
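The entrypoint steps listed under Problem can be sketched roughly as follows. This is an illustration only, not the code in this PR: the backup variable name `ORIGINAL_LD_PRELOAD`, the helper names, and the use of `:` as the list separator are all assumptions, and the environment is modelled as a plain dict so the logic can be exercised without actually calling `execv`.

```python
import os

# Hypothetical name for the variable that stashes the caller's value;
# the real entrypoint may use a different mechanism entirely.
BACKUP_VAR = "ORIGINAL_LD_PRELOAD"


def entrypoint_env(env, system_libs):
    """Return the environment the entrypoint would hand to the interpreter."""
    new_env = dict(env)
    caller_preload = env.get("LD_PRELOAD", "")
    # Step 1: back up the LD_PRELOAD set by the caller.
    new_env[BACKUP_VAR] = caller_preload
    # Step 2: append the system native dependencies.
    parts = [p for p in caller_preload.split(":") if p] + list(system_libs)
    new_env["LD_PRELOAD"] = ":".join(parts)
    return new_env


def prologue(env):
    """Step 3: restore the caller's LD_PRELOAD once the interpreter is up."""
    restored = dict(env)
    original = restored.pop(BACKUP_VAR, "")
    if original:
        restored["LD_PRELOAD"] = original
    else:
        restored.pop("LD_PRELOAD", None)
    return restored


# Step 4 would be something like:
#   os.execve(python_path, argv, entrypoint_env(os.environ, libs))
# (python_path, argv and libs are placeholders).
```

Once `prologue` has run, the environment a child process inherits no longer mentions the system libraries, which is why a spawned vanilla interpreter starts without them.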
#### Solution

Move the system native dependencies loading logic out of the Python binary entrypoint into an interpreter wrapper, and set the interpreter wrapper as `sys.executable` in the injected prologue:

- The Python binary entrypoint now uses the interpreter wrapper, which has the same command line interface as the Python interpreter, to run the main module.
- `multiprocessing`'s `spawn` method now uses the interpreter wrapper to create child processes, ensuring system native dependencies get loaded correctly.
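As a rough sketch of the prologue side of this change (not the code in this PR), the snippet below shows one way to point `multiprocessing`'s `spawn` method at a wrapper. The function names and the wrapper path are hypothetical. Calling `multiprocessing.set_executable` in addition to assigning `sys.executable` is an assumption made here to cover the case where `multiprocessing` has already cached the executable path at import time.

```python
import multiprocessing
import multiprocessing.spawn
import os
import sys


def point_spawn_at_wrapper(wrapper_path):
    """Make spawned children start through the interpreter wrapper."""
    sys.executable = wrapper_path
    # multiprocessing records the executable when its spawn module is first
    # imported, so update that record as well as sys.executable.
    multiprocessing.set_executable(wrapper_path)


def spawn_executable():
    """Return the path multiprocessing's spawn method will execute."""
    # Depending on the Python version this is stored as str or bytes.
    return os.fsdecode(multiprocessing.spawn.get_executable())
```

Because the wrapper has the same command line interface as the interpreter, the arguments `multiprocessing` passes to the child are unaffected; only the program being executed changes.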
#### Alternative Considered

One alternative considered is to simply not remove the system native dependencies from `LD_PRELOAD`, so that they are present in the spawned processes. However, this causes some linking issues, which were perhaps the reason `LD_PRELOAD` was restored in the first place: in-place Python binaries have access to binaries installed on devservers that are not built for the target platform (e.g. `/bin/sh`, which is used by some Python standard libraries). These binaries do not link properly with the system native dependencies.
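The reason this alternative affects unrelated binaries is ordinary environment inheritance: whatever is left in the parent's environment is seen by every child it launches, not only by Python children. A small demonstration follows, using a harmless stand-in variable (`DEMO_PRELOAD`, a made-up name) rather than `LD_PRELOAD` itself so that running it has no effect on dynamic linking.

```python
import os
import subprocess
import sys

# Stand-in for LD_PRELOAD so the demo cannot affect dynamic linking.
DEMO_VAR = "DEMO_PRELOAD"


def child_sees(value):
    """Run a child process and report the value of DEMO_VAR it inherits."""
    env = dict(os.environ)
    env[DEMO_VAR] = value
    code = "import os; print(os.environ.get(%r, ''))" % DEMO_VAR
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

With the real `LD_PRELOAD`, the same inheritance would make the dynamic loader try to preload the system native dependencies into every child, including binaries such as `/bin/sh`.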
#### References

An old RFC for this change: D16210828

The counterpart for opt mode: D16350169