Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-106213: Make Emscripten trampolines work with JSPI #106219

Merged
merged 42 commits into from
Sep 15, 2023

Conversation

hoodmane
Copy link
Contributor

@hoodmane hoodmane commented Jun 29, 2023

There is a WIP proposal to enable webassembly stack switching which have been implemented in v8:

https://github.com/WebAssembly/js-promise-integration

It is not possible to switch stacks that contain JS frames so the Emscripten JS trampolines that allow calling functions with the wrong number of arguments don't work in this case. However, the js-promise-integration proposal requires the type reflection for Wasm/JS API proposal, which allows us to actually count the number of arguments a function expects.

For better compatibility with stack switching, this PR checks if type reflection is available, and if so we use a switch block to decide the appropriate signature. If type reflection is unavailable, we should use the current EMJS trampoline.

We cache the function argument counts since when I didn't cache them performance was negatively affected.

A backport of this patch to v3.11.x is tested against Pyodide and passes our tests:
pyodide/pyodide#3964

There is a WIP proposal to enable webassembly stack switching which have been
implemented in v8:

https://github.com/WebAssembly/js-promise-integration

It is not possible to switch stacks that contain JS frames so the Emscripten JS
trampolines that allow calling functions with the wrong number of arguments
don't work in this case. However, the js-promise-integration proposal requires
the [type reflection for Wasm/JS API](https://github.com/WebAssembly/js-types)
proposal, which allows us to actually count the number of arguments a function
expects.

For better compatibility with stack switching, this PR checks if type reflection
is available, and if so we use a switch block to decide the appropriate
signature. If type reflection is unavailable, we should use the current EMJS
trampoline.

We cache the function argument counts since when I didn't cache them performance
was negatively affected.
Copy link
Contributor

@erlend-aasland erlend-aasland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build changes LGTM; I cannot comment on the Emscripten peculiarities, though :)

@hoodmane
Copy link
Contributor Author

I guess I will add a news entry.

@hoodmane
Copy link
Contributor Author

Thanks for the review @erlend-aasland! Emscripten is indeed peculiar...

@erlend-aasland
Copy link
Contributor

Thanks for the review @erlend-aasland! Emscripten is indeed peculiar...

If you have the time to guide me, I'll be happy to review the rest of the PR, just for the fun of learning! :)

@hoodmane
Copy link
Contributor Author

Sure! Thank you for your time. I'll write a more detailed explanation.

Copy link
Contributor Author

@hoodmane hoodmane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the background of these trampolines and why this PR.

The purpose of these trampolines

Each WebAssembly function has a signature known to the runtime: how many arguments of what types and how many return values of what types. For security reasons, WebAssembly wants to ensure that every function is always invoked with the expected signature. It has two different instructions for calling a function: a call instruction for calling a function whose signature is validated at load time, and a call_indirect instruction for calling a function pointer whose signature is validated at runtime. So call_indirect is a bit more expensive. If runtime validation fails, then there is a trap.

Section 6.3.2.3, paragraph 8 of the C spec says:

A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.

But in every architecture + ABI other than wasm, it just works: if you cast a function that takes 2 arguments to a function that takes 3 and then call it with 3 arguments, everything goes fine. In wasm, it traps.

People frequently write getters and setters that don't take a void* closure argument and METH_NOARGS functions that don't take the always NULL second argument. So we need some way to fix it.

How the JS trampoline works

Point is simply that calling a JS function is relaxed about arguments. If you give a JS function extra arguments, they are just ignored. So we call wasm --> Js --> Wasm. The Wasm --> Js call is strict about wanting 4 arguments (the function pointer and the original three arguments) and the Js --> Wasm call drops any extra arguments.

What is JSPI (Wasm/JavaScript Promise integration)

The point is that most system calls in Posix are synchronous, whereas most browser APIs are asynchronous. So for instance, people want to use input() to get input from the user but the only ways of getting input are async. JSPI allows switching the stack, waiting for user input, then unswitching the stack and continuing. So that stdin can work correctly again. The JSPI is an implementation-stage proposal which allows a very specific sort of stack switching. There is a more general wasm stack switching API which is still under discussion.

What is the problem

JSPI can only unwind wasm stack frames not JS stack frames. This is because there was concern that the interaction between asyncio unwinding and a second stack switching mechanism could cause subtle bugs in the JS runtime state.

Python/emscripten_trampoline.c Outdated Show resolved Hide resolved
Python/emscripten_trampoline.c Outdated Show resolved Hide resolved
Python/emscripten_trampoline.c Outdated Show resolved Hide resolved
Python/emscripten_trampoline.c Outdated Show resolved Hide resolved
@erlend-aasland
Copy link
Contributor

@erlend-aasland
Copy link
Contributor

Thanks for chiming in, @kumaraditya303!

@hoodmane
Copy link
Contributor Author

hoodmane commented Jul 7, 2023

Okay I think I made the changes suggested by @kumaraditya303 and @erlend-aasland.

Could someone trigger the Emscripten build bot on this PR? If it passes then I think it should be ready to merge (unless of course anyone has further feedback).

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @AlexWaygood for commit 6358371 🤖

The command will test the builders whose names match following regular expression: wasm32-emscripten *

The builders matched are:

  • wasm32-emscripten node (dynamic linking) PR
  • wasm32-emscripten node (pthreads) PR
  • wasm32-emscripten browser (dynamic linking, no tests) PR

@vstinner
Copy link
Member

I'm sorry, but I broke wasm32 with my libregrtest refactoring work. I modified the code to spawn test worker processes in a different working directory: the temporary directory created by the main test process.

@vstinner
Copy link
Member

Wasm32 Python can only be run in the Python source code directory?

@brettcannon
Copy link
Member

brettcannon commented Sep 11, 2023

Wasm32 Python can only be run in the Python source code directory?

Currently the build looks for everything in /lib/ for the stdlib (plus whatever you have to tweak to make sysconfig work). You can run it from other places, but you have to map the the code back to /lib/ inside of the WASI runtime. My guess is the binary needs to be run without any relative paths when running wasmtime.

See

export HOSTRUNNER="wasmtime run --mapdir /::$(pwd) --env PYTHONPATH=/builddir/wasi/build/lib.wasi-wasm32-$PYTHON_VERSION $(pwd)/builddir/wasi/python.wasm --"
for a succinct way to run the build (https://github.com/python/cpython/tree/main/Tools/wasm#running documents this).

Long-term I want to freeze the stdlib into the binary so this sort of thing isn't an issue.

@vstinner
Copy link
Member

I'm sorry, but I broke wasm32 with my libregrtest refactoring work. I modified the code to spawn test worker processes in a different working directory: the temporary directory created by the main test process.

I wrote PR #109290 to fix Emscripten and WASI buildbot workers.

@vstinner
Copy link
Member

I merged a first regrtest fix for wasm/wasi: #109313 (comment)

Sadly, I didn't notice but my other regrtest json file descriptor change also broke wasm/wasi! Apparently, passing a file descriptor with pass_fds=[json_fd] doesn't work on these platforms, whereas using a file descriptor as stdout stdout=output_fd is fine.

@vstinner
Copy link
Member

Sadly, I didn't notice but my other regrtest json file descriptor change also broke wasm/wasi! Apparently, passing a file descriptor with pass_fds=[json_fd] doesn't work on these platforms, whereas using a file descriptor as stdout stdout=output_fd is fine.

I wrote PR #109326 to fix it: use a filename, instead of a file descriptor, on WASM/WASI.

@vstinner
Copy link
Member

Emscripten and WASI buildbot workers are back to green. Sorry for having broken them, I was in the middle of a large refactoring of the regrtest code ;-)

@brettcannon
Copy link
Member

!buildbot wasm32-emscripten *

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @brettcannon for commit 6781a48 🤖

The command will test the builders whose names match following regular expression: wasm32-emscripten *

The builders matched are:

  • wasm32-emscripten node (dynamic linking) PR
  • wasm32-emscripten browser (dynamic linking, no tests) PR
  • wasm32-emscripten node (pthreads) PR

@brettcannon brettcannon merged commit 6b179ad into python:main Sep 15, 2023
22 of 23 checks passed
@brettcannon
Copy link
Member

Thanks for the help, everyone!

@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x SLES 3.x has failed when building commit 6b179ad.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/540/builds/6553) and take a look at the build logs.
  4. Check if the failure is related to this commit (6b179ad) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/540/builds/6553

Failed tests:

  • test.test_asyncio.test_subprocess

Failed subtests:

  • test_subprocess_consistent_callbacks - test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests.test_subprocess_consistent_callbacks

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/test/test_asyncio/test_subprocess.py", line 788, in test_subprocess_consistent_callbacks
    self.loop.run_until_complete(main())
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/asyncio/base_events.py", line 664, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/dje/cpython-buildarea/3.x.edelsohn-sles-z/build/Lib/test/test_asyncio/test_subprocess.py", line 780, in main
    self.assertEqual(events, [
AssertionError: Lists differ: [('pi[29 chars]t'), 'pipe_connection_lost', ('pipe_data_recei[57 chars]ted'] != [('pi[29 chars]t'), ('pipe_data_received', 2, b'stderr'), 'pi[57 chars]ted']

@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 RHEL8 LTO 3.x has failed when building commit 6b179ad.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/64/builds/5071) and take a look at the build logs.
  4. Check if the failure is related to this commit (6b179ad) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/64/builds/5071

Failed tests:

  • test_glob

Failed subtests:

  • test_selflink - test.test_glob.SymlinkLoopGlobTests.test_selflink

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto/build/Lib/test/test_glob.py", line 386, in test_selflink
    self.assertIn(path, results)
AssertionError: 'dir/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/file' not found in {'dir/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/link/file'}

@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 RHEL8 3.x has failed when building commit 6b179ad.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/185/builds/5022) and take a look at the build logs.
  4. Check if the failure is related to this commit (6b179ad) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/185/builds/5022

Failed tests:

  • test_eintr

Failed subtests:

  • test_all - test.test_eintr.EINTRTests.test_all
  • test_lockf - main.FNTLEINTRTest.test_lockf

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64/build/Lib/test/_test_eintr.py", line 523, in test_lockf
    self._lock(fcntl.lockf, "lockf")
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64/build/Lib/test/_test_eintr.py", line 515, in _lock
    self.assertGreaterEqual(dt, self.sleep_time)
AssertionError: 0.14576421538367867 not greater than or equal to 0.2


Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64/build/Lib/test/test_eintr.py", line 17, in test_all
    script_helper.run_test_script(script)
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64/build/Lib/test/support/script_helper.py", line 300, in run_test_script
    raise AssertionError(f"{name} failed")
AssertionError: script _test_eintr.py failed

@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot aarch64 RHEL8 3.x has failed when building commit 6b179ad.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/529/builds/4948) and take a look at the build logs.
  4. Check if the failure is related to this commit (6b179ad) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/529/builds/4948

Failed tests:

  • test.test_multiprocessing_fork.test_manager

Failed subtests:

  • test_rapid_restart - test.test_multiprocessing_fork.test_manager.WithManagerTestManagerRestart.test_rapid_restart

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64/build/Lib/test/_test_multiprocessing.py", line 3140, in test_rapid_restart
    manager.start()
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64/build/Lib/multiprocessing/managers.py", line 569, in start
    self._address = reader.recv()
                    ^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64/build/Lib/multiprocessing/connection.py", line 251, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64/build/Lib/multiprocessing/connection.py", line 431, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-aarch64/build/Lib/multiprocessing/connection.py", line 400, in _recv
    raise EOFError
EOFError

@hoodmane
Copy link
Contributor Author

Thanks so much @brettcannon @erlend-aasland @vstinner for all your help with this!

@hoodmane hoodmane deleted the jspi-trampoline branch September 15, 2023 22:37
@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x RHEL7 LTO 3.x has failed when building commit 6b179ad.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/402/builds/5405) and take a look at the build logs.
  4. Check if the failure is related to this commit (6b179ad) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/402/builds/5405

Failed tests:

  • test.test_asyncio.test_subprocess

Failed subtests:

  • test_subprocess_consistent_callbacks - test.test_asyncio.test_subprocess.SubprocessThreadedWatcherTests.test_subprocess_consistent_callbacks

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):
  File "/home/dje/cpython-buildarea/3.x.edelsohn-rhel-z.lto/build/Lib/test/test_asyncio/test_subprocess.py", line 788, in test_subprocess_consistent_callbacks
    self.loop.run_until_complete(main())
  File "/home/dje/cpython-buildarea/3.x.edelsohn-rhel-z.lto/build/Lib/asyncio/base_events.py", line 664, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/dje/cpython-buildarea/3.x.edelsohn-rhel-z.lto/build/Lib/test/test_asyncio/test_subprocess.py", line 780, in main
    self.assertEqual(events, [
AssertionError: Lists differ: [('pi[29 chars]t'), 'pipe_connection_lost', ('pipe_data_recei[57 chars]ted'] != [('pi[29 chars]t'), ('pipe_data_received', 2, b'stderr'), 'pi[57 chars]ted']

@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 RHEL8 LTO + PGO 3.x has failed when building commit 6b179ad.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/568/builds/4843) and take a look at the build logs.
  4. Check if the failure is related to this commit (6b179ad) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/568/builds/4843

Failed tests:

  • test.test_asyncio.test_unix_events

Failed subtests:

  • test_fork_signal_handling - test.test_asyncio.test_unix_events.TestFork.test_fork_signal_handling

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto-pgo/build/Lib/unittest/async_case.py", line 90, in _callTestMethod
    if self._callMaybeAsync(method) is not None:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto-pgo/build/Lib/unittest/async_case.py", line 117, in _callMaybeAsync
    return self._asyncioTestContext.run(func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto-pgo/build/Lib/test/support/hashlib_helper.py", line 49, in wrapper
    return func_or_class(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL8-x86_64.lto-pgo/build/Lib/test/test_asyncio/test_unix_events.py", line 1937, in test_fork_signal_handling
    self.assertTrue(child_handled.is_set())
AssertionError: False is not true

csm10495 pushed a commit to csm10495/cpython that referenced this pull request Sep 28, 2023
…-106219)

There is a WIP proposal to enable webassembly stack switching which have been
implemented in v8:

https://github.com/WebAssembly/js-promise-integration

It is not possible to switch stacks that contain JS frames so the Emscripten JS
trampolines that allow calling functions with the wrong number of arguments
don't work in this case. However, the js-promise-integration proposal requires
the [type reflection for Wasm/JS API](https://github.com/WebAssembly/js-types)
proposal, which allows us to actually count the number of arguments a function
expects.

For better compatibility with stack switching, this PR checks if type reflection
is available, and if so we use a switch block to decide the appropriate
signature. If type reflection is unavailable, we should use the current EMJS
trampoline.

We cache the function argument counts since when I didn't cache them performance
was negatively affected.

Co-authored-by: T. Wouters <[email protected]>
Co-authored-by: Brett Cannon <[email protected]>
hoodmane added a commit to hoodmane/cpython that referenced this pull request Jul 13, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 14, 2024
encukou pushed a commit that referenced this pull request Jul 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants