
Support weval-based ahead-of-time compilation of JavaScript. #91

Merged
merged 6 commits into bytecodealliance:main from cfallin/weval on Jul 31, 2024

Conversation

@cfallin (Member) commented Jul 27, 2024

When the WEVAL option is turned on (cmake -DWEVAL=ON), this PR adds:

  • Integration to the CMake machinery to fetch a PBL+weval-ified version of SpiderMonkey artifacts;
  • Likewise, to fetch weval binaries;
  • A rule to pre-build a compilation cache of IC bodies specific to the StarlingMonkey build, so weval can use this cache and spend time only on user-provided code on first run;
  • Integration in componentize.sh.

When built with:

$ mkdir build/; cd build/
$ cmake .. -DCMAKE_BUILD_TYPE=Release -DUSE_WASM_OPT=OFF -DWEVAL=ON
$ make

We can then do:

$ build/componentize.sh file.js --aot -o file.wasm
$ wasmtime serve -S cli=y file.wasm

Using the Richards Octane benchmark adapted slightly with a main() for the HTTP server world [1], I get the following results:

% build/componentize.sh richards.js --aot -o weval.wasm
Componentizing richards.js into weval.wasm
[ verbose weval progress output ]

% wasmtime serve -S cli=y weval.wasm
Serving HTTP on http://0.0.0.0:8080/
stdout [0] :: Log: Richards: 676
stdout [0] :: Log: ----
stdout [0] :: Log: Score (version 9): 676

% wasmtime serve -S cli=y base.wasm
Serving HTTP on http://0.0.0.0:8080/
stdout [0] :: Log: Richards: 189
stdout [0] :: Log: ----
stdout [0] :: Log: Score (version 9): 189

@cfallin cfallin force-pushed the cfallin/weval branch 8 times, most recently from 32cd5eb to 57c9b75 Compare July 29, 2024 22:52
@cfallin cfallin changed the title WIP: weval support. Support weval-based ahead-of-time compilation of JavaScript. Jul 29, 2024
[1]: https://gist.github.com/cfallin/4b18da12413e93f7e88568a92d09e4b7
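
The two scores above imply roughly a 3.6x speedup for the weval build over the baseline; a quick sanity check on the numbers:

```shell
# Ratio of the two Octane Richards scores reported above:
# 676 (weval AOT build) vs. 189 (baseline interpreter build).
awk 'BEGIN { printf "speedup: %.1fx\n", 676 / 189 }'
# prints: speedup: 3.6x
```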
@cfallin cfallin marked this pull request as ready for review July 29, 2024 22:53
@cfallin (Member Author) commented Jul 29, 2024

I believe this is now ready for review, except possibly for CI testing integration: I haven't yet added a job to run all tests with weval enabled. Happy to hear early feedback, though (e.g., should we require a separate --aot flag to componentize.sh?).

@guybedford (Contributor) commented:
@cfallin I've pushed two new commits here: one to enable testing the new weval option, and one to add a new CI run that tests against it. I'm getting 7/10 passes here, with the failing tests as follows:

Integration test timers/setInterval-handler-parameter-not-callable

Which seems to be related to catching an error.

And two e2e tests when there is invalid JS syntax / a top-level error:

          3 - syntax-err (Failed)
          4 - tla-err (Failed)

where the output gets stuck on (per tests/e2e/tla-err/stderr.log):

Reading raw module bytes...

but then it doesn't actually print the expected error:

Exception while pre-initializing: (new Error("blah", "tla-err.js", 3))

We definitely need to improve the above error printing in general; at the very least we should ensure some error output here.

We can also diverge the error printing between Wizer and Weval no problem, and create a separate expectation for Weval, happy to help with that too.

@cfallin (Member Author) commented Jul 29, 2024

Thanks @guybedford! At least the expected-output failure above is due to weval's progress-output verbosity; I can silence that by default. I'll look into the setInterval failure as well.

This commit updates to weval v0.2.7 which has no output by default,
allowing the expected-failure tests checking syntax error messages to
pass; we now pass 9/10 integration tests.

It also updates the flags on `componentize.sh`: a `--verbose` flag to
allow the user to see weval progress messages (perhaps useful if the
build is taking a long time, or to see the stats on function
specializations); and removal of the `--aot` flag, because there is only
one valid setting for a given build configuration, and it is now baked
into the shell script automatically.
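
The `--verbose` behavior described above can be sketched as a small argument scan. This is a sketch only; the function name and messages are illustrative, not the actual `componentize.sh` contents:

```shell
# Decide whether to surface weval progress output based on a --verbose flag.
weval_verbosity() {
  for arg in "$@"; do
    if [ "$arg" = "--verbose" ]; then
      echo "verbose"   # show weval progress/specialization stats
      return 0
    fi
  done
  echo "quiet"         # weval v0.2.7 produces no output by default
}

weval_verbosity richards.js -o weval.wasm            # prints: quiet
weval_verbosity richards.js -o weval.wasm --verbose  # prints: verbose
```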
cfallin added a commit to cfallin/spidermonkey-wasi-embedding that referenced this pull request Jul 31, 2024
Two issues discovered while debugging the testsuite in StarlingMonkey:

- bytecodealliance/gecko-dev#54: fixes handling of conditionals by weval
  when LLVM applies if-conversion in a new place; use of weval intrinsic
  for value-specialization / subcontext splitting should make this more
  robust.

- bytecodealliance/gecko-dev#55: fixes missing interpreter-PC values in
  stack frames up the stack during unwind because of too-aggressive
  optimization trick in weval'd bodies.

With these two changes, this version of SpiderMonkey allows the
StarlingMonkey integration test suite to pass in
bytecodealliance/StarlingMonkey#91.
@cfallin (Member Author) commented Jul 31, 2024

I've got all integration tests passing locally with the changes in bytecodealliance/spidermonkey-wasi-embedding#19 (and a weval update pushed here). There are three review-merge-rebase cycles required to get this PR up to date (bytecodealliance/gecko-dev#54, bytecodealliance/gecko-dev#55, bytecodealliance/spidermonkey-wasi-embedding#19, then an updated commit hash here), but once all that is done this should be green too.

@cfallin (Member Author) commented Jul 31, 2024

@guybedford tests are green now; I think this is ready for final review?

@guybedford (Contributor) left a comment

@cfallin previously when I tried to build the web platform tests test case I was getting this error:

thread '<unnamed>' panicked at src/eval.rs:1362:25:
PC is a runtime value: Runtime(Some(v12146))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

now when I try to build the web platform tests I get a silent failure without any stderr.

Was this issue resolved, or did the work to remove the logging also disable legitimate stderr logging?

@cfallin (Member Author) commented Jul 31, 2024

@guybedford the error was resolved by the fix in bytecodealliance/gecko-dev#54; none of my changes should have disabled legitimate stderr output. I can take a look at WPT; I hadn't tried running those tests yet.

@guybedford (Contributor) commented:
Thanks for confirming the panic was also resolved. In that case, it must be something else unrelated.

Of course, it would be great to get the WPT passing as well, but since we've established no regressions let's go ahead and land for now.

@guybedford guybedford merged commit f7e2745 into bytecodealliance:main Jul 31, 2024
3 checks passed
@cfallin cfallin deleted the cfallin/weval branch July 31, 2024 23:36
cfallin added a commit to cfallin/js-compute-runtime that referenced this pull request Aug 1, 2024
This PR pulls in my work to use "weval", the WebAssembly partial
evaluator, to perform ahead-of-time compilation of JavaScript using the
PBL interpreter we previously contributed to SpiderMonkey. This work has
been merged into the BA fork of SpiderMonkey in
bytecodealliance/gecko-dev#45,  bytecodealliance/gecko-dev#46,
bytecodealliance/gecko-dev#47, bytecodealliance/gecko-dev#48,
bytecodealliance/gecko-dev#51, bytecodealliance/gecko-dev#52,
bytecodealliance/gecko-dev#53, bytecodealliance/gecko-dev#54,
bytecodealliance/gecko-dev#55, and then integrated into StarlingMonkey
in bytecodealliance/StarlingMonkey#91.

The feature is off by default; it requires a `--enable-experimental-aot`
flag to be passed to `js-compute-runtime-cli.js`. This requires a
separate build of the engine Wasm module to be used when the flag is
passed.

This should still be considered experimental until it is tested more
widely. The PBL+weval combination passes all jit-tests and jstests in
SpiderMonkey, and all integration tests in StarlingMonkey; however, it
has not yet been widely tested in real-world scenarios.

Initial speedups we are seeing on Octane (CPU-intensive JS benchmarks)
are in the 3x-5x range. This is roughly equivalent to the speedup that a
native JS engine's "baseline JIT" compiler tier gets over its
interpreter, and it uses the same basic techniques -- compiling all
polymorphic operations (all basic JS operators) to inline-cache sites
that dispatch to stubs depending on types. Further speedups can be
obtained eventually by inlining stubs from warmed-up IC chains, but that
requires warmup.

Important to note is that this compilation approach is *fully
ahead-of-time*: it requires no profiling or observation or warmup of
user code, and compiles the JS directly to Wasm that does not do any
further codegen/JIT at runtime. Thus, it is suitable for the per-request
isolation model (new Wasm instance for each request, with no shared
state).
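
The engine selection described above (a separate engine Wasm build used only when the experimental flag is passed) could be sketched as follows. The function and file names here are assumptions for illustration, not the actual artifact names:

```shell
# Pick the engine Wasm module based on --enable-experimental-aot.
select_engine() {
  for arg in "$@"; do
    if [ "$arg" = "--enable-experimental-aot" ]; then
      echo "engine-weval.wasm"   # hypothetical AOT (PBL+weval) build
      return 0
    fi
  done
  echo "engine.wasm"             # hypothetical default interpreter build
}

select_engine input.js                            # prints: engine.wasm
select_engine input.js --enable-experimental-aot  # prints: engine-weval.wasm
```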