
PGO-ed JIT-ted startup #682

Open
Fidget-Spinner opened this issue May 25, 2024 · 1 comment

Comments

@Fidget-Spinner (Collaborator) commented May 25, 2024

So this came from a discussion I had with @tekknolagi and Brandt at PyCon US. It depends on arbitrary-length superinstructions.

The main idea is that startup runs a lot of Python. There are two orthogonal ways to speed up startup: reduce the work done at startup, or speed up Python itself. Ideally we should do both. In the spirit of wacky ideas, I will suggest a moonshot idea to significantly speed up Python only at startup:

Assume startup code is mostly static, apart from fetching the system locale, codecs, encoding, etc. At build time, we collect the traces formed by the JIT during startup only. We then pass each entire trace as a single stencil to clang to compile (still respecting the tree structure, of course). At runtime, every startup will thus find the new "startup superinstructions", and the jitted code will be extremely efficient. The main reason this will be significantly faster than simply turning on the JIT is that the entire trace becomes a single instruction, allowing clang to perform whole-of-trace optimizations.
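
To make the intent a bit more concrete, here is a minimal toy sketch in plain C (all names and data structures are invented for illustration; this is not CPython's actual uop set, stencil format, or JIT API). It contrasts running a tiny trace uop-by-uop with running the same trace after it has been fused into a single compiled body:

```c
/*
 * Toy sketch only: invented uop names and data structures, not CPython's
 * actual tier-two uops, stencils, or JIT internals.
 */
#include <stdio.h>

typedef enum { LOAD_CONST, GUARD_NONNEG, ADD, STORE, EXIT } uop_op;
typedef struct { uop_op op; int oparg; } uop_t;

/* Generic execution: one dispatch and one guard check per uop. */
static int run_trace_generic(const uop_t *trace, int *out) {
    int stack[8], sp = 0;
    for (const uop_t *u = trace;; u++) {
        switch (u->op) {
            case LOAD_CONST:   stack[sp++] = u->oparg; break;
            case GUARD_NONNEG: if (stack[sp - 1] < 0) return -1; break; /* deopt */
            case ADD:          sp--; stack[sp - 1] += stack[sp]; break;
            case STORE:        *out = stack[--sp]; break;
            case EXIT:         return 0;
        }
    }
}

/*
 * The same trace "compiled" as one startup superinstruction (by hand here,
 * by clang in the proposal): no per-uop dispatch, and the compiler can
 * constant-fold across what used to be uop boundaries, so guards on values
 * known at build time disappear entirely.
 */
static int run_trace_fused(int *out) {
    *out = 2 + 3; /* LOAD_CONST 2; GUARD; LOAD_CONST 3; GUARD; ADD; STORE */
    return 0;
}

int main(void) {
    const uop_t trace[] = {
        {LOAD_CONST, 2}, {GUARD_NONNEG, 0},
        {LOAD_CONST, 3}, {GUARD_NONNEG, 0},
        {ADD, 0}, {STORE, 0}, {EXIT, 0},
    };
    int a = 0, b = 0;
    run_trace_generic(trace, &a);
    run_trace_fused(&b);
    printf("generic=%d fused=%d\n", a, b); /* both print 5 */
    return 0;
}
```

The point of the toy is the last step: once the whole trace is one compilation unit, clang can keep values in registers and fold away guards whose inputs are already known at build time, which is exactly the whole-of-trace optimization described above. In the real proposal the deopt path would of course have to hand control back to the interpreter with correctly reconstructed state rather than just returning an error code.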

This is somewhat similar to Stefan Brunthaler's multi-level quickening paper, where there is a sort of "PGO", but driven by benchmarks. However, since benchmarks are not a reliable proxy for real-world code, this proposal limits the profiling to startup.

@brandtbucher (Member) commented

Look, I love Futamura projections as much as the next compiler engineer, but... I think that an idea like this probably needs at least a proof-of-concept to proceed much further. Things that jump out to me as potential issues that will need to be tackled early on:

  • Handling the thousands of potential deopt events correctly.
  • Handling internal loops and other control flow in the superinstruction.
  • Staying on trace in the wide range of possible startup paths.
  • How to encode thousands of opargs, operands, etc.

That's not even counting the wrinkles of raising and catching exceptions, performing calls through C code into more Python code, etc. It likely makes more sense to just add some more reasonably-sized-but-maybe-a-little-longer superinstructions that don't require deep surgery on the tier two instruction format itself. That seems quite a bit easier to experiment with and more likely to succeed.
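
As a rough illustration of that more modest direction, here is another toy sketch in plain C (again with invented names rather than CPython's real tier-two instruction format): a short, fixed sequence of uops is fused into one handler that still takes a single oparg and a single operand, so nothing about the surrounding trace machinery needs to change:

```c
/*
 * Toy sketch only: invented names, not CPython's actual tier-two format.
 * A short, fixed uop sequence is fused into one handler that still consumes
 * a single oparg and a single operand, so the instruction format is unchanged.
 */
#include <stdio.h>

typedef enum {
    LOAD_FAST, LOAD_CONST, ADD, STORE_FAST,
    LOAD_FAST_LOAD_CONST_ADD,   /* the fused superinstruction */
    EXIT
} uop_op;
typedef struct { uop_op op; int oparg; int operand; } uop_t;

static int run(const uop_t *trace, int *locals) {
    int stack[8], sp = 0;
    for (const uop_t *u = trace;; u++) {
        switch (u->op) {
            case LOAD_FAST:  stack[sp++] = locals[u->oparg]; break;
            case LOAD_CONST: stack[sp++] = u->operand; break;
            case ADD:        sp--; stack[sp - 1] += stack[sp]; break;
            case STORE_FAST: locals[u->oparg] = stack[--sp]; break;
            case LOAD_FAST_LOAD_CONST_ADD:
                /* Three dispatches collapse into one; the intermediate
                   values never leave registers. */
                stack[sp++] = locals[u->oparg] + u->operand;
                break;
            case EXIT: return 0;
        }
    }
}

int main(void) {
    int locals[2] = {40, 0};
    const uop_t unfused[] = {
        {LOAD_FAST, 0, 0}, {LOAD_CONST, 0, 2}, {ADD, 0, 0},
        {STORE_FAST, 1, 0}, {EXIT, 0, 0},
    };
    const uop_t fused[] = {
        {LOAD_FAST_LOAD_CONST_ADD, 0, 2}, {STORE_FAST, 1, 0}, {EXIT, 0, 0},
    };
    run(unfused, locals);
    printf("unfused: %d\n", locals[1]); /* 42 */
    locals[1] = 0;
    run(fused, locals);
    printf("fused:   %d\n", locals[1]); /* 42 */
    return 0;
}
```

Within this sketch, the fused case removes two dispatches and keeps the intermediate value in a register, while trace recording, deopt, and exception handling stay exactly as they are for the unfused uops.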
