
Thoughts about engine.md (Guido)


Comments on Tier 2 Execution Engine

These are just personal notes -- I am trying something new and using the GitHub Wiki.

Section headings below correspond to the original, and so do some of the bulleted/numbered list items.

Overview

How can the copy-and-patch (JIT) compiler be considered an "execution engine"? It's a compiler, so by definition (unlike an interpreter) it doesn't execute anything. Its output is executable though. Maybe we need more terminology so the distinction is clearer? As written this leaves me just confused.

Entry to and exit from superblocks -- is this from Tier 1?

"Performance of jumping [...] is also important. Memory consumption is also important." Yeah, yeah. :-)

Either interpreter or compiler. Not both.

Fine. As clarified off-line, both may exist in the binary, but we decide very early on which one to use, and stick to that one.

I am guessing we might also decide to use neither. Or only one or the other may exist in the binary (e.g. the JIT is not always present).

Superblocks and executors

"An executor is the runtime object that is the executable representation of a superblock." -- how does this relate to execution engines? Is the executor passed to the engine for execution? Or how otherwise is the engine invoked or the executor executed?

Creating executors

"Creating executors from superblocks is the first job of the execution engine." -- This leaves me more confused about the relationship between superblocks, executors, and engines. It seems the mapping between artifacts in the code (like functions) and concepts in this document is a bit fuzzy; I'd like it to be as crisp as possible (with instructions for how to split things up if the code conflates concepts).

Exits from executors

Maybe we could start out by not worrying too much about this (either about the speed of hot exits or the space of cold ones) and iterate after we've got other aspects in place? Or is this of such importance that the entire architecture needs to be aware of this?

Linking executors

Similar -- we can iterate on this. Also, the process is somewhat convoluted, since before we can create the executor we have to have the superblock.

Making progress

Another possible way to avoid this problem would be to make exits where the executor hasn't done any work yet always go to Tier 1. We can statically know (during the translation from superblock to executor) whether an exit hasn't done any work yet: the exit can only be preceded by pure guards (which we can mark in bytecodes.c). On the surface, at least, this is not the same as (1), since it's not about which executors are special, but about which exits; and it's definitely not the same as (2). Reasoning about this seems straightforward.
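
To make this concrete, here is roughly what I have in mind for that static check -- names and fields are mine, not the actual uop layout; it just assumes a per-uop "pure guard" flag generated from bytecodes.c:

```c
#include <stdbool.h>

/* Illustrative only: during translation from superblock to executor, mark
   every exit that is preceded only by pure guards, so that taking it always
   returns to Tier 1 (no work has been done yet). */
typedef struct {
    int opcode;
    bool is_exit;               /* this uop can leave the executor        */
    bool is_pure_guard;         /* would come from bytecodes.c metadata   */
    bool always_exit_to_tier1;  /* set by the pass below                  */
} uop_instruction;

static void
mark_no_progress_exits(uop_instruction *trace, int length)
{
    bool work_done = false;     /* has any non-pure-guard uop appeared yet? */
    for (int i = 0; i < length; i++) {
        if (trace[i].is_exit) {
            trace[i].always_exit_to_tier1 = !work_done;
        }
        if (!trace[i].is_pure_guard) {
            work_done = true;   /* later exits have made progress */
        }
    }
}
```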

Inter-superblock optimization

"Some superblocks can be quite short, but form a larger region of hot code" -- this confused me. Maybe "form" should be "combine into"? So several short superblocks combine into a larger region, right? This feels quite advanced right now. Maybe we can punt until we're farther along the project?

Making "hot" exits fast and "cold" exits small

That doesn't sound very helpful -- how do we do this?

The implementation

Since this is just one possible implementation, it still leaves room for a lot of misunderstandings about how various pieces fit together.

Making "hot" exits fast

To be clear, "IP" and "first instruction" here refer to the Tier 2 interpreter's IP, right? So at the C level, or to the hardware CPU, this is still several instructions. And this transfer is responsible for the INCREF/DECREF of the old/new executors.

And in the JIT case, I'd like to understand how a JIT executor and its function pointer relate (presumably the executor is still an object, so I guess it has to contain a function pointer as a piece of data), and who is responsible for the executor object INCREF/DECREF.
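
For my own understanding, the picture I have is something like the following sketch -- illustrative only, the real executor object surely differs -- where the executor stays an ordinary refcounted object, the compiled code's entry point is just a data member, and the hot-exit transfer does the INCREF/DECREF hand-off:

```c
#include <stddef.h>

typedef struct executor executor;

/* Entry point of the JITted code; NULL when running under the Tier 2
   interpreter.  The signature is made up for illustration. */
typedef executor *(*jit_entry_func)(executor *self, void *frame, void **stack_pointer);

struct executor {
    size_t refcount;            /* stands in for PyObject_HEAD           */
    jit_entry_func jit_entry;   /* function pointer stored as plain data */
    /* ... uop trace, exit table, validity flag, ... */
};

/* Hot-exit transfer: take a reference to the new executor before dropping
   the reference to the old one (Py_INCREF/Py_DECREF in real code). */
static executor *
transfer(executor *current, executor *next)
{
    next->refcount++;
    current->refcount--;
    return next;
}
```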

Making "cold" exits small

  • The offset/location of the pointer to the exit in the executor, so it can be updated.
    • For the Tier 2 interpreter, this can be an index into an array of pointers, or into the Tier 2 code (if we update in place); either way can be 16 bits
    • For the JIT, it has to be an index into an array of pointers, since we don't want to leave the JITted code writable; can also be 16 bits (a possible packing of these fields is sketched after this list)
  • The target (offset into the code object of the tier 1 instruction)
    • 16 bits
  • Any relevant known type information (this is optional but will improve optimization)
    • TBD later
  • Any representation changes that have been made.
    • Explain? Is this about the translation from Tier 2 IR to Tier 2 interpreter code?
  • The "hotness" counter
    • 16 bits
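
Packing the fields above as listed (and leaving out the TBD type information and representation changes), each exit's data could be as small as something like this -- field names are mine:

```c
#include <stdint.h>

typedef struct {
    uint16_t exit_offset;   /* where the exit pointer lives in the executor,
                               so it can be updated when the exit gets hot   */
    uint16_t target;        /* offset of the Tier 1 instruction to resume at */
    uint16_t hotness;       /* counter deciding when to build a new executor */
} cold_exit_data;           /* 6 bytes per exit before any alignment padding */
```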

Minimizing memory use

TBD later

Each executor gets a table of exit data

Fixed number of exit objects

I don't follow. What is "offset of the exit"? Oh wait, this refers to "The offset/location of the pointer to the exit in the executor"?

I am guessing "exit objects" aren't PyObjects? Because that would be a lot of overhead per uop.

Exit data

Representation changes

Oh, now I get it. This is needed to reconstruct the Tier 1 VM state when we exit from Tier 2 to Tier 1. (Hm, interesting. When we transfer from one executor to another we won't be doing this, but that means the second executor must assume the VM state is already denormalized. That's going to be interesting if there are multiple ways to enter that second executor.)
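
To make "denormalized" concrete for myself (using the "unboxed float" example from later in the document), I imagine each representation change being recorded per stack slot, so the Tier 1 state can be reconstructed on exit -- names are mine, not the design's:

```c
#include <stdint.h>

/* Illustrative only: a Tier 1 stack slot may temporarily hold a
   denormalized value, e.g. an unboxed C double that must be re-boxed
   (PyFloat_FromDouble) before returning to Tier 1. */
typedef enum {
    REPR_BOXED,             /* ordinary PyObject* -- nothing to do on exit */
    REPR_UNBOXED_DOUBLE,    /* re-box as a float object on exit            */
    REPR_UNBOXED_INT64,     /* re-box as an int object on exit             */
} repr_kind;

typedef struct {
    uint8_t stack_slot;     /* which slot of the Tier 1 frame is affected */
    uint8_t kind;           /* a repr_kind value                          */
} repr_change;
```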

Hotness counters

(The presentation of this section could be better.)

  1. Store a counter for each exit in the superblock.
    • I guess you mean in the executor.
  2. Have one exit per possible value of the counter, and change the exit object to change the counter.
    • I guess this is the idea you had previously. I think I am being confused though by the meaning of "exit object" and "exit" -- is the exit the instruction in the original executor that exits, or is it some target executor?
  3. Store the counters in a global (per-interpreter) table. LuaJIT does something like this (but with a very small table).
    • I presume "table" is a hash table, with the keys being executor and offset?

I take it you have soured on (2)? (Fine with me.)
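
For (3), the LuaJIT-style variant I am assuming looks roughly like this -- a small fixed-size per-interpreter table indexed by hashing the executor pointer and exit index, where collisions simply share a counter (harmless, since the counters are only a heuristic); names and sizes are mine:

```c
#include <stdint.h>

#define HOTCOUNT_SLOTS 64   /* deliberately small, per the LuaJIT comparison */

typedef struct {
    uint16_t counts[HOTCOUNT_SLOTS];
} hotcount_table;

/* Map (executor, exit index) to a counter slot. */
static uint16_t *
hotcount_slot(hotcount_table *table, const void *executor, int exit_index)
{
    uintptr_t h = (uintptr_t)executor ^ ((uintptr_t)exit_index * 2654435761u);
    return &table->counts[h % HOTCOUNT_SLOTS];
}
```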

EXIT_IF and UNLIKELY_EXIT_IF

Remember that DEOPT_IF is also used in Tier 1, where it does de-optimize (or maybe you can call it "unspecialize"). I don't think that renaming it to EXIT_IF will really help clarity. I can live with UNLIKELY_DEOPT_IF.

I'm curious about the "efficiency reasons" for sharing the exit objects (or is this just that it is indexed by the uop PC?).

After UNLIKELY_EXIT_IF we may still need to reconstruct the Tier 1 VM state, right? (It always exits to Tier 1 IIUC?)

Guaranteeing progress

What is progress "when exiting executors"? I guess I'm lost here.

Guaranteeing progress within an executor

If UNLIKELY_EXIT_IF doesn't need to guarantee progress, doesn't that mean that an unlikely exit taken from the very first uop could take us to the ENTER_EXECUTOR in Tier 1 that started this executor, thus causing the infinite looping this requirement is meant to avoid?

Or maybe I'm lost because we've already made progress when entering the executor. I still feel there's something I'm not following here (an infinite loop in the discussion :-).

Exiting to invalid executors

(Where "invalid" is a technical term -- the executor object still exists, it is just out of date.) Given the typical meaning of "invalid" (e.g. an invalid pointer could point to freed or reused memory, which is a much stronger sense of invalidity), I wonder if we should change "valid" to "uptodate" (or "not outdated")?

"In the tier 2 interpreter, we change the first instruction to EXIT_TRACE." -- Of course, this means the executor makes no progress. Is that allowed in this case?

The mechanics of transferring execution between executors

+1 (I should remove my question about this above.)

JIT compiler

"pass the old executor as an argument in the tail call" -- this sounds inefficient. How does Brandt solve this in the current JIT? Anyway, in the future, when we have deferred references, we could skip this. (IIRC Sam said "not likely in 3.13" though.)

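To make the cost question concrete, the hand-off I understand is being described looks roughly like this (names are mine; the callee conceptually does Py_INCREF(self) and Py_DECREF(previous), which deferred references could later eliminate):

```c
typedef struct executor executor;

/* Made-up signature: the previous executor rides along in the tail call so
   the callee can drop its reference after taking one to itself. */
typedef void (*jit_entry_func)(executor *self, executor *previous, void *frame);

struct executor {
    /* PyObject_HEAD in the real object */
    jit_entry_func entry;
};

static void
enter_next(executor *next, executor *current, void *frame)
{
    next->entry(next, current, frame);   /* the "tail call" carrying the old executor */
}
```
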
Interpreter

Yup

Future optimizations

Yeah, let's not worry about this too much. (I think I anticipated a bit of the complexity here above under "Representation changes".) (The "unboxed float" example would have helped me understand the concept when it first came up.)


(I still want to go over these notes to remove stuff that I later cleared up. But that must wait.)