Overhaul implementation of for-generators #844

Open: wants to merge 3 commits into main

Conversation

Contributor

@odenix odenix commented Dec 10, 2024

Based on: #837

Motivation:

  • fix known bugs and limitations of for-generators
  • improve code health by removing complex workarounds

Changes:

  • simplify AstBuilder code related to for-generators
    • track for-generators via SymbolTable.enterForGenerator()
    • add RestoreForBindingsNode during initial AST construction
      instead of calling MemberNode.replaceBody() later on
    • simplify unnecessarily complex code such as objectMemberInserter
  • remove workarounds and band-aids such as:
    • isInIterable
    • executeAndSetEagerly
    • adding dummy slots in AmendFunctionNode
  • overhaul implementation of for-generators
    • store keys and values of for-generator iterations in regular instead of auxiliary frame slots
      • set them via TypeNode.executeAndSet()
      • ResolveVariableNode no longer needs to search auxiliary slots
      • Read(Enclosing)AuxiliarySlot is no longer needed
    • at the start of each for-generator iteration, create a new VirtualFrame
      that is a copy of the current frame (arguments + slots)
      and stores the iteration key and value in additional slots.
    • execute for-generator iteration with the newly created frame
      • childNode.execute(newFrame)
      • Pkl objects created during the iteration will materialize this frame
    • store newly created frames in owner.extraStorage
      if their for-generator slots may be accessed when a generated member is executed
      • resolving variable names to for-generator variables at parse time would make this analysis more precise
    • when a generated member is executed,
      • retrieve the corresponding frame stored in owner.extraStorage
      • copy the retrieved frame's for-generator slots into slots of the current frame
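
The copy, store, and restore steps listed above can be sketched in plain Java. This is a toy model, not the PR's code: a "frame" is just an `Object[]`, `extraStorage` is an ordinary map, and the names `ForGeneratorFrameSketch`, `newIterationFrame`, `restoreForBindings`, `generate`, and `executeMember` are hypothetical. It assumes the for-generator slots are appended directly after the enclosing frame's slots.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: a "frame" is an Object[] of slots, not a Truffle Frame.
final class ForGeneratorFrameSketch {
  /** Copies the current frame and appends two slots for the iteration key and value. */
  static Object[] newIterationFrame(Object[] current, Object key, Object value) {
    Object[] copy = Arrays.copyOf(current, current.length + 2);
    copy[current.length] = key;
    copy[current.length + 1] = value;
    return copy;
  }

  /** Copies the for-generator slots of a stored frame into the member's own frame. */
  static void restoreForBindings(Object[] stored, Object[] memberFrame, int firstForSlot) {
    System.arraycopy(stored, firstForSlot, memberFrame, firstForSlot, stored.length - firstForSlot);
  }

  /** Simulates one for-generator: one new frame per iteration, stored under the member key. */
  static Map<Object, Object[]> generate(Object[] enclosingFrame, Map<String, Integer> iterable) {
    Map<Object, Object[]> extraStorage = new LinkedHashMap<>();
    for (var entry : iterable.entrySet()) {
      Object[] iterationFrame = newIterationFrame(enclosingFrame, entry.getKey(), entry.getValue());
      extraStorage.put(entry.getKey(), iterationFrame);
    }
    return extraStorage;
  }

  /** Executes a generated member later: restore the bindings, then read the value slot. */
  static Object executeMember(Map<Object, Object[]> extraStorage, Object memberKey, int firstForSlot) {
    Object[] stored = extraStorage.get(memberKey);
    Object[] memberFrame = new Object[stored.length];
    restoreForBindings(stored, memberFrame, firstForSlot);
    return memberFrame[firstForSlot + 1];
  }

  public static void main(String[] args) {
    Object[] enclosing = {"someLocal"};
    Map<String, Integer> iterable = new LinkedHashMap<>();
    iterable.put("a", 1);
    iterable.put("b", 2);
    var storage = generate(enclosing, iterable);
    // Each member sees the value bound in its own iteration.
    System.out.println(executeMember(storage, "a", enclosing.length));
    System.out.println(executeMember(storage, "b", enclosing.length));
  }
}
```

In this model every generated member has its frame stored; the PR stores a frame only when a member may access for-generator slots.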

Result:

  • for-generators are implemented in a correct, reasonably simple, and reasonably efficient way
    • complexity is fully contained within package generator and AstBuilder
  • for-generator keys and values can be accessed from all nested scopes:
    • key and value expressions of generated members
    • condition expressions of nested when-generators
    • iterable expressions of nested for-generators
  • for-generator keys and values can be accessed from within objects created by the expressions listed above
  • sibling for-generators can use the same key/value variable names
  • parent/child for-generators can use the same key/value variable names
  • fixes Late-bound values of iteratees within nested for/spread fail to resolve for-generator variables #741

Limitations not addressed in this PR:

  • object spreading is eager in values
    This should be easy to fix.
  • for-generators are eager in values
    I think this could be fixed by:
    • resolving variable names to for-generator variables at parse time
    • replacing every access to a for-generator's value with iterable[key]
  • for/when-generator bodies can't have local properties/methods
    I think this could be fixed by:
    • resolving variable names to local properties/methods at parse time
    • internally renaming generated local properties/methods to avoid name clashes

Motivation:
- Perform same exception handling for every implementation of PklRootNode.execute().
- Avoid code duplication.

Changes:
- Change PklRootNode.execute() to be a final method that performs exception handling
  and calls abstract method executeImpl(), which is implemented by subclasses.
- Remove executeBody() methods, which served a similar purpose but were more limited.
- Remove duplicate exception handling code.

Result:
- More reliable exception handling.
  This should fix known problems such as misclassifying stack overflows
  as internal errors and displaying errors without a stack trace.
- Less code duplication.
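
The shape of this change is a template method: a final `execute()` owns the exception handling and delegates to an abstract `executeImpl()`. A minimal sketch of that pattern follows. It assumes nothing about the real `PklRootNode` beyond those two method names; `RootNodeSketch`, `EvalErrorSketch`, and the error kinds are hypothetical stand-ins.

```java
// Hypothetical stand-in for a user-facing evaluation error.
final class EvalErrorSketch extends RuntimeException {
  final String kind;

  EvalErrorSketch(String kind, Throwable cause) {
    super(kind, cause);
    this.kind = kind;
  }
}

// Hypothetical stand-in for a root node base class.
abstract class RootNodeSketch {
  /** Final entry point: every subclass gets the same exception handling. */
  public final Object execute() {
    try {
      return executeImpl();
    } catch (EvalErrorSketch e) {
      throw e; // already classified, pass through unchanged
    } catch (StackOverflowError e) {
      // Classified separately so it is not reported as an internal error.
      throw new EvalErrorSketch("stackOverflow", e);
    } catch (RuntimeException e) {
      throw new EvalErrorSketch("internalError", e);
    }
  }

  /** Subclasses implement only the actual work. */
  protected abstract Object executeImpl();

  /** Helper for the demo: runs a node and reports how its failure was classified. */
  static String classify(RootNodeSketch node) {
    try {
      node.execute();
      return "ok";
    } catch (EvalErrorSketch e) {
      return e.kind;
    }
  }

  public static void main(String[] args) {
    RootNodeSketch overflowing = new RootNodeSketch() {
      @Override
      protected Object executeImpl() {
        throw new StackOverflowError(); // simulated, to keep the demo deterministic
      }
    };
    RootNodeSketch buggy = new RootNodeSketch() {
      @Override
      protected Object executeImpl() {
        throw new NullPointerException("bug");
      }
    };
    System.out.println(classify(overflowing));
    System.out.println(classify(buggy));
  }
}
```

Because `execute()` is final, a subclass cannot skip or reimplement the handling by accident.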

Contributor

@bioball bioball left a comment

Thanks for the PR! This is a good improvement in how for-generators work!

I did a first pass, take a look at my comments.

Also, this is slower than the current implementation. Some quick numbers:

// test.pkl

amends "pkl:Benchmark"

outputBenchmarks {
  ["for-generator"] {
    sourceModule = import("test2.pkl")
  }
}
// test2.pkl
res {
  for (i in IntSeq(1, 10000)) {
    i
  }
}

Running this benchmark produces:

Current Pkl:

outputBenchmarks {
  ["for-generator"] {
    iterations = 15
    repetitions = 50
    min = 2.86.ms
    max = 3.01.ms
    mean = 2.96.ms
    stdev = 0.05.ms
    error = 0.03.ms
  }
}

This PR:

outputBenchmarks {
  ["for-generator"] {
    iterations = 15
    repetitions = 50
    min = 3.5.ms
    max = 3.88.ms
    mean = 3.64.ms
    stdev = 0.1.ms
    error = 0.06.ms
  }
}

Tested on macOS/M1, testing the native executable (built with -DreleaseBuild=true)


private static FrameDescriptor.Builder newFrameDescriptorBuilder(FrameDescriptor descriptor) {
var builder = FrameDescriptor.newBuilder();
for (int i = 0; i < descriptor.getNumberOfSlots(); i++) {

Contributor

Suggested change
- for (int i = 0; i < descriptor.getNumberOfSlots(); i++) {
+ for (var i = 0; i < descriptor.getNumberOfSlots(); i++) {

// Only a subset of members have their frames stored (`GeneratorMemberNode.isFrameStored`).
// Frames are stored in `owner.extraStorage` and retrieved by `RestoreForBindingsNode`
// when members are executed.
private final EconomicMap<Object, MaterializedFrame> generatorFrames;

Contributor

This code:

foo {
  for (i in IntSeq(1, 100)) {
    i
  }
}

Results in this generatorFrames map (in Pkl syntax), where there is really only one frame (represented as THE_FRAME):

Map(
  1, THE_FRAME,
  2, THE_FRAME,
  3, THE_FRAME,
  4, THE_FRAME,
  5, THE_FRAME,
  // ...
  100, THE_FRAME
)

I'm not sure if this is the right model; there's many members for just this one frame. It's doing a lot of extra allocation here.

It seems quite extra to use for-generator keys as a way to look up the same materialized frame.

Food for thought: maybe the lookup key can be a synthesized name for the for-generator. It doesn't matter too much what this name is, but it should be simple enough to use that for lookup in RestoreForBindingsNode.

Contributor

Ah, sorry... disregard that comment. I've had too much eggnog (or that's my excuse, anyway).

One issue I do see here, though, is that we are materializing the same frame multiple times in the case of many generator members in the same for body. E.g.

foo {
  for (i in someList) {
    i + 1
    i + 2
    i + 3
  }
}

And, according to the contract of the API, this allocates a new frame. So, let's guard against it; something like this should work:

diff --git a/pkl-core/src/main/java/org/pkl/core/ast/expression/generator/ObjectData.java b/pkl-core/src/main/java/org/pkl/core/ast/expression/generator/ObjectData.java
index 03ae904c2..0bce42c8c 100644
--- a/pkl-core/src/main/java/org/pkl/core/ast/expression/generator/ObjectData.java
+++ b/pkl-core/src/main/java/org/pkl/core/ast/expression/generator/ObjectData.java
@@ -24,6 +24,7 @@ import org.pkl.core.ast.member.ObjectMember;
 import org.pkl.core.runtime.VmObject;
 import org.pkl.core.runtime.VmUtils;
 import org.pkl.core.util.EconomicMaps;
+import org.pkl.core.util.Nullable;
 
 /** Data collected by {@link GeneratorObjectLiteralNode} to generate a `VmObject`. */
 public final class ObjectData {
@@ -36,12 +37,14 @@ public final class ObjectData {
   private final EconomicMap<Object, MaterializedFrame> generatorFrames;
   // The object's number of elements.
   private int length;
+  private @Nullable MaterializedFrame currentFrame;
 
   ObjectData(int parentLength) {
     // optimize for memory usage by not estimating minimum size
     members = EconomicMaps.create();
     generatorFrames = EconomicMaps.create();
     length = parentLength;
+    currentFrame = null;
   }
 
   UnmodifiableEconomicMap<Object, ObjectMember> members() {
@@ -56,6 +59,10 @@ public final class ObjectData {
     return generatorFrames.isEmpty();
   }
 
+  void resetForBindings() {
+    currentFrame = null;
+  }
+
   void addElement(VirtualFrame frame, ObjectMember member, GeneratorMemberNode node) {
     addMember(frame, (long) length, member, node);
     length += 1;
@@ -70,8 +77,11 @@ public final class ObjectData {
       CompilerDirectives.transferToInterpreter();
       throw node.duplicateDefinition(key, member);
     }
+    if (currentFrame == null) {
+      currentFrame = frame.materialize();
+    }
     if (node.isFrameStored) {
-      EconomicMaps.put(generatorFrames, key, frame.materialize());
+      EconomicMaps.put(generatorFrames, key, currentFrame);
     }
   }
diff --git a/pkl-core/src/main/java/org/pkl/core/ast/expression/generator/GeneratorForNode.java b/pkl-core/src/main/java/org/pkl/core/ast/expression/generator/GeneratorForNode.java
index 865b6c632..b01afbe94 100644
--- a/pkl-core/src/main/java/org/pkl/core/ast/expression/generator/GeneratorForNode.java
+++ b/pkl-core/src/main/java/org/pkl/core/ast/expression/generator/GeneratorForNode.java
@@ -68,7 +68,7 @@ public abstract class GeneratorForNode extends GeneratorMemberNode {
 
   @Override
   public final void execute(VirtualFrame frame, Object parent, ObjectData data) {
-    initialize(frame);
+    initialize(frame, data);
     executeWithIterable(frame, parent, data, iterableNode.executeGeneric(frame));
   }
@@ -164,7 +164,8 @@ public abstract class GeneratorForNode extends GeneratorMemberNode {
     }
   }
 
-  private void initialize(VirtualFrame frame) {
+  private void initialize(VirtualFrame frame, ObjectData data) {
+    data.resetForBindings();
     if (unresolvedKeyTypeNode != null) {
       CompilerDirectives.transferToInterpreterAndInvalidate();
       var keySlot = frame.getFrameDescriptor().getNumberOfSlots();

if (EconomicMaps.put(data.members, key, member) != null) {
CompilerDirectives.transferToInterpreter();
throw duplicateDefinition(key, member);
}

Contributor

What's the rationale behind getting rid of this check?

Contributor Author

It's not removed, just moved from node classes to class ObjectData (see method addMember).

Contributor

Ah, I see that you called this out in your PR description. So, effectively this relaxes our current rule of: you cannot re-use the same for-generator name in a nested for-generator.

This code, which is currently invalid, becomes valid:

obj {
  for (bar in something) {
    for (bar in somethingElse) { ... }
  }
}

I don't see any issues with this, and it lines up with how other languages work (for loops can create nested scopes that shadow outer variables).

CC @stackoverflow @holzensp for comments.

BTW: re: this comment:

> sibling for-generators can use the same key/value variable names

This is possible today, too.

var convertedKey = member.isProp() ? key.toString() : key;
// TODO: Executing iteration behind a Truffle boundary is bad for performance.
// This and similar cases will be fixed in an upcoming PR that replaces method
// `(forceAnd)iterateMemberValues` with cursor-based external iterators.

Contributor

Definitely interested in a future PR of yours that addresses this.

But, we currently have a (small) optimization that this PR removes, in evaluateMembers. I think we should bring that back.

Contributor Author

There are several places in the codebase that call Node.execute from behind a Truffle boundary. The PR that fixes all of them has been ready from my side for over a month, but due to the slow review progress, I haven't sent it yet. Please let's not bring back this workaround; it's not worth it and will only slow us down further.

Contributor

Okay; don't need to block here!

Contributor Author

odenix commented Dec 20, 2024

> Also, this is slower than the current implementation

Correctness has a performance cost here. However, we can probably also find benchmarks where the new implementation performs better than the old one. For example, ResolveVariableNode now has less work to do, and iterables are no longer forced. Also, there is probably room to improve performance, especially if we have some representative benchmarks.

> I'm not sure if this is the right model; there's many members for just this one frame. It's doing a lot of extra allocation here.

This instantiates 100 frames. Each frame captures one iteration's key/value binding. I'm pretty confident this is the right model (even discussed it in Truffle Slack). It's a more efficient form of having a root node for the loop body and calling it 100 times, which would also result in 100 frames. The old implementation did a similar allocation for every iteration: https://github.com/apple/pkl/blob/main/pkl-core/src/main/java/org/pkl/core/ast/expression/generator/GeneratorMemberNode.java#L115-L117
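
Why each iteration needs its own frame can be illustrated outside Truffle. The sketch below is a toy model, not Pkl code: a "frame" is a one-element `int[]`, a lazily evaluated member is an `IntSupplier`, and `FramePerIterationSketch` with its two methods is hypothetical. With a single shared frame, every member evaluated after the loop sees the binding of the last iteration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// Toy model only: int[] stands in for a frame, IntSupplier for a lazy member.
final class FramePerIterationSketch {
  /** One frame reused by all iterations: members read whatever was written last. */
  static List<IntSupplier> sharedFrame(int n) {
    int[] frame = new int[1];
    List<IntSupplier> members = new ArrayList<>();
    for (int i = 1; i <= n; i++) {
      frame[0] = i;
      members.add(() -> frame[0]);
    }
    return members;
  }

  /** One frame per iteration: each member keeps the binding of its own iteration. */
  static List<IntSupplier> framePerIteration(int n) {
    List<IntSupplier> members = new ArrayList<>();
    for (int i = 1; i <= n; i++) {
      int[] frame = {i};
      members.add(() -> frame[0]);
    }
    return members;
  }

  public static void main(String[] args) {
    // Members are evaluated only after the loop has finished.
    System.out.println(sharedFrame(3).get(0).getAsInt());       // binding of the last iteration
    System.out.println(framePerIteration(3).get(0).getAsInt()); // binding of the first iteration
  }
}
```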

Contributor

@bioball bioball left a comment

Re: performance regression:

I tested these changes against some real-world code, and this is what time has to say about it:

Pkl 0.27:

________________________________________________________
Executed in  299.86 secs    fish           external
   usr time   29.09 mins    0.11 millis   29.09 mins
   sys time    1.01 mins    2.81 millis    1.01 mins

These changes:

________________________________________________________
Executed in  297.81 secs    fish           external
   usr time   28.74 mins   53.00 micros   28.74 mins
   sys time    1.00 mins  889.00 micros    1.00 mins

So, all in all, the changes introduced here are actually slightly faster in my one run (although not significant enough to rule out noise), so, my performance concerns really aren't that high.

I made another pass. I still need to get through more of this code, but want to provide some more feedback rather than delay too much more.

// when members are executed.
private final EconomicMap<Object, MaterializedFrame> generatorFrames;
// The object's number of elements.
private int length;

Contributor

[nit] turn all of these into JavaDoc comments

Contributor Author

@odenix odenix Dec 24, 2024

Why do you prefer Javadoc for internal code? (Using Javadoc for internal code will become less painful with Java 23 Markdown doc comments.)

Contributor

Even if these don't get turned into actual javadoc, it's nice because the IDE provides insight here (you get hover-over docs, for example)


Comment on lines +305 to +306
* Copies `numberOfLocalsToCopy` locals from `sourceFrame`, starting at `firstSourceSlot`, to
* `targetFrame`, starting at `firstTargetSlot`.

Contributor

Suggested change
- * Copies `numberOfLocalsToCopy` locals from `sourceFrame`, starting at `firstSourceSlot`, to
- * `targetFrame`, starting at `firstTargetSlot`.
+ * Copies {@code numberOfLocalsToCopy} locals from {@code sourceFrame}, starting at {@code firstSourceSlot}, to
+ * {@code targetFrame}, starting at {@code firstTargetSlot}.

Comment on lines +318 to +354
for (int i = 0; i < numberOfLocalsToCopy; i++) {
var sourceSlot = firstSourceSlot + i;
var targetSlot = firstTargetSlot + i;
// If, for a particular call site of this method,
// slot kinds of `sourceDescriptor` will reach a steady state,
// then slot kinds of `targetDescriptor` will too.
var slotKind = sourceDescriptor.getSlotKind(sourceSlot);
switch (slotKind) {
case Boolean -> {
targetDescriptor.setSlotKind(targetSlot, FrameSlotKind.Boolean);
targetFrame.setBoolean(targetSlot, sourceFrame.getBoolean(sourceSlot));
}
case Long -> {
targetDescriptor.setSlotKind(targetSlot, FrameSlotKind.Long);
targetFrame.setLong(targetSlot, sourceFrame.getLong(sourceSlot));
}
case Double -> {
targetDescriptor.setSlotKind(targetSlot, FrameSlotKind.Double);
targetFrame.setDouble(targetSlot, sourceFrame.getDouble(sourceSlot));
}
case Object -> {
targetDescriptor.setSlotKind(targetSlot, FrameSlotKind.Object);
targetFrame.setObject(
targetSlot,
sourceFrame instanceof MaterializedFrame
// Even though sourceDescriptor.getSlotKind is now Object,
// it may have been a primitive kind when `sourceFrame`'s local was written.
// Hence we need to read the local with getValue() instead of getObject().
? sourceFrame.getValue(sourceSlot)
: sourceFrame.getObject(sourceSlot));
}
default -> {
CompilerDirectives.transferToInterpreter();
throw new VmExceptionBuilder().bug("Unexpected FrameSlotKind: " + slotKind).build();
}
}
}

Contributor

@bioball bioball Dec 23, 2024

Re: this comment:

> Even though sourceDescriptor.getSlotKind is now Object,
> it may have been a primitive kind when sourceFrame's local was written.
> Hence we need to read the local with getValue() instead of getObject().

I wonder if there is a deeper issue with how we are using frame descriptors. Right now, our WriteFrameSlotNode will update the frame descriptor, but I'm not sure if that's totally right. For example, com.oracle.truffle.api.impl.FrameWithoutBoxing#clear doesn't update the frame descriptor either (see https://graalvm.slack.com/archives/CNQSB2DHD/p1675722351829269 for more details).

In any case, you can avoid the weird edge case by looking at the frame slot tag instead:

Suggested change
for (var i = 0; i < numberOfLocalsToCopy; i++) {
var sourceSlot = firstSourceSlot + i;
var targetSlot = firstTargetSlot + i;
var slotKind = FrameSlotKind.fromTag(sourceFrame.getTag(sourceSlot));
switch (slotKind) {
case Boolean -> {
targetDescriptor.setSlotKind(targetSlot, FrameSlotKind.Boolean);
targetFrame.setBoolean(targetSlot, sourceFrame.getBoolean(sourceSlot));
}
case Long -> {
targetDescriptor.setSlotKind(targetSlot, FrameSlotKind.Long);
targetFrame.setLong(targetSlot, sourceFrame.getLong(sourceSlot));
}
case Double -> {
targetDescriptor.setSlotKind(targetSlot, FrameSlotKind.Double);
targetFrame.setDouble(targetSlot, sourceFrame.getDouble(sourceSlot));
}
case Object -> {
targetDescriptor.setSlotKind(targetSlot, FrameSlotKind.Object);
targetFrame.setObject(targetSlot, sourceFrame.getObject(sourceSlot));
}
default -> {
CompilerDirectives.transferToInterpreter();
throw new VmExceptionBuilder().bug("Unexpected FrameSlotKind: " + slotKind).build();
}
}
}

Contributor Author

@odenix odenix Dec 24, 2024

> One issue I do see here, though, is that we are materializing the same frame multiple times in the case of many generator members in the same for body

Multiple calls to frame.materialize() are guaranteed to return the same materialized frame (which is a reference to the mutable virtual frame, not an immutable snapshot thereof). This seems important for Pkl performance because otherwise, every new X {...} expression within the same root node would create a new materialized frame.

> Right now, our WriteFrameSlotNode will update the frame descriptor, but I'm not sure if that's totally right.

WriteFrameSlotNode updating the frame descriptor is how type profiling of locals works in Truffle. (The goal is to avoid boxing of primitives, esp. in interpreted code. I don't know if this profiling, which isn't free, improves Pkl real-world performance.) Note that the slot kind can only change from primitive type to Object, i.e., it can only become more general.

> In any case, you can avoid the weird edge case by looking at the frame slot tag instead:

I considered this but concluded it's incorrect. Source frames that share the same descriptor may be copied in an order that differs from the order they were executed/profiled in. If we ignore the source descriptor's slot kind during copying, the target descriptor's slot kind isn't guaranteed to reach a steady state. For example, long-long-long-Object-Object-Object may turn into Object-long-Object-long-Object-long. As far as I understand, this isn't desirable.
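
The steady-state argument can be made concrete with a small simulation. This is a toy model under simplifying assumptions, not Truffle code: only two kinds exist, all source frames have already been executed (so the shared source descriptor has already generalized) before any copy happens, and `SlotKindSteadyStateSketch` with its methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: simulates which kind a target descriptor slot is set to on each copy.
final class SlotKindSteadyStateSketch {
  enum Kind { LONG, OBJECT }

  /** Shared source descriptor kind after profiling: OBJECT once any write was not a long. */
  static Kind profiledKind(List<Kind> tagsInExecutionOrder) {
    return tagsInExecutionOrder.contains(Kind.OBJECT) ? Kind.OBJECT : Kind.LONG;
  }

  /** Target kinds when each copy uses the tag of the frame being copied. */
  static List<Kind> targetKindsByTag(List<Kind> tags, int[] copyOrder) {
    List<Kind> result = new ArrayList<>();
    for (int index : copyOrder) {
      result.add(tags.get(index));
    }
    return result;
  }

  /** Target kinds when each copy uses the shared source descriptor's kind. */
  static List<Kind> targetKindsByDescriptor(List<Kind> tags, int[] copyOrder) {
    Kind shared = profiledKind(tags);
    List<Kind> result = new ArrayList<>();
    for (int ignored : copyOrder) {
      result.add(shared);
    }
    return result;
  }

  /** Counts how often the target kind differs from the one set by the previous copy. */
  static int countChanges(List<Kind> kinds) {
    int changes = 0;
    for (int i = 1; i < kinds.size(); i++) {
      if (kinds.get(i) != kinds.get(i - 1)) {
        changes++;
      }
    }
    return changes;
  }

  public static void main(String[] args) {
    // Frames were written as long-long-long-Object-Object-Object ...
    List<Kind> tags =
        List.of(Kind.LONG, Kind.LONG, Kind.LONG, Kind.OBJECT, Kind.OBJECT, Kind.OBJECT);
    // ... but are copied in a different order than they were executed in.
    int[] copyOrder = {3, 0, 4, 1, 5, 2};
    System.out.println(targetKindsByTag(tags, copyOrder));
    System.out.println(countChanges(targetKindsByTag(tags, copyOrder)));
    System.out.println(countChanges(targetKindsByDescriptor(tags, copyOrder)));
  }
}
```

In this model the tag-based variant keeps flipping the target kind between copies, while the descriptor-based variant settles on one kind.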

Contributor

> Multiple calls to frame.materialize() are guaranteed to return the same materialized frame (which is a reference to the mutable virtual frame, not an immutable snapshot thereof).

I can't find any docs that guarantee this. Can you provide a reference?

Their own docs say it returns a new frame, even if the implementation disagrees.

> I considered this but concluded it's incorrect. Source frames that share the same descriptor may be copied in an order that differs from the order they were executed/profiled in. If we ignore the source descriptor's slot kind during copying, the target descriptor's slot kind isn't guaranteed to reach a steady state. For example, long-long-long-Object-Object-Object may turn into Object-long-Object-long-Object-long. As far as I understand, this isn't desirable.

Ah, I see. That makes sense!

In that case, I don't understand this ternary:

sourceFrame instanceof MaterializedFrame
    ? sourceFrame.getValue(sourceSlot)
    : sourceFrame.getObject(sourceSlot))

Why is it safe to call sourceFrame.getObject if it is not a MaterializedFrame? And, in practice, it's always true; FrameWithoutBoxing implements MaterializedFrame.

Contributor Author

@odenix odenix Dec 24, 2024

> I can't find any docs that guarantee this. Can you provide a reference?

No, but I'm sure it's true. In interpreted code, frame.materialize() always returns frame. In Graal JITted code, frame.materialize() always yields the same MaterializedFrame instance. Feel free to double-check by asking in Truffle Slack.

> Why is it safe to call sourceFrame.getObject if it is not a MaterializedFrame?

Because then sourceFrame is a VirtualFrame, which means that copyLocals is called while sourceFrame is active (a VirtualFrame may only be used within RootNode.execute), which guarantees that sourceFrame.getDescriptor accurately describes sourceFrame.

> And, in practice, it's always true;

It's always true in interpreted code, where frames are regular Java objects allocated on the Java heap. Once Graal JIT kicks in, VirtualFrame is optimized away (that's what makes it "virtual").

Contributor

> No, but I'm sure it's true. In interpreted code, frame.materialize() always returns frame. In Graal JITted code, frame.materialize() always yields the same MaterializedFrame instance. Feel free to double-check by asking in Truffle Slack.

Yup, seems like you're right; reference: https://graalvm.slack.com/archives/CNQSB2DHD/p1735839284090239

> Because then sourceFrame is a VirtualFrame, which means that copyLocals is called while sourceFrame is active (a VirtualFrame may only be used within RootNode.execute), which guarantees that sourceFrame.getDescriptor accurately describes sourceFrame.

I think I'm missing something here? I can cause an active frame's descriptor to disagree with the frame slot's values, e.g.

  execute(VirtualFrame frame) {
    frame.getFrameDescriptor().setSlotKind(0, FrameSlotKind.Object);
    frame.setBoolean(0, true);
  }

I can then pass this frame to copyLocals. How do we have guarantees that the descriptor accurately describes the frame?
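
The mismatch in question can be reproduced in a toy model. The sketch below is not Truffle's frame implementation: `ToyFrameSketch` is a hypothetical single-slot frame whose `getObject()` rejects a slot that was written as a primitive and whose `getValue()` returns the boxed value regardless of how it was written. That these mirror the behavior of the real accessors is an assumption of the sketch.

```java
// Toy model only: a single-slot frame with a separate descriptor kind and per-frame tag.
final class ToyFrameSketch {
  enum Kind { BOOLEAN, OBJECT }

  Kind descriptorKind = Kind.OBJECT; // what the (shared) descriptor claims for the slot
  private Kind tag;                  // how the slot of this frame was actually written
  private Object value;

  void setBoolean(boolean b) {
    tag = Kind.BOOLEAN;
    value = b;
  }

  void setObject(Object o) {
    tag = Kind.OBJECT;
    value = o;
  }

  /** Strict read: fails if the slot was not written as an object. */
  Object getObject() {
    if (tag != Kind.OBJECT) {
      throw new IllegalStateException("slot holds " + tag);
    }
    return value;
  }

  /** Lenient read: returns the (boxed) value no matter how it was written. */
  Object getValue() {
    return value;
  }

  /** Reads the slot the way a copy routine would if it trusted only the descriptor. */
  static String readTrustingDescriptor(ToyFrameSketch frame) {
    try {
      Object result =
          frame.descriptorKind == Kind.OBJECT ? frame.getObject() : frame.getValue();
      return String.valueOf(result);
    } catch (IllegalStateException e) {
      return "mismatch: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    ToyFrameSketch frame = new ToyFrameSketch();
    frame.descriptorKind = Kind.OBJECT; // descriptor says Object ...
    frame.setBoolean(true);             // ... but the slot is written as a boolean
    System.out.println(readTrustingDescriptor(frame));
    System.out.println(frame.getValue());
  }
}
```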
