Replies: 5 comments
-
I don't have a big opinion here since I think all of them use flags so I don't think this will use more space to consolidate these types. I have also already implemented them so this is more work (but very very little extra work). This issue made me go back and audit the D*Node triplet and they are effectively the same code. Here is the change I just made: Operand buildDStr(Variable result, U[] nodePieces, Encoding encoding, boolean isFrozen, int line) {
if (result == null) result = temp();
Operand[] pieces = new Operand[nodePieces.length];
int estimatedSize = 0;
for (int i = 0; i < pieces.length; i++) {
estimatedSize += dynamicPiece(pieces, i, nodePieces[i]);
}
addInstr(new BuildCompoundStringInstr(result, pieces, encoding, estimatedSize, isFrozen, getFileName(), line));
return result;
}
Operand buildDSymbol(Variable result, U[] nodePieces, Encoding encoding, int line) {
return copy(new DynamicSymbol(buildDStr(result, nodePieces, encoding, false, line)));
}
public Operand buildDXStr(Variable result, U[] nodePieces, Encoding encoding, int line) {
return fcall(result, Self.SELF, "`", buildDStr(result, nodePieces, encoding, false, line));
} I guess I wonder now why frozen is false and not true for dsym and dxstr . Since the string is only an intermediate used internally it probably doesn't matter but it does lead me to your main dialogue... If these type are only non-interpolated or immediate values (fixnum, bool, ...) then perhaps we do not need to construct a RubyString at all. We do for DStr case but even then we might be able to remove the dynamic aspect and create a single string during build/compile. Merging these to a single node type does not eliminate doing this sort of optimization but perhaps people will have a harder time seeing the possibility. Note: As for source() we construct a new string per call to it. It appears to return a modifiable string so that seems to need to be at least dupd ( So the point of the last paragraph goes beyond making a static value from only non-interpolating strings to say we can potentially make these nodes static if we have results which are known to be representable without making a ruby string for them first. This I think is pretty rare so I doubt it is worth it. Still something possible if we are talking about optimizations. From JRuby perspective unless I change old parser to do the same thing then I will have these two extra tiny methods. I don't care either way. I have resisted moving older parser towards prism just because it is simpler to compare with MRI's parser but at some point I will probably start pushing on the tree so the building code keeps getting more in sync. A case for this change is if you look at DSymbolNode in JRuby you will see it is extends DNode and there is nothing in it but require extendable methods which more or less are just indicating a type. It is largely similar to being a flag but because it is the node type we do not have an extra if statement (DNode would have extra conditional in processing that does not exist as a type). That is an extremely minor concern 😄 |
Beta Was this translation helpful? Give feedback.
-
I'm not very into the idea of adding an extra node in the simple case. I'm assuming/hoping that the majority of use cases are going to be the simple option, in which case it would be adding a lot of nodes for not a lot of gain. That being said, I'm sympathetic to what you're saying here. I actually have been having some thoughts lately that The way I see it, we have a couple of options:
As I said before, I'm not particularly keen on options 1 or 6. I think 2 is probably the simplest, but it's frustrating that we would be requiring all of our consumers to scan for interpolated content. 3 would be okay, but the naming is still confusing. 4 gets us closer to actually being intuitive naming. 5 is probably the most work, but every node can be consistently compiled/handled, so no surprise that's my favorite option. Thoughts? cc @jemmaissroff, @eregon, @enebo, @seven1m |
Beta Was this translation helpful? Give feedback.
-
@kddnewton The naming changes doesn't bother me at all. Interpolated being MultiPart really makes no difference to me but I can see the term interpolated will be bothering some subset of people. If you want to eliminate somewhat rare complaints that an non-interpolating string is marked as interpolating then you will never have to hear someone mention it again. I am ok with option 5 but with #1799 I am wondering why not boil the ocean and combine all of these nodes during parse if there is no interpolation? Is it just the complexity of taking pieces and then when you realize there is no interpolation you have to make a different type? Or is this the problem that we lose information about syntax? From an compilation/build perspective preserving MultipartString in the tree vs making non-interpolating into a String is adding time and space we don't really want. |
Beta Was this translation helpful? Give feedback.
-
@enebo The issue is if we combined everything during parsing, there would be no way to round-trip. We lose the location information of the individual strings, and formatters/linters have no way of getting it back without referencing the source. |
Beta Was this translation helpful? Give feedback.
-
@kddnewton ok. Info loss is a problem so long as parsing will only create a single universal tree. If we ever have more info-losing needs then we might want to consider splitting the tree between semantic and syntactic. |
Beta Was this translation helpful? Give feedback.
-
I have been thinking about e.g. InterpolatedXStringNode and InterpolatedRegularExpressionNode, etc.
Those are basically the same as doing something on top of InterpolatedStringNode.
So another way to represent the AST would be:
The same for RegularExpressionNode and SymbolNode and InterpolatedMatchLastLineNode.
One concern might be that this effectively encourages creating a Ruby String for those nodes, even if the child is just a StringNode, but that seems needed anyway, for Regexp#source and Symbol#name, etc. If one cares they could create those lazily and instead only store the bytes or so and handle the case of the child being a StringNode differently than for InterpolatedStringNode.
Then XStringNode would mean just call Kernel#` with the string, which seems nice and clean.
OTOH there would likely be
if
conditions to handle e.g. a RegularExpressionNode as if it's not interpolated then the result can be cached in the AST.And also one extra node for the not-interpolated case so that's probably a good argument against.
Just a suggestion, WDYT?
Beta Was this translation helpful? Give feedback.
All reactions