Replies: 8 comments 10 replies
-
@enebo Yeah this is a great question, and something we've been trying to figure out for the compilation stage in CRuby as well. At the moment we're doing something a little hokey where we use I don't love this solution but don't have a better one yet. The issue is we don't want to throw line numbers onto every token and node, as it would add quite a bit of memory when we don't actually need them in that many places. So still trying to figure out the right solution here. |
Beta Was this translation helpful? Give feedback.
-
@kddnewton yeah I cannot really say I fully remember the logic of when a node is marked newline but I think it is possible to only adorn the nodes which need them with this info. From a C struct side of things perhaps there are complications. I was going to chat with you about saving space on serialization by changing the fields stored from start/end offsets to start offset, length. I have some other ideas for reducing size on serialization side of things as well. |
Beta Was this translation helpful? Give feedback.
-
Err. I should have added a binary lookup of lines I think is fine for things like error reporting so not not all nodes may need line numbers (although if it works out that way then that is great). But error reporting is already a slow path activity so I am less concerned about it. |
Beta Was this translation helpful? Give feedback.
-
For the :line TracePoint and Coverage I guess? Any other need? We could also compute that on our own with an array I guess. That would likely need an extra tree walk through which feels suboptimal. Or maybe it could be done by Loader while deserializing given nodes are evaluated in post-order there. Would be heavy in offset->line conversion though, so seems better to serialize this info. |
Beta Was this translation helpful? Give feedback.
-
@eregon the most important one is just for :line and coverage. There are secondary ones like leaving line when exiting a module but those are not marked (although both MRI and JRuby store those end lines for that purpose). MRI already has a flags field on node and so it marks nodes as newline. A long time ago they used to use a full node but it is expensive space wise. JRuby has a newline boolean field on Node. Both impls store start line and column in the node. I would prefer at worst case serialize.c generates newline state and probably line number and more preferably we figure out a way for the parser to do it without wasting space. For error reporting where we need a line doing the binary lookup is no big deal but I don't think we want to do that search for every Ruby line in Java. A big draw of YARP is not needing stuff to warmup. The less bytecode we need the better. |
Beta Was this translation helpful? Give feedback.
-
Linking PR: #757. A lot of comments happened related to this topic there. |
Beta Was this translation helpful? Give feedback.
-
I updated description to match what I think we want to talk about. I am going to add some more details that were also mentioned in #757 about serialization. Some coverage needs an endLine. Those nodes are Module, Def, Defs, Lambda, Iter For, SClass, Class, PreExe (sorry JRuby node names 😄 ). This is static knowledge. For serialize, I personally would like line numbers put into these nodes. For compile? Line for serialization. I wonder if instead of stmts if this is potentially static as well. Are all CallNodes always a newline? If so, since serialize uses config.yml we can mark those as |
Beta Was this translation helpful? Give feedback.
-
This can be closed now in favor of #808 after having met about it. |
Beta Was this translation helpful? Give feedback.
-
[Note: I changed this to reflect discussions had in #757]
In all runtimes we need to know when to emit line number changes. Historically parsers would mark statements (
stmts
andtop_stmts
in CRuby/JRuby) with a flag indicating the node is a newline node. In compile.c or IRBuilder we track last emitted line number and if we see a newline node we make sure it is different than the last one and emit it if so.Currently YARP has no concept of newline nodes and it derives line from binary searching start_offset using a table of newlines (so it doesn't store line). The last PR opened for this would check this in all nodes (since it has no newline flag).
So both serialize and compile in YARP have an interest in figuring out when to emit lines and they can either share something built during parse or do something later.
Concerns:
a. memory of parser
b. time of binary search
Beta Was this translation helpful? Give feedback.
All reactions