Is it easy to recreate a source file from a parse tree? #4062

Korporal · 2023-01-06T23:42:02Z

I was wondering if there's enough metadata in the parse tree to recreate the original source text from it, right down to the same line numbers and in-line offsets of the various tokens.

Could one recreate an exact replica of the original source file, from the tree?

kaby76 · 2023-01-07T02:33:49Z

Yes. Every token points back into the same character stream, and that entire character stream is the text for the tree. For a sub-tree, you have two options:

1. Get the interval of token indices covered by the sub-tree, then print all tokens in the token stream within that range.
2. Get the character interval of the sub-tree in the char stream buffer (from the start index of its first token to the stop index of its last token), and print that substring of the buffer.

Having lexer rules marked "skip" can make reconstructing the text from the token stream alone impossible, because skipped input produces no tokens in the token stream. You have to go to the char stream buffer.
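A minimal sketch of the two options, assuming a hypothetical `Tok` class and a toy regex lexer in place of an ANTLR-generated lexer (neither is ANTLR's API). As in ANTLR, each token's `start`/`stop` are inclusive offsets into the char buffer; in the Java runtime the rough equivalents are `ctx.getSourceInterval()` for the token range and `ctx.start.getStartIndex()`/`ctx.stop.getStopIndex()` for the char range. Whitespace is skipped here to show why option 1 loses text:

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for an ANTLR token (not the real runtime class).
# start/stop are inclusive char offsets into the original buffer, as in ANTLR.
@dataclass
class Tok:
    index: int
    start: int
    stop: int
    text: str

TOKEN_RE = re.compile(r"(?P<WS>\s+)|(?P<ID>\w+)|(?P<SYM>[=;])")

def lex_skipping_ws(src):
    """Toy lexer whose whitespace rule behaves like `-> skip`."""
    toks = []
    for m in TOKEN_RE.finditer(src):
        if m.lastgroup == "WS":
            continue  # skipped: no token is emitted for this text
        toks.append(Tok(len(toks), m.start(), m.end() - 1, m.group()))
    return toks

def text_via_tokens(toks, first, last):
    """Option 1: join the text of tokens first.index..last.index."""
    return "".join(t.text for t in toks[first.index:last.index + 1])

def text_via_chars(buf, first, last):
    """Option 2: slice the char buffer from first.start to last.stop."""
    return buf[first.start:last.stop + 1]

src = "int  x =  42;"
toks = lex_skipping_ws(src)
first, last = toks[1], toks[3]  # pretend a sub-tree spans `x = 42`
print(repr(text_via_tokens(toks, first, last)))  # 'x=42'
print(repr(text_via_chars(src, first, last)))    # 'x =  42'
```

Option 2 returns the exact original slice because it never consults the token stream, which is why it survives `skip`.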

KvanTTT Jan 7, 2023

> Having lexer rules marked "skip" can make reconstructing the text from the token stream alone impossible, because skipped input produces no tokens in the token stream. You have to go to the char stream buffer.

That's why I always recommend using -> channel(HIDDEN) instead of skip, since it preserves the input information.
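To illustrate the difference, here is a toy model (a hypothetical `Tok` class and regex lexer, not ANTLR's API) in which whitespace and comments are either dropped, as `-> skip` does, or kept on a hidden channel, as `-> channel(HIDDEN)` does. The parser sees the same tokens either way, but only the hidden-channel version can be concatenated back into the original input:

```python
import re
from dataclasses import dataclass

DEFAULT, HIDDEN = 0, 1  # channel numbers, mirroring ANTLR's default/hidden channels

# Hypothetical simplified token; not the ANTLR runtime class.
@dataclass
class Tok:
    text: str
    channel: int

TOKEN_RE = re.compile(r"(?P<WS>\s+)|(?P<COMMENT>//[^\n]*)|(?P<ID>\w+)|(?P<SYM>[=;])")

def lex(src, trivia_mode):
    """Toy lexer; trivia_mode is 'skip' or 'hidden' for whitespace/comments."""
    toks = []
    for m in TOKEN_RE.finditer(src):
        trivia = m.lastgroup in ("WS", "COMMENT")
        if trivia and trivia_mode == "skip":
            continue  # like `-> skip`: the text is lost for good
        toks.append(Tok(m.group(), HIDDEN if trivia else DEFAULT))
    return toks

def parser_view(toks):
    """What a parser reading the default channel sees."""
    return [t.text for t in toks if t.channel == DEFAULT]

src = "x = 1; // set x\ny = 2;\n"
skipped = lex(src, "skip")
hidden = lex(src, "hidden")

assert parser_view(skipped) == parser_view(hidden)  # same parser input
assert "".join(t.text for t in hidden) == src        # exact round trip
assert "".join(t.text for t in skipped) != src       # whitespace/comment gone
```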

KvanTTT · 2023-01-07T14:22:46Z

In theory it's possible, but not ideal:

- You have to implement a handler for hidden tokens (comments, whitespace) that correctly matches them to parse tree nodes. You can take a look at my old article about parsing Objective-C preprocessor directives, or at Roslyn.
- If the input source contains syntax errors, ANTLR may build a parse tree with missing nodes (its error recovery is not perfect).
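One common shape for such a handler is Roslyn's notion of trivia: hidden tokens are attached to a neighboring significant token, so printing every node together with its trivia reproduces the input. The sketch below is a simplification (leading trivia only, plus a synthetic EOF node for anything after the last real token; Roslyn also uses trailing trivia). `Tok`, `Node`, and `attach_trivia` are hypothetical names, and the tokens are hand-built in place of a real lexer:

```python
from dataclasses import dataclass, field

DEFAULT, HIDDEN = 0, 1

# Hypothetical simplified token; not the ANTLR runtime class.
@dataclass
class Tok:
    text: str
    channel: int = DEFAULT

# One significant token plus the hidden tokens (trivia) that precede it.
@dataclass
class Node:
    token: Tok
    leading: list = field(default_factory=list)

    def full_text(self):
        return "".join(t.text for t in self.leading) + self.token.text

def attach_trivia(toks):
    """Give each default-channel token the hidden tokens just before it.
    Trivia after the last real token goes to a synthetic EOF node."""
    nodes, pending = [], []
    for t in toks:
        if t.channel == HIDDEN:
            pending.append(t)
        else:
            nodes.append(Node(t, pending))
            pending = []
    nodes.append(Node(Tok(""), pending))  # EOF holds trailing trivia
    return nodes

toks = [Tok("// counter\n", HIDDEN), Tok("x"), Tok(" ", HIDDEN), Tok("="),
        Tok(" ", HIDDEN), Tok("1"), Tok(";"), Tok("\n", HIDDEN)]
nodes = attach_trivia(toks)
# The comment travels with `x`, and the whole input is reproduced.
assert "".join(n.full_text() for n in nodes) == "// counter\nx = 1;\n"
```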

Korporal · 2023-01-07T15:58:32Z

Thanks, guys, I will read up on this as suggested. FYI, the motive here is to replace all keywords with their equivalents in another natural language. The language I'm designing (recently named "Imperium") makes it trivial to use keywords from any supported language, such as English, French, Spanish, or Dutch.

I'd like to parse a source file that uses, say, English keywords, replace them with, say, Dutch ones, and then generate a new source file that contains only the Dutch keywords.

This isn't critical, but it would be useful to have around, certainly for testing but also as a tool when working with the language.
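For keyword translation, full tree reconstruction may not be needed: rewriting at the token level and copying every other token (including whitespace and comments) verbatim is enough. ANTLR's Java runtime ships `TokenStreamRewriter` for this kind of edit. The sketch below imitates the idea with a toy regex lexer; the `EN_TO_NL` table and `translate_keywords` are hypothetical, and a real grammar would key the lookup on keyword token types rather than raw text:

```python
import re

# Hypothetical English -> Dutch keyword table, for illustration only;
# the actual Imperium keyword sets are not given in this thread.
EN_TO_NL = {"if": "als", "then": "dan", "else": "anders", "end": "einde"}

# Toy lexer: line comments, string literals, words, whitespace, any other char.
# Every character of the input lands in exactly one token.
TOKEN_RE = re.compile(r'//[^\n]*|"[^"\n]*"|\w+|\s+|.', re.S)

def translate_keywords(src, table):
    """Replace whole-token keyword matches; copy every other token verbatim."""
    out = []
    for m in TOKEN_RE.finditer(src):
        tok = m.group()
        # Comments and strings are single tokens, so a keyword inside them
        # never matches; neither does an identifier like `ifdone`.
        out.append(table.get(tok, tok))
    return "".join(out)

src = 'if x then print("if") // if comment\nelse y end\n'
dutch = translate_keywords(src, EN_TO_NL)
print(dutch)
```

Line numbers are preserved because no newlines are added or removed, but column offsets shift wherever a Dutch keyword is longer or shorter than the English one. Running the function again with the inverted table restores the original text.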
