-
Please add links to the issues and forks you are referring to.
-
I think I can do this with the current tooling, though it's a little contrived. Essentially I'd define a small grammar that just parses out include statements. That could then be used to write a utility function that takes any source file and recursively identifies, opens and reads the included files, however deeply nested. That utility could then be the basis for a class NestedTextReader that exposes a file and all associated includes as a single stream, and we'd pass that stream to the usual tokenizer. The Antlr lexer would then just work: it would read that stream, unaware there were any includes, and see one large stream as if each include had simply been copied and pasted into one source file. In principle the NestedTextReader could be generated by Antlr, with the user supplying a callback that implements the actual file IO. Such an approach would have no impact on the existing lexing/parsing logic; it would just make a set of nested included files appear like a single file to the lexer.
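A minimal sketch of the "single stream" idea, under some loud assumptions: the class name `StreamingIncludeReader`, the `#include "name"` syntax and the in-memory demo files are all hypothetical, and a plain regex stands in for the small include grammar described above. It keeps a stack of open readers and hands out characters lazily, so included files are only opened when the consumer reaches them. Whether the lexer actually consumes the stream incrementally depends on the Antlr runtime's input stream, which this sketch does not address.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical sketch: presents a file and its nested includes as one
// character stream, opening included files lazily.
public sealed class StreamingIncludeReader : TextReader
{
    // Assumed include syntax: a whole line of the form  #include "name"
    private static readonly Regex Include =
        new Regex("^\\s*#include\\s+\"([^\"]+)\"\\s*$");

    private readonly Func<string, TextReader> open;   // user-supplied file IO callback
    private readonly Stack<TextReader> readers = new Stack<TextReader>();
    private string current = "";                      // current output line, newline included
    private int pos;                                  // next unread index in `current`

    public StreamingIncludeReader(string rootFile, Func<string, TextReader> open)
    {
        this.open = open;
        readers.Push(open(rootFile));
    }

    // Ensures `current` has unread characters; returns false at end of all input.
    private bool Fill()
    {
        while (pos >= current.Length)
        {
            if (readers.Count == 0) return false;
            string line = readers.Peek().ReadLine();
            if (line == null) { readers.Pop().Dispose(); continue; }  // file done, resume parent
            Match m = Include.Match(line);
            if (m.Success) { readers.Push(open(m.Groups[1].Value)); continue; }  // descend
            current = line + "\n";
            pos = 0;
        }
        return true;
    }

    public override int Peek() => Fill() ? current[pos] : -1;
    public override int Read() => Fill() ? current[pos++] : -1;

    protected override void Dispose(bool disposing)
    {
        if (disposing) while (readers.Count > 0) readers.Pop().Dispose();
        base.Dispose(disposing);
    }
}

public static class StreamingDemo
{
    public static void Main()
    {
        // In-memory stand-ins for real files (hypothetical names and contents).
        var files = new Dictionary<string, string>
        {
            ["main.src"] = "a\n#include \"inc.src\"\nc\n",
            ["inc.src"] = "b\n",
        };
        using (var reader = new StreamingIncludeReader("main.src", name => new StringReader(files[name])))
        {
            Console.Write(reader.ReadToEnd());  // prints the lines a, b, c
        }
    }
}
```

The sketch has no guard against a file that includes itself; that case would keep pushing readers until memory runs out.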
-
In my case I'm designing a grammar primarily based on PL/I (Subset G). This has gone very well: I can parse basic source now, have no reserved words, and am able to support keywords in different languages (English, Spanish, etc.), so it's looking very good. Anyway, PL/I typically includes files with
-
While I'm on this subject, does the code (in my case C# code) generated by Antlr always and only read the entire source in one hit with ReadAllLines? Can we get the lexer to read the source stream, say, line by line or char by char? If so, I could pass an
-
Seems in the case of C the My own language isn't C but I can certainly do includes that way too, on their own on a single line...
-
The file XXX
included like this in file YYY
would become
That's what the lexer would see, no probs: include file support without any changes to Antlr...
-
Actually this could be even simpler: just read the source line by line, then if we see a line literally beginning
-
I'll write this tomorrow. It could become an Antlr utility class; let me get it running in C# with my own language source files...
-
@kaby76 - Hi, OK, I have implemented this for my C# target. It is a simple pattern actually, and it could, in principle, be added to Antlr where it generates the target lexer code. The pattern is that the consumer (the person leveraging the generated Antlr code) must provide a callback method that accepts a string, uses it to open the file, and returns a TextReader:

```csharp
private static TextReader ReadFileCallback(string Filename)
{
    return File.OpenText($@"..\..\..\..\..\Antlr\{Filename}");
}
```

Then I created a simple class that implements TextReader:

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public class NestedSourceReader : TextReader
{
    private readonly string sourceFile;
    private readonly Func<string, TextReader> fileReader;
    private readonly Regex regex;

    public NestedSourceReader(string SourceFile, string IncludePattern, Func<string, TextReader> FileReader)
    {
        fileReader = FileReader;
        sourceFile = SourceFile;
        regex = new Regex(IncludePattern);
    }

    // Only ReadToEnd is overridden; it returns the source with every include expanded.
    public override string ReadToEnd()
    {
        StringBuilder builder = new StringBuilder();
        AppendStream(sourceFile, builder);
        return builder.ToString();
    }

    /* Internal recursive method. */
    private void AppendStream(string SourceFile, StringBuilder Builder)
    {
        // Dispose each reader once its file has been consumed.
        using (var rdr = fileReader(SourceFile))
        {
            var line = rdr.ReadLine();
            while (line != null)
            {
                if (regex.IsMatch(line))
                {
                    // temp code to get the filename part (quoted form only), this could be handled generically...
                    var filename = line.Replace("#", "").Replace("include", "").Replace(";", "").Trim().TrimEnd('"').TrimStart('"');
                    AppendStream(filename, Builder);
                }
                else
                {
                    Builder.AppendLine(line);
                }
                line = rdr.ReadLine();
            }
        }
    }
}
```

Because that's a TextReader, we can pass it to Antlr. We create the reader like this:

```csharp
NestedSourceReader reader = new NestedSourceReader("test_3.nr", "\\#include\\s*(<([^\"<>|\\b]+)>|\"([^\"<>|\\b]+)\")", ReadFileCallback);
```

That regex is just a contrived one that recognizes C-style #include lines.

Well, it does work. Antlr sees a single source text that is basically the source file with every include expanded in place. It parses fine.

There are two issues here though. The first is line numbering: the line numbers seen, reported or recorded by Antlr in its tree or diagnostic messages reflect the location inside the fully expanded file, rather than the line number in the actual file itself. The second is that the regex test is weak, because we could be reading a line that's part of a comment, and only Antlr itself can distinguish that. Now if it was a comment we're likely fine, because even though the file is expanded into the source text, it will be commented out. But for strings that contain text matching an include directive, this will break...

Now this code could be tidied up a bit. For example, rather than extracting the include file's name in the NestedSourceReader, we could pass the entire line. I doubt that an equivalent implementation for other languages is a big challenge either, so Java etc. can readily do this.

Does anyone have any thoughts on this at all? My own problem is more or less solved, but the line numbering is a minor hassle.
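One way the line-numbering hassle might be handled, sketched under assumptions: while expanding, record for each line of the expanded text which file and line it came from, then translate any line number the parser reports back through that table. `LineMappedSource`, its members, the named group `file` in the include pattern and the in-memory demo files are all hypothetical names for illustration; nothing here is part of Antlr.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: expands includes and remembers where each output line came from.
public sealed class LineMappedSource
{
    private readonly List<(string File, int Line)> origins = new List<(string File, int Line)>();
    private readonly StringBuilder text = new StringBuilder();

    // The fully expanded source, suitable for handing to the lexer.
    public string Text => text.ToString();

    // Maps a 1-based line number in the expanded text back to (file, 1-based line).
    public (string File, int Line) Origin(int expandedLine) => origins[expandedLine - 1];

    public static LineMappedSource Expand(string file, Regex include, Func<string, TextReader> open)
    {
        var result = new LineMappedSource();
        result.Append(file, include, open);
        return result;
    }

    private void Append(string file, Regex include, Func<string, TextReader> open)
    {
        using (TextReader reader = open(file))
        {
            int lineNo = 0;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lineNo++;
                Match m = include.Match(line);
                if (m.Success)
                {
                    // Assumes the include pattern captures the file name in a group named "file".
                    Append(m.Groups["file"].Value, include, open);
                }
                else
                {
                    text.AppendLine(line);
                    origins.Add((file, lineNo));  // one entry per expanded line
                }
            }
        }
    }
}

public static class LineMapDemo
{
    public static void Main()
    {
        // In-memory stand-ins for real files (hypothetical names and contents).
        var files = new Dictionary<string, string>
        {
            ["main.src"] = "a\n#include \"inc.src\"\nc\n",
            ["inc.src"] = "b1\nb2\n",
        };
        var include = new Regex("^\\s*#include\\s+\"(?<file>[^\"]+)\"");
        var src = LineMappedSource.Expand("main.src", include, name => new StringReader(files[name]));

        var origin = src.Origin(3);  // expanded line 3 is "b2"
        Console.WriteLine($"{origin.File}:{origin.Line}");  // prints inc.src:2
    }
}
```

Include lines themselves produce no output line, so they never appear in the table. Using a capture group for the file name also replaces the chain of `Replace` calls, at the cost of requiring the caller's pattern to define that group.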
-
I've been looking at closed issues and fork activity related to supporting include files, that is, lexing/parsing code like C, C++, PL/I and so on, where the source contains metadata referring to additional source.
I found several mentions, including a valiant attempt by a forker that looked substantial but never became a pull request.
What is the status of this feature, and what are people's views on it? Is it considered important, or is it a throwback not valued in a modern grammar?