-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error handler trait #428
Error handler trait #428
Conversation
/// Called with the lexers filename | ||
fn lexer_path(&mut self, filename: &Path); | ||
/// Called with the lexers source contents. | ||
fn lexer_src(&mut self, src: &str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing that just came to mind is we could consider adding here is fn generated_src(&mut self, src:&str)
?
That might be useful for nimbleparse_lsp, so it could try to pass off generated src to rust-analyzer to typecheck it. (I'm not entirely sure how to do that at the moment though)
fn lexer_src(&mut self, src: &str); | ||
fn on_lex_build_error(&mut self, errors: Box<[LexBuildError]>); | ||
fn missing_in_lexer(&mut self, missing: &HashSet<String>); | ||
fn missing_in_parser(&mut self, missing: &HashSet<String>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably worth thinking about whether we can do better than just HashSet<String>
for missing_in_*
.
I.e. for missing_in_lexer
, we can perhaps give a PIdx
to the parsers production referencing the token.
It would probably need to be given a YaccGrammar
, and a YaccGrammarAST
to go from there to spans.
I think this is okay to lrlex
, because it depends on lrpar
, and overall is where this check occurs currently.
Edit: Not sure the above is currently actionable with the current signature of set_rule_ids
though.
And for missing_in_parser
, we have LRNonStreamingLexerDef::get_rule_by_name
, but we aren't passing in a LRNonStreamingLexerDef
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably worth thinking about whether we can do better than just HashSet for missing_in_*.
Yes I tend to agree.
Not sure the above is currently actionable with the current signature of set_rule_ids though.
I also think I agree with that. I don't immediately have a good idea as to whether we can come up with a better API for set_rule_ids
or not, but I don't object to changing it, because it's not a function that we expect many people to use directly.
/// - missing_in_lexer is called. | ||
/// - missing_in_parser is called. | ||
/// - on_lex_build_error is called. | ||
fn results(&self) -> Result<(), Box<dyn Error>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Result<(), Box<dyn Error>>
return value isn't currently very useful,
It's only called in cases it must return an Err
, and since the on_*
is never called when it is allowed to return Ok
It never has anything to do further analysis on, so it can't currently introduce it's own errors above and beyond the current error checking. Perhaps we could consider adding a fn analyse(...)
, It could be a method provided by the trait with an empty implementation even, which trait implementations can override.
Finally a call to results()
at the end even if nothing lrpar
considers an error has occurred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not certain that I understand what this function does and why it does something distinct in the API. Is it because the other methods are expected to do some sort of mutation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly, self
is expected to basically collect the previous calls into some type T
fully contained within self.
The trait exists to basically hide that T
from lrpar.
In the ariadne
trait implementation this is a vector of ariadne::Report
s, in the simpler test impl, it is a String
and the result()
function calls into()
on it. for Sarif i'm expecting it to be a type that will be serializable to that format.
It is IMO not the prettiest thing, but I was running short on nicer ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first thought was that results
might want to take self
(rather than &self
) to make it clear that it's the "last" thing. But I'm not quite sure that's possible with Rust's type system in this case.
But then I wondered: do we even need such a "final" function? Can't whoever implements this trait summarise things on their own, without this? In other words, if they want to track whether there are any warnings/errors they can have (say) a bool=false
and set it to true
at the start of each implementation of a method in this trait?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of the key to things like Safir
output is that self is not consumed by this call.
If you look at the last line in this test, it printlns some stuff out of the ErrorHandler
, after calls to build()
.
https://github.com/ratmice/error_callbacks_test/blob/implement_as_trait/build.rs#L287
We could definitely get away without result
, but it means things like build.rs
need to have some code which reaches into the type and prints warnings and errors, rather than only needing to do so in special cases like where you want to convert it to a specific format. That was my reasoning for adding results()
, and having it take a reference.
Edit: I removed the following from this comment, (I'm not sure whether dry_run
should return an error like build()
or just collect stuff into an ErrorHandler
🤷
my hope was that we could introduce a call like dry_run
, which mirrors build()
function but doesn't perform the filesystem operations, from which can pick up the results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main thing it is doing is delegating the responsibility of turning concrete e.g. YaccGrammarError types, into the abstract Box, the return type of build()
Ah, so I got the wrong end of the stick (as is not uncommon!). So is what you're proposing a sort-of ErrorMap
trait in the sense of "we give you, the external implementer an error, you map
(in the sense of iter().map(|| ...)
it into whatever you want, then we bundle all of those mapped things up and return them"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is a good way of putting it. I lean towards that being the extent of what it should be doing since trying to do more seems just seems to make things complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, got it!
So what I'll do is explain what I thought you were proposing, then I'll go away and chew over which I think might be best!
What I thought was going to happen is that we provide this trait and as we find a warning/error we call the appropriate function. The user can then do various things: they can, for example, print things out, collate them to do something with later, or whatever. But as soon as we've called that function, our responsibility for that particular warning/error is over (i.e. we don't collate them).
So, for example, CTBuilder
might choose to collate everything, and then print it out at when build
is called. In contrast, an editor might try displaying popups immediately to a user while analysis (etc.) continues in a background thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, originally it was kind of trying to do both things, e.g. it has to sort of provide the map
because build()
has to return something, in theory it could just return a token indicating ErrorsOccurred
, and beyond that expect the trait to do what it does. But also it has to dectect cargo
and print things using a cargo in-band cargo:warning. So if we were to delegate all that to the trait, it is a bit of build script environment variable testing and special casing that each trait needs to implement when called from build()
.
The above is what probably kept me from further going down that path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last thing about e.g. pop-up alerts, I would note that this is still possible with the map
centric approach, but it must act in batches, and require the outside to implement the pop-up behavior.
struct MyErrors {
errors: Vec<YaccGrammarError>,
warnings: Vec<YaccGrammarWarning>,
}
impl ErrorMapper for MyErrors {
fn on_grammar_error(&mut self, error: YaccGrammarError) -> Box<dyn Error> {
self.errors.push(error);
// this perhaps would need to Box::new()?
error.into()
}
}
then from build.rs:
let _ = ct.error_mapper(myerrors).build();
for e in my_errors {
popup_alert(e);
}
Something like that anyways.
{ | ||
fn grammar_src(&mut self, src: &str); | ||
fn grammar_path(&mut self, path: &Path); | ||
fn warnings_are_errors(&mut self, flag: bool); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing missing here is that warnings are no longer printed, but there is no mechanism to print them from the trait.
there is also the show_warnings
flag of CTParserBuilder
we need to add if traits should print warnings.
One thing to consider is that we could change the Ok(())
value of results()
to return formatted warnings,
CTParserBuilder
then just prints them out to stderr
, or perhaps some emit_warnings()
call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing missing here is that warnings are no longer printed, but there is no mechanism to print them from the trait.
The trait doesn't mind if you print or don't :) I have a couple of possible ideas here:
- The default implementation of the trait provided by the
CTBuilder
s is that they print warnings and/or panic pretty much as they do now. - We can provide an implementation of the trait which behaves as it does now but before it prints/panics also calls a user implementation of the trait.
Of these, 1) is the easiest to implement and to get right, so I'm inclined to start there, assuming we think it makes sense!
I am broadly in favour of this line of attack: thanks! This part of grmtools is currently undercooked, and I must admit that I had never really worked out in my mind what a better way of doing things might be. I think the Here are some half-formed general questions I have:
My immediate reaction is that I think I like the sound of: incremental; warnings are separate but can be part of one trait; and "analysis" doesn't need separating from errors/warnings. But my immediate reactions rarely stand up to long-term scrutiny, so don't read too much into that :) |
Being fully incremental might require some work, We'd need to make a trait for e.g. The tension being that cfgrammar wants to parse produce as many errors as it can, even after previous errors have occurred. |
Ah, sorry, I wasn't very clear. So we can be "incremental" (in the sense I intended, but didn't clarify, oops!) API without really being very incremental in the way we use the API (at least yet). But I've realised my question wasn't very well formed. The API you're proposing is inherently incremental (in the sense that it's not a return type for |
Sorry I haven't answered all of your questions, I want to think about them and don't feel I have clear answers without tinkering with things. |
The feeling I'm getting is that, Which would entail making it separate and having multiple analysis passes, presumably feeding into an In counter fashion I find it difficult to see how we can get the very detailed error messages from conflicts if it is going through a generic analysis interface. It seems like an analysis/error handler could be pretty difficult to get right with both ideas as abstract as they are. I did a little reading of the Sarif specification today, to get an idea of how it handles these kinds of things.
So essentially it is linking spans directly into format strings, and using markdown links to associate span information into the message. While that would allow us to encode the current forms of analysis we're doing with one interface between |
I kind of feel like it is fine to keep them in one trait, One thing to note with warnings is that the printing of them from within cargo is a bit strange. Because cargo generally wants to suppress stderr. So I think I'd like to try the simplest thing I can imagine for warnings, changing the
Edit: Or That way we can keep when/how to actually print them specific to Do you think that is generally worth trying? |
So I would like to focus for a minute on ways to integrate this in an 'imperfect' fashion without hamstringing ourselves to an imperfect trait.
I think a plan of this sort would allow us to migrate the code to the new trait piecemeal, without having to try getting it perfect on the first try, and getting some real experience using it internally within the codebase, before having to worry about stablizing it. Curious if you think this sounds like a viable plan? |
So I've been chewing for a bit, and my current thinking is that the "map" approach does, by its nature, lock in a fairly specific kind of behaviour. In many cases it's a very reasonable style of behaviour, but it seems to me that we have a more general kind of behaviour at our fingertips, and might as well see if we can work out exactly what that is. [Part of the reason I'm inclined towards the more general approach is that whatever we do now is an API break. Since grmtools current approach to reporting errors was always short-termist, that break is not just OK, but I've always been expecting it to come. That said, ideally, we'd like to do fewer breaks in the future to the error reporting API because we hope that we're able to soon iterate towards an approach we can more-or-less stick to for the long term.] Of course, maybe the "map" approach will end up being the best --- we can always come back to it if so! Bringing everything together, I think we can see a possible high-level design principle:
The obvious question then becomes: how does that play out in practise? I think we can see something else emerge from our discussions: there are various layers of errors / warnings / analysis. Fortunately (AFAICS at the moment, at least!) these can compose. So, for example we might have a couple of traits (as you can see, I don't know what good names are!): trait RTParserBuilderErrorWarningAnalysisTraitThing {
/// Report a syntax error in the input grammar.
fn syntax_error(...);
}
trait CTParserBuilderErrorWarningAnalysisTraitThing: RTParserBuilderErrorWarningAnalysisTraitThing {
/// The grammar is referencing tokens that aren't in the lexer.
fn missing_in_lexer(...);
} Both Because these things compose, struct ErrorWarningWrapper {
user_impl: Box<dyn RTParserBuilderErrorWarningAnalysisTraitThing>,
errors_seen: Vec<Box<dyn Error>>
}
impl RTParserBuilderErrorWarningAnalysisTraitThing for ErrorWarningWrapper {
fn syntax_error(...) {
self.errors_seen.push(...);
self.user_impl.syntax_error(...);
}
} and then Please feel free to tell me that I've lost the plot :) You've thought about this more than I have! |
That seems totally reasonable to want to explore other options, I would say the "map" approach is my best attempt so far with a focus on compatibility. That is it merely adds new optional behavior, and allows for some internal reorganization that didn't affect API but does allow us to remove some duplicated code across a few crates. I'm happy to look at implementations that focus on things like incremental reporting of errors that will require incompatible changes.
I think another term for this could be an error
I believe that is going to be all the issues we should expect to encounter (maybe an instance of each across the crates). |
That's a good point and something I might have underplayed. Though this PR does add new
"Streaming errors/warnings/etc" (is there a term that covers "errors/warnings/etc"? I can't think of one off-hand!) is certainly a better term than "incremental".
I think there's a good implicit question here which is: how far should we push streaming errors? We could say "we only really care about RTBuilder/CTBuilder", and as long as they present a nice API we don't mind too much if other parts of grmtools have a slightly cruder error/warnings API. This would probably be a good intermediate step, full stop, and perhaps might be the most pragmatic option anyway?
Right, so the streaming error API would probably need some sort of helper methods (or sometimes maybe parameters) that allow it to be told such information. I think some implementations will largely ignore this information, but some will want every last bit of it!
I think this an instance of the first problem in disguise? Maybe I'm not noticing an important detail though! Thanks for bearing with me on this. I know it's challenging when good work receives a barrage of questions! But your PRs really have made me think and I'm very grateful that you're willing to bear with me while we work out what to do next! |
It's definitely dependent on the first problem, I think the important question here is what do we prioritize between reducing error spam, and streaming errors upon discovery.
In this case the errors aren't created in a single pass e.g. given the following:
we There are a couple of ways we could solve it, depending upon which which decide has priority.
If it helps, I actually think we aren't too far off from e.g. a fully streaming errors code-wise as the crow flies, but to me the important question is whether it is reasonable from an API compatibility perspective, but I think it is worth making a good faith effort. |
Good point! I think I'm OK with shoving off to the implementer of the error trait (we could, if we wanted, even provide a de-duper composition trait, but that's a strictly optional extra). That would simplify what we do internally sufficiently to unblock this general idea I think?
Thank you -- and I agree! |
So, today I focused on whether or not it is possible to implement this in any roughly compatible way, and it seems like it is possible I changed one (rarely used method) from returning a I've now pushed this to https://github.com/ratmice/grmtools/tree/error_channels I think there are a few rough edges, revolving around mixing the channel approach with |
I think this PR might also be in a state to be closed? |
Here is the best attempt I think I can muster for implementing the patch in #426 as a trait instead of plain callback functions,
I figured it might be better to push it to a separate branch since @DavidHVernon has been doing some test on that, and this switches the API quite a bit.
This allows for the
ErrorHandler
to be owned by the outer thing, so values in it'sSelf
type can escape. So that is the benefit over the plain callbacks approach which can only return via theBox<dyn Error>
.I've pushed a branch of
error_callbacks_test
repo which implement this traitFirst a very quick/dirty implementation using
String
as a backing store.https://github.com/ratmice/error_callbacks_test/blob/implement_as_trait/build.rs#L273-L287
The second one is much nicer using the
ariadne
crate to produce nice messages.https://github.com/ratmice/error_callbacks_test/tree/ariadne_trait_impl
In the second (the work in progress ariadne errors) these errors look something like:
It is outside the repo because it's a failing build.rs, and ctfails tests can't quite handle it.
I was hoping that this would make calling it easier than having a bunch of individual functions for each callback.
It sort of succeeds from the
CT*Builder
perspective, but the lifetimes semantics require a few differences from the normal build.rs e.g. amove |...|
closure tolrpar_config
. This is whyLexErrorHandler
can be a&mut
, whileGrammarErrorHandler
has to be theRc<RefCell<_>>
.I believe that I am happy with this approach in general, I think it can both be used by tools like
nimbleparse
to cut down some code duplication internally (But I don't try to do that here), and useful for outside projects (Like custom error printing, and my future project of producing SARIF output). As a trait these things should be easier to put into a crates and used.With the holidays coming up, I'm not certain how quickly I will be able to respond to any requested changes.
If you don't happen to get to it before then either, I wish you happy holidays.