Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error handler trait #428

Closed
wants to merge 8 commits into from
Closed

Conversation

ratmice
Copy link
Collaborator

@ratmice ratmice commented Dec 15, 2023

Here is the best attempt I think I can muster for implementing the patch in #426 as a trait instead of plain callback functions,
I figured it might be better to push it to a separate branch since @DavidHVernon has been doing some test on that, and this switches the API quite a bit.

This allows for the ErrorHandler to be owned by the outer thing, so values in it's Self type can escape. So that is the benefit over the plain callbacks approach which can only return via the Box<dyn Error>.

I've pushed a branch of error_callbacks_test repo which implement this trait
First a very quick/dirty implementation using String as a backing store.
https://github.com/ratmice/error_callbacks_test/blob/implement_as_trait/build.rs#L273-L287
The second one is much nicer using the ariadne crate to produce nice messages.
https://github.com/ratmice/error_callbacks_test/tree/ariadne_trait_impl

In the second (the work in progress ariadne errors) these errors look something like:

     ╭─[src/erroneous.y:4:1]
     │
   1 │ %token a b
     │        ┬
     │        ╰── Shifted
     │
   4 │ B -> (): a | a a {};
     │ ┬        ┬
     │ ╰─────────── Reduced rule
     │          │
     │          ╰── Reduced production
  ───╯

It is outside the repo because it's a failing build.rs, and ctfails tests can't quite handle it.
I was hoping that this would make calling it easier than having a bunch of individual functions for each callback.
It sort of succeeds from the CT*Builder perspective, but the lifetimes semantics require a few differences from the normal build.rs e.g. a move |...| closure to lrpar_config. This is why LexErrorHandler can be a &mut, while GrammarErrorHandler has to be the Rc<RefCell<_>>.

I believe that I am happy with this approach in general, I think it can both be used by tools like nimbleparse to cut down some code duplication internally (But I don't try to do that here), and useful for outside projects (Like custom error printing, and my future project of producing SARIF output). As a trait these things should be easier to put into a crates and used.

With the holidays coming up, I'm not certain how quickly I will be able to respond to any requested changes.
If you don't happen to get to it before then either, I wish you happy holidays.

/// Called with the lexers filename
fn lexer_path(&mut self, filename: &Path);
/// Called with the lexers source contents.
fn lexer_src(&mut self, src: &str);
Copy link
Collaborator Author

@ratmice ratmice Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing that just came to mind is we could consider adding here is fn generated_src(&mut self, src:&str)?

That might be useful for nimbleparse_lsp, so it could try to pass off generated src to rust-analyzer to typecheck it. (I'm not entirely sure how to do that at the moment though)

lrlex/src/lib/ctbuilder.rs Outdated Show resolved Hide resolved
fn lexer_src(&mut self, src: &str);
fn on_lex_build_error(&mut self, errors: Box<[LexBuildError]>);
fn missing_in_lexer(&mut self, missing: &HashSet<String>);
fn missing_in_parser(&mut self, missing: &HashSet<String>);
Copy link
Collaborator Author

@ratmice ratmice Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably worth thinking about whether we can do better than just HashSet<String> for missing_in_*.

I.e. for missing_in_lexer, we can perhaps give a PIdx to the parsers production referencing the token.
It would probably need to be given a YaccGrammar, and a YaccGrammarAST to go from there to spans.
I think this is okay to lrlex, because it depends on lrpar, and overall is where this check occurs currently.

Edit: Not sure the above is currently actionable with the current signature of set_rule_ids though.

And for missing_in_parser, we have LRNonStreamingLexerDef::get_rule_by_name, but we aren't passing in a LRNonStreamingLexerDef here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably worth thinking about whether we can do better than just HashSet for missing_in_*.

Yes I tend to agree.

Not sure the above is currently actionable with the current signature of set_rule_ids though.

I also think I agree with that. I don't immediately have a good idea as to whether we can come up with a better API for set_rule_ids or not, but I don't object to changing it, because it's not a function that we expect many people to use directly.

/// - missing_in_lexer is called.
/// - missing_in_parser is called.
/// - on_lex_build_error is called.
fn results(&self) -> Result<(), Box<dyn Error>>;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Result<(), Box<dyn Error>> return value isn't currently very useful,
It's only called in cases it must return an Err, and since the on_* is never called when it is allowed to return Ok
It never has anything to do further analysis on, so it can't currently introduce it's own errors above and beyond the current error checking. Perhaps we could consider adding a fn analyse(...), It could be a method provided by the trait with an empty implementation even, which trait implementations can override.

Finally a call to results() at the end even if nothing lrpar considers an error has occurred.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain that I understand what this function does and why it does something distinct in the API. Is it because the other methods are expected to do some sort of mutation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly, self is expected to basically collect the previous calls into some type T fully contained within self.
The trait exists to basically hide that T from lrpar.

In the ariadne trait implementation this is a vector of ariadne::Reports, in the simpler test impl, it is a String and the result() function calls into() on it. for Sarif i'm expecting it to be a type that will be serializable to that format.

It is IMO not the prettiest thing, but I was running short on nicer ideas.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought was that results might want to take self (rather than &self) to make it clear that it's the "last" thing. But I'm not quite sure that's possible with Rust's type system in this case.

But then I wondered: do we even need such a "final" function? Can't whoever implements this trait summarise things on their own, without this? In other words, if they want to track whether there are any warnings/errors they can have (say) a bool=false and set it to true at the start of each implementation of a method in this trait?

Copy link
Collaborator Author

@ratmice ratmice Dec 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the key to things like Safir output is that self is not consumed by this call.
If you look at the last line in this test, it printlns some stuff out of the ErrorHandler, after calls to build().

https://github.com/ratmice/error_callbacks_test/blob/implement_as_trait/build.rs#L287

We could definitely get away without result, but it means things like build.rs need to have some code which reaches into the type and prints warnings and errors, rather than only needing to do so in special cases like where you want to convert it to a specific format. That was my reasoning for adding results(), and having it take a reference.

Edit: I removed the following from this comment, (I'm not sure whether dry_run should return an error like build() or just collect stuff into an ErrorHandler 🤷
my hope was that we could introduce a call like dry_run, which mirrors build() function but doesn't perform the filesystem operations, from which can pick up the results.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing it is doing is delegating the responsibility of turning concrete e.g. YaccGrammarError types, into the abstract Box, the return type of build()

Ah, so I got the wrong end of the stick (as is not uncommon!). So is what you're proposing a sort-of ErrorMap trait in the sense of "we give you, the external implementer an error, you map (in the sense of iter().map(|| ...) it into whatever you want, then we bundle all of those mapped things up and return them"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is a good way of putting it. I lean towards that being the extent of what it should be doing since trying to do more seems just seems to make things complicated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, got it!

So what I'll do is explain what I thought you were proposing, then I'll go away and chew over which I think might be best!

What I thought was going to happen is that we provide this trait and as we find a warning/error we call the appropriate function. The user can then do various things: they can, for example, print things out, collate them to do something with later, or whatever. But as soon as we've called that function, our responsibility for that particular warning/error is over (i.e. we don't collate them).

So, for example, CTBuilder might choose to collate everything, and then print it out at when build is called. In contrast, an editor might try displaying popups immediately to a user while analysis (etc.) continues in a background thread.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, originally it was kind of trying to do both things, e.g. it has to sort of provide the map because build() has to return something, in theory it could just return a token indicating ErrorsOccurred, and beyond that expect the trait to do what it does. But also it has to dectect cargo and print things using a cargo in-band cargo:warning. So if we were to delegate all that to the trait, it is a bit of build script environment variable testing and special casing that each trait needs to implement when called from build().

The above is what probably kept me from further going down that path.

Copy link
Collaborator Author

@ratmice ratmice Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thing about e.g. pop-up alerts, I would note that this is still possible with the map centric approach, but it must act in batches, and require the outside to implement the pop-up behavior.

struct MyErrors {
   errors: Vec<YaccGrammarError>,
   warnings: Vec<YaccGrammarWarning>,
}

impl ErrorMapper for MyErrors {
     fn on_grammar_error(&mut self, error: YaccGrammarError) -> Box<dyn Error> {
         self.errors.push(error);
         // this perhaps would need to Box::new()?
         error.into() 
     }
}

then from build.rs:

let _ = ct.error_mapper(myerrors).build();
for e in my_errors {
    popup_alert(e);
}

Something like that anyways.

{
fn grammar_src(&mut self, src: &str);
fn grammar_path(&mut self, path: &Path);
fn warnings_are_errors(&mut self, flag: bool);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing missing here is that warnings are no longer printed, but there is no mechanism to print them from the trait.
there is also the show_warnings flag of CTParserBuilder we need to add if traits should print warnings.

One thing to consider is that we could change the Ok(()) value of results() to return formatted warnings,
CTParserBuilder then just prints them out to stderr, or perhaps some emit_warnings() call?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing missing here is that warnings are no longer printed, but there is no mechanism to print them from the trait.

The trait doesn't mind if you print or don't :) I have a couple of possible ideas here:

  1. The default implementation of the trait provided by the CTBuilders is that they print warnings and/or panic pretty much as they do now.
  2. We can provide an implementation of the trait which behaves as it does now but before it prints/panics also calls a user implementation of the trait.

Of these, 1) is the easiest to implement and to get right, so I'm inclined to start there, assuming we think it makes sense!

@ltratt
Copy link
Member

ltratt commented Dec 16, 2023

I am broadly in favour of this line of attack: thanks! This part of grmtools is currently undercooked, and I must admit that I had never really worked out in my mind what a better way of doing things might be. I think the trait approach pioneered here is broadly the right one. Of course, there are many minor details about the API that we can think about, but at the very least this feels like we're making a substantial move in the right direction.

Here are some half-formed general questions I have:

  1. Is/should the error API intended to be incremental? i.e. it calls the functions as soon as problems are detected? Or are they bundled up at the end (and/or in well-defined stages, in the sense of rustc, where type checking happens first and only when fully complete is the borrow checker run)?
  2. Should warnings be differentiated from errors? If so, should they be part of one trait or ... ?
  3. Should "analysis" be a separate thing than errors/warnings?

My immediate reaction is that I think I like the sound of: incremental; warnings are separate but can be part of one trait; and "analysis" doesn't need separating from errors/warnings. But my immediate reactions rarely stand up to long-term scrutiny, so don't read too much into that :)

@ratmice
Copy link
Collaborator Author

ratmice commented Dec 16, 2023

Being fully incremental might require some work,

We'd need to make a trait for e.g. cfgrammar, to replace the vector here:
https://github.com/softdevteam/grmtools/blob/master/cfgrammar/src/lib/yacc/parser.rs#L306

The tension being that cfgrammar wants to parse produce as many errors as it can, even after previous errors have occurred.
So currently it stores those in a Vec and returns them all at once.

@ltratt
Copy link
Member

ltratt commented Dec 16, 2023

Being fully incremental might require some work,

Ah, sorry, I wasn't very clear. So we can be "incremental" (in the sense I intended, but didn't clarify, oops!) API without really being very incremental in the way we use the API (at least yet). But I've realised my question wasn't very well formed. The API you're proposing is inherently incremental (in the sense that it's not a return type for build or similar), which I think is good. So we can ignore that high-level question :)

@ratmice
Copy link
Collaborator Author

ratmice commented Dec 16, 2023

Sorry I haven't answered all of your questions, I want to think about them and don't feel I have clear answers without tinkering with things.

@ratmice
Copy link
Collaborator Author

ratmice commented Dec 16, 2023

  1. Should "analysis" be a separate thing than errors/warnings?

The feeling I'm getting is that, *ErrorHandler is in 1 to 1 correspondence to a RTParserBuilder or CTParserBuilder.
While analysis is in many to one, one of the things keeping missing_from_parser/lexer, and conflicts from being just forms of analysis is that 1 to 1 correspondence. So it might be good to try to replace those with analysis passes.

Which would entail making it separate and having multiple analysis passes, presumably feeding into an *ErrorHandler
But I'm still not really sure about the specific interfaces they might need to interact, like when an analysis pass succeeds with no errors, I assume you might want to record that?

In counter fashion I find it difficult to see how we can get the very detailed error messages from conflicts if it is going through a generic analysis interface. It seems like an analysis/error handler could be pretty difficult to get right with both ideas as abstract as they are.

I did a little reading of the Sarif specification today, to get an idea of how it handles these kinds of things.
I'm looking at the example on page 43 of https://docs.oasis-open.org/sarif/sarif/v2.1.0/sarif-v2.1.0.pdf
Which i'll snip some relevant bits out of:

{
   message": { "text": "Tainted data was used. The data came from [here](3)." }
   "relatedLocations": [{
         "id": 3,
         "physicalLocation": {
              "uri": "file:///C:/code/input.c",
              "region": {
                   "startLine": 25,
                   "startColumn": 19
              }
         }
    }]
}

So essentially it is linking spans directly into format strings, and using markdown links to associate span information into the message. While that would allow us to encode the current forms of analysis we're doing with one interface between GrammarErrorHandler, and all the analysis passes. It also seems like it would be pretty heavy-weight and entail a lot of dependencies or at least specialized string formatting code.

@ratmice
Copy link
Collaborator Author

ratmice commented Dec 16, 2023

  1. Should warnings be differentiated from errors? If so, should they be part of one trait or ... ?

I kind of feel like it is fine to keep them in one trait,

One thing to note with warnings is that the printing of them from within cargo is a bit strange.
https://github.com/softdevteam/grmtools/blob/master/lrpar/src/lib/ctbuilder.rs#L455C44-L460

Because cargo generally wants to suppress stderr. So CTParserBuilder uses an environment variable to tell if
we are running under cargo, and then print them specially, so they get shown regardless of panic.
This seems to me like logic we probably don't want everywhere.

I think I'd like to try the simplest thing I can imagine for warnings, changing the results() signature to something like:

/// Under the following conditions this function must return an `Err`
/// ....
/// Otherwise an `Ok(vec!["Warnings", ...])` 
fn results(&self) -> Result<Vec<String>, Box<dyn Error>>

Edit: Or Option<Vec<String>> even to avoid the allocation?

That way we can keep when/how to actually print them specific to CTParserBuilder and each specific tool,
GrammarErrorHandler then doesn't have to worry about gaining a show_warnings flag, printing or not is up to the tool in question.
I am already imagining though that between Cargo and Ariadne, we might get double "Warning:" text, because both are prefixing their output.

Do you think that is generally worth trying?

@ratmice
Copy link
Collaborator Author

ratmice commented Dec 17, 2023

So I would like to focus for a minute on ways to integrate this in an 'imperfect' fashion without hamstringing ourselves to an imperfect trait.

  • Include the trait as a Sealed trait trait
  • Implement 'DefaultGrammarErrorHandlerandDefaultLexerErrorHandler`s in lrpar and lrlex
  • Migrate tools like nimbleparse/lrlex/lrpar, CTParserBuilder, and RTParserBuilder to use those Default*Handlers internally.
  • Because the above cross crate boundries the trait methods must be callable by external crates so this would be a supertrait sealed trait in the sealed trait guide above. The methods would be documented unstable, and #[doc(hidden)]
  • We could even add something like #[cfg(feature="unstable_api")] pub struct UnstableAPI(); #[cfg(not(feature="unstable_api")] struct UnstableAPI(); argument to the methods, to indicate their instability in code. enabling or disabling the unstable_api feature, then turns these from supertrait sealed trait to method signature sealed trait making them uncallable without the unstable_api feature enabled. (Now that I think about feature additivity this may not actually work, it may see it as a multiply defined struct)
  • Once we feel like we have the trait method signatures in good working order and are happy with everything, we could unseal it and either add or unhide the error_handler in the Builders

I think a plan of this sort would allow us to migrate the code to the new trait piecemeal, without having to try getting it perfect on the first try, and getting some real experience using it internally within the codebase, before having to worry about stablizing it.

Curious if you think this sounds like a viable plan?

@ratmice ratmice mentioned this pull request Dec 18, 2023
@ltratt
Copy link
Member

ltratt commented Dec 19, 2023

So I've been chewing for a bit, and my current thinking is that the "map" approach does, by its nature, lock in a fairly specific kind of behaviour. In many cases it's a very reasonable style of behaviour, but it seems to me that we have a more general kind of behaviour at our fingertips, and might as well see if we can work out exactly what that is. [Part of the reason I'm inclined towards the more general approach is that whatever we do now is an API break. Since grmtools current approach to reporting errors was always short-termist, that break is not just OK, but I've always been expecting it to come. That said, ideally, we'd like to do fewer breaks in the future to the error reporting API because we hope that we're able to soon iterate towards an approach we can more-or-less stick to for the long term.] Of course, maybe the "map" approach will end up being the best --- we can always come back to it if so!

Bringing everything together, I think we can see a possible high-level design principle:

  • We want to report errors / warnings / analysis stuff to the user as we discover it. [What I previously called "incremental" though that word might be a bit loaded!]

The obvious question then becomes: how does that play out in practise? I think we can see something else emerge from our discussions: there are various layers of errors / warnings / analysis. Fortunately (AFAICS at the moment, at least!) these can compose. So, for example we might have a couple of traits (as you can see, I don't know what good names are!):

trait RTParserBuilderErrorWarningAnalysisTraitThing {
  /// Report a syntax error in the input grammar.
  fn syntax_error(...);
}

trait CTParserBuilderErrorWarningAnalysisTraitThing: RTParserBuilderErrorWarningAnalysisTraitThing {
  /// The grammar is referencing tokens that aren't in the lexer.
  fn missing_in_lexer(...);
}

Both RTParserBuilder and CTParserBuilder can (if we want!) provide default implementations.

Because these things compose, CTParserBuilder can, internally, provide its own implementation that wraps the user's version, so build can still return Err if there are errors. So CTParserBuilder might do something like:

struct ErrorWarningWrapper {
  user_impl: Box<dyn RTParserBuilderErrorWarningAnalysisTraitThing>,
  errors_seen: Vec<Box<dyn Error>>
}

impl RTParserBuilderErrorWarningAnalysisTraitThing for ErrorWarningWrapper {
  fn syntax_error(...) {
    self.errors_seen.push(...);
    self.user_impl.syntax_error(...);
  }
}

and then CTParserBuilder can check error_wrapper.errors_seen in build and report them back to the user (or just return Err("Some errors were seen") or whatever it chooses). In other words, the "standard" case would feel pretty much just like using grmtools today --- but you also have the ability to do fine-grained things with errors / warnings if you want (whether that's printing them to stderr, or popups, or feeding them to a later analysis or whatever).

Please feel free to tell me that I've lost the plot :) You've thought about this more than I have!

@ratmice
Copy link
Collaborator Author

ratmice commented Dec 21, 2023

my current thinking is that the "map" approach does, by its nature, lock in a fairly specific kind of behaviour. In many cases it's a very reasonable style of behaviour, but it seems to me that we have a more general kind of behaviour at our fingertips, and might as well see if we can work out exactly what that is.

That seems totally reasonable to want to explore other options, I would say the "map" approach is my best attempt so far with a focus on compatibility. That is it merely adds new optional behavior, and allows for some internal reorganization that didn't affect API but does allow us to remove some duplicated code across a few crates. I'm happy to look at implementations that focus on things like incremental reporting of errors that will require incompatible changes.

Part of the reason I'm inclined towards the more general approach is that whatever we do now is an API break. Since grmtools current approach to reporting errors was always short-termist, that break is not just OK, but I've always been expecting it to come. That said, ideally, we'd like to do fewer breaks in the future to the error reporting API because we hope that we're able to soon iterate towards an approach we can more-or-less stick to for the long term.

Bringing everything together, I think we can see a possible high-level design principle:

  • We want to report errors / warnings / analysis stuff to the user as we discover it. [What I previously called "incremental" though that word might be a bit loaded!]

I think another term for this could be an error stream or channel perhaps.
I'm going to enumerate what I consider to be the main areas of difficulty implementing these (in no particular order)

  1. RTParserBuilder/CTParserBuilder currently receive buffered errors in batches and the user e.g. as a Vec<*Error>.
    So I think the streaming must go at a lower level, e.g. One of the main sources of errors is currently ending up here: https://docs.rs/cfgrammar/latest/src/cfgrammar/yacc/ast.rs.html#18 Such that we could consider that vec becoming the write end of a channel.
  2. At the origin where most errors occur I don't believe we have anything such as the yacc filename.y, so we need a way to tack this on in the builders, or otherwise pass it (or even just any form FileId ).
  3. add_duplicate_occurrence is in tension with streaming, because it is collating duplicate rules/names etc which require caching them and reporting them at the end to squash all duplicates to a single error.
    https://github.com/softdevteam/grmtools/blob/0f30331f14bbb0d353d533f16e8b2bebeb485a95/cfgrammar/src/lib/yacc/parser.rs#L272-L291C2

I believe that is going to be all the issues we should expect to encounter (maybe an instance of each across the crates).
It is definitely possible I've overlooked something though. One and two definitely have the most user visible impact on API (I think in this case ASTWithValidityInfo::new wants a channel and a some File Id or filename).
I'm guessing we'd want to remove 3 or move it out so it is something the read end of an error channel would do compacting duplicate errors?

@ltratt
Copy link
Member

ltratt commented Dec 21, 2023

That is it merely adds new optional behavior, and allows for some internal reorganization that didn't affect API but does allow us to remove some duplicated code across a few crates.

That's a good point and something I might have underplayed. Though this PR does add new pub traits so I think it does also extend the current user-visible API? That said, one thing that might be worth considering is using this PR (or something like it) as a stepping-stone for a minor release while we chug away on a bigger longer-term change for the next major release. I'm open to your thoughts on that!

I think another term for this could be an error stream or channel perhaps.

"Streaming errors/warnings/etc" (is there a term that covers "errors/warnings/etc"? I can't think of one off-hand!) is certainly a better term than "incremental".

RTParserBuilder/CTParserBuilder currently receive buffered errors in batches and the user e.g. as a Vec<*Error>.
So I think the streaming must go at a lower level, e.g. One of the main sources of errors is currently ending up here: https://docs.rs/cfgrammar/latest/src/cfgrammar/yacc/ast.rs.html#18 Such that we could consider that vec becoming the write end of a channel.

I think there's a good implicit question here which is: how far should we push streaming errors? We could say "we only really care about RTBuilder/CTBuilder", and as long as they present a nice API we don't mind too much if other parts of grmtools have a slightly cruder error/warnings API. This would probably be a good intermediate step, full stop, and perhaps might be the most pragmatic option anyway?

At the origin where most errors occur I don't believe we have anything such as the yacc filename.y, so we need a way to tack this on in the builders, or otherwise pass it (or even just any form FileId ).

Right, so the streaming error API would probably need some sort of helper methods (or sometimes maybe parameters) that allow it to be told such information. I think some implementations will largely ignore this information, but some will want every last bit of it!

add_duplicate_occurrence is in tension with streaming, because it is collating duplicate rules/names etc which require caching them and reporting them at the end to squash all duplicates to a single error.
https://github.com/softdevteam/grmtools/blob/0f30331f14bbb0d353d533f16e8b2bebeb485a95/cfgrammar/src/lib/yacc/parser.rs#L272-L291C2

I think this an instance of the first problem in disguise? Maybe I'm not noticing an important detail though!

Thanks for bearing with me on this. I know it's challenging when good work receives a barrage of questions! But your PRs really have made me think and I'm very grateful that you're willing to bear with me while we work out what to do next!

@ratmice
Copy link
Collaborator Author

ratmice commented Dec 21, 2023

add_duplicate_occurrence is in tension with streaming, because it is collating duplicate rules/names etc which require caching them and reporting them at the end to squash all duplicates to a single error.
https://github.com/softdevteam/grmtools/blob/0f30331f14bbb0d353d533f16e8b2bebeb485a95/cfgrammar/src/lib/yacc/parser.rs#L272-L291C2

I think this an instance of the first problem in disguise? Maybe I'm not noticing an important detail though!

It's definitely dependent on the first problem, I think the important question here is what do we prioritize between reducing error spam, and streaming errors upon discovery.

  • It is the only place we do lookups in the vector of errors.
  • We currently prioritize an incompatible property, reducing error spam by combining repeated errors and emitting them once with multiple locations.

In this case the errors aren't created in a single pass e.g. given the following:

A: 'a';
A: 'a2';
B: 'b';
A: 'a3';

we errs.push(duplicate A), parse B successfully, then search through errs and add the span for the a3 rule to the error previously pushed, so it isn't like we do a single pass here over all the duplicate names and emit an error.

There are a couple of ways we could solve it, depending upon which which decide has priority.

Thanks for bearing with me on this. I know it's challenging when good work receives a barrage of questions! But your PRs really have made me think and I'm very grateful that you're willing to bear with me while we work out what to do next!

If it helps, I actually think we aren't too far off from e.g. a fully streaming errors code-wise as the crow flies, but to me the important question is whether it is reasonable from an API compatibility perspective, but I think it is worth making a good faith effort.

@ltratt
Copy link
Member

ltratt commented Dec 21, 2023

It is the only place we do lookups in the vector of errors.

We currently prioritize an incompatible property, reducing error spam by combining repeated errors and emitting them once with multiple locations.

Good point! I think I'm OK with shoving off to the implementer of the error trait (we could, if we wanted, even provide a de-duper composition trait, but that's a strictly optional extra). That would simplify what we do internally sufficiently to unblock this general idea I think?

If it helps, I actually think we aren't too far off from e.g. a fully streaming errors code-wise as the crow flies, but to me the important question is whether it is reasonable from an API compatibility perspective, but I think it is worth making a good faith effort.

Thank you -- and I agree!

@ratmice
Copy link
Collaborator Author

ratmice commented Dec 23, 2023

So, today I focused on whether or not it is possible to implement this in any roughly compatible way, and it seems like it is possible I changed one (rarely used method) from returning a slice to a Vec) . The basic idea is to mirror the current ownership structure. ASTWithValidationInfo, goes from owning a Vec<YaccGrammarError> to owning a (error_sender, Option<error_receiver>) pair. We can then reimplement the functions expecting to return a Vec<YaccGrammarError> result in terms of an interator over the error_receiver endpoint. Alternately there is a new method which returns the error_receiver (basically) Option::take, in which case they can receive errors as they are produced.

I've now pushed this to https://github.com/ratmice/grmtools/tree/error_channels
It still doesn't pass the testsuite (it now appears only tests for duplicate errors are left to fix). Because it would either allow us to go through a deprecation cycle, or even just keep the buffered error vecs, around for cases where we want the simpler API?

I think there are a few rough edges, revolving around mixing the channel approach with Vec result approach, e.g. when the user has taken the error_receiver, then subsequently asks for you to convert it into a Vec<YaccGrammarError>, I don't believe we currently have the right return values to indicate this nicely. Anyway, curious about your thoughts on the matter.

@ratmice ratmice mentioned this pull request Dec 23, 2023
@ltratt
Copy link
Member

ltratt commented Dec 28, 2023

I think this PR might also be in a state to be closed?

@ratmice ratmice closed this Dec 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants