Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error handler trait #428

Closed
wants to merge 8 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 37 additions & 2 deletions lrlex/src/lib/ctbuilder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,8 @@ use regex::Regex;
use serde::Serialize;

use crate::{
DefaultLexerTypes, LRNonStreamingLexerDef, LexerDef, RegexOptions, DEFAULT_REGEX_OPTIONS,
DefaultLexerTypes, LRNonStreamingLexerDef, LexBuildError, LexerDef, RegexOptions,
DEFAULT_REGEX_OPTIONS,
};

const RUST_FILE_EXT: &str = "rs";
Expand Down Expand Up @@ -79,6 +80,26 @@ pub enum RustEdition {
Rust2021,
}

pub trait LexErrorHandler<LexerTypesT>
where
LexerTypesT: LexerTypes,
usize: num_traits::AsPrimitive<LexerTypesT::StorageT>,
{
/// Called with the lexers filename
fn lexer_path(&mut self, filename: &Path);
/// Called with the lexers source contents.
fn lexer_src(&mut self, src: &str);
Copy link
Collaborator Author

@ratmice ratmice Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing that just came to mind is we could consider adding here is fn generated_src(&mut self, src:&str)?

That might be useful for nimbleparse_lsp, so it could try to pass off generated src to rust-analyzer to typecheck it. (I'm not entirely sure how to do that at the moment though)

fn on_lex_build_error(&mut self, errors: Box<[LexBuildError]>);
ratmice marked this conversation as resolved.
Show resolved Hide resolved
fn missing_in_lexer(&mut self, missing: &HashSet<String>);
fn missing_in_parser(&mut self, missing: &HashSet<String>);
Copy link
Collaborator Author

@ratmice ratmice Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably worth thinking about whether we can do better than just HashSet<String> for missing_in_*.

I.e. for missing_in_lexer, we can perhaps give a PIdx to the parsers production referencing the token.
It would probably need to be given a YaccGrammar, and a YaccGrammarAST to go from there to spans.
I think this is okay to lrlex, because it depends on lrpar, and overall is where this check occurs currently.

Edit: Not sure the above is currently actionable with the current signature of set_rule_ids though.

And for missing_in_parser, we have LRNonStreamingLexerDef::get_rule_by_name, but we aren't passing in a LRNonStreamingLexerDef here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably worth thinking about whether we can do better than just HashSet for missing_in_*.

Yes I tend to agree.

Not sure the above is currently actionable with the current signature of set_rule_ids though.

I also think I agree with that. I don't immediately have a good idea as to whether we can come up with a better API for set_rule_ids or not, but I don't object to changing it, because it's not a function that we expect many people to use directly.

/// This function must return an `Err` variant if any of the following are true:
///
/// - missing_in_lexer is called.
/// - missing_in_parser is called.
/// - on_lex_build_error is called.
fn results(&self) -> Result<(), Box<dyn Error>>;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Result<(), Box<dyn Error>> return value isn't currently very useful,
It's only called in cases it must return an Err, and since the on_* is never called when it is allowed to return Ok
It never has anything to do further analysis on, so it can't currently introduce it's own errors above and beyond the current error checking. Perhaps we could consider adding a fn analyse(...), It could be a method provided by the trait with an empty implementation even, which trait implementations can override.

Finally a call to results() at the end even if nothing lrpar considers an error has occurred.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain that I understand what this function does and why it does something distinct in the API. Is it because the other methods are expected to do some sort of mutation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly, self is expected to basically collect the previous calls into some type T fully contained within self.
The trait exists to basically hide that T from lrpar.

In the ariadne trait implementation this is a vector of ariadne::Reports, in the simpler test impl, it is a String and the result() function calls into() on it. for Sarif i'm expecting it to be a type that will be serializable to that format.

It is IMO not the prettiest thing, but I was running short on nicer ideas.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought was that results might want to take self (rather than &self) to make it clear that it's the "last" thing. But I'm not quite sure that's possible with Rust's type system in this case.

But then I wondered: do we even need such a "final" function? Can't whoever implements this trait summarise things on their own, without this? In other words, if they want to track whether there are any warnings/errors they can have (say) a bool=false and set it to true at the start of each implementation of a method in this trait?

Copy link
Collaborator Author

@ratmice ratmice Dec 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the key to things like Safir output is that self is not consumed by this call.
If you look at the last line in this test, it printlns some stuff out of the ErrorHandler, after calls to build().

https://github.com/ratmice/error_callbacks_test/blob/implement_as_trait/build.rs#L287

We could definitely get away without result, but it means things like build.rs need to have some code which reaches into the type and prints warnings and errors, rather than only needing to do so in special cases like where you want to convert it to a specific format. That was my reasoning for adding results(), and having it take a reference.

Edit: I removed the following from this comment, (I'm not sure whether dry_run should return an error like build() or just collect stuff into an ErrorHandler 🤷
my hope was that we could introduce a call like dry_run, which mirrors build() function but doesn't perform the filesystem operations, from which can pick up the results.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing it is doing is delegating the responsibility of turning concrete e.g. YaccGrammarError types, into the abstract Box, the return type of build()

Ah, so I got the wrong end of the stick (as is not uncommon!). So is what you're proposing a sort-of ErrorMap trait in the sense of "we give you, the external implementer an error, you map (in the sense of iter().map(|| ...) it into whatever you want, then we bundle all of those mapped things up and return them"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is a good way of putting it. I lean towards that being the extent of what it should be doing since trying to do more seems just seems to make things complicated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, got it!

So what I'll do is explain what I thought you were proposing, then I'll go away and chew over which I think might be best!

What I thought was going to happen is that we provide this trait and as we find a warning/error we call the appropriate function. The user can then do various things: they can, for example, print things out, collate them to do something with later, or whatever. But as soon as we've called that function, our responsibility for that particular warning/error is over (i.e. we don't collate them).

So, for example, CTBuilder might choose to collate everything, and then print it out at when build is called. In contrast, an editor might try displaying popups immediately to a user while analysis (etc.) continues in a background thread.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, originally it was kind of trying to do both things, e.g. it has to sort of provide the map because build() has to return something, in theory it could just return a token indicating ErrorsOccurred, and beyond that expect the trait to do what it does. But also it has to dectect cargo and print things using a cargo in-band cargo:warning. So if we were to delegate all that to the trait, it is a bit of build script environment variable testing and special casing that each trait needs to implement when called from build().

The above is what probably kept me from further going down that path.

Copy link
Collaborator Author

@ratmice ratmice Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thing about e.g. pop-up alerts, I would note that this is still possible with the map centric approach, but it must act in batches, and require the outside to implement the pop-up behavior.

struct MyErrors {
   errors: Vec<YaccGrammarError>,
   warnings: Vec<YaccGrammarWarning>,
}

impl ErrorMapper for MyErrors {
     fn on_grammar_error(&mut self, error: YaccGrammarError) -> Box<dyn Error> {
         self.errors.push(error);
         // this perhaps would need to Box::new()?
         error.into() 
     }
}

then from build.rs:

let _ = ct.error_mapper(myerrors).build();
for e in my_errors {
    popup_alert(e);
}

Something like that anyways.

}

/// A `CTLexerBuilder` allows one to specify the criteria for building a statically generated
/// lexer.
pub struct CTLexerBuilder<'a, LexerTypesT: LexerTypes = DefaultLexerTypes<u32>>
Expand All @@ -97,6 +118,7 @@ where
allow_missing_terms_in_lexer: bool,
allow_missing_tokens_in_parser: bool,
regex_options: RegexOptions,
error_handler: Option<&'a mut dyn LexErrorHandler<LexerTypesT>>,
}

impl<'a> CTLexerBuilder<'a, DefaultLexerTypes<u32>> {
Expand Down Expand Up @@ -141,9 +163,18 @@ where
allow_missing_terms_in_lexer: false,
allow_missing_tokens_in_parser: true,
regex_options: DEFAULT_REGEX_OPTIONS,
error_handler: None,
}
}

pub fn error_handler(
mut self,
error_handler: &'a mut dyn LexErrorHandler<LexerTypesT>,
) -> Self {
self.error_handler = Some(error_handler);
self
}

/// An optional convenience function to make it easier to create an (lrlex) lexer and (lrpar)
/// parser in one shot. The closure passed to this function will be called during
/// [CTLexerBuilder::build]: it will be passed an lrpar `CTParserBuilder` instance upon which
Expand Down Expand Up @@ -327,14 +358,18 @@ where
}

let lex_src = read_to_string(lexerp)?;
let line_cache = NewlineCache::from_str(&lex_src).unwrap();
if let Some(error_handler) = self.error_handler.as_mut() {
error_handler.lexer_path(lexerp.as_path());
error_handler.lexer_src(lex_src.as_str());
}
let mut lexerdef: Box<dyn LexerDef<LexerTypesT>> = match self.lexerkind {
LexerKind::LRNonStreamingLexer => Box::new(
LRNonStreamingLexerDef::<LexerTypesT>::new_with_options(
&lex_src,
self.regex_options.clone(),
)
.map_err(|errs| {
let line_cache = NewlineCache::from_str(&lex_src).unwrap();
errs.iter()
.map(|e| {
if let Some((line, column)) = line_cache.byte_to_line_num_and_col_num(
Expand Down
4 changes: 3 additions & 1 deletion lrlex/src/lib/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,9 @@ mod lexer;
mod parser;

pub use crate::{
ctbuilder::{ct_token_map, CTLexer, CTLexerBuilder, LexerKind, RustEdition, Visibility},
ctbuilder::{
ct_token_map, CTLexer, CTLexerBuilder, LexErrorHandler, LexerKind, RustEdition, Visibility,
},
defaults::{DefaultLexeme, DefaultLexerTypes},
lexer::{
LRNonStreamingLexer, LRNonStreamingLexerDef, LexerDef, RegexOptions, Rule,
Expand Down