feat(air-parser): canon stream syntax #618

Merged: 2 commits merged on Jul 25, 2023

62 changes: 52 additions & 10 deletions crates/air-lib/air-parser/src/parser/lexer/call_variable_parser.rs
@@ -30,11 +30,12 @@ pub(super) fn try_parse_call_variable(
CallVariableParser::try_parse(string_to_parse, start_pos)
}

#[derive(Debug)]
#[derive(Debug, Clone, Copy)]
enum MetTag {
None,
Stream,
StreamMap,
Canon,
CanonStream,
}

@@ -175,7 +176,10 @@ impl<'input> CallVariableParser<'input> {
}

fn try_parse_as_variable(&mut self) -> LexerResult<()> {
if self.try_parse_as_stream_start()? || self.try_parse_as_json_path_start()? {
if self.try_parse_as_canon()?
|| self.try_parse_as_stream()?
|| self.try_parse_as_json_path_start()?
{
return Ok(());
} else if self.is_json_path_started() {
self.try_parse_as_json_path()?;
@@ -186,15 +190,31 @@ impl<'input> CallVariableParser<'input> {
Ok(())
}

fn try_parse_as_stream_start(&mut self) -> LexerResult<bool> {
let stream_tag = MetTag::from_tag(self.current_char());
if self.current_offset() == 0 && stream_tag.is_tag() {
fn try_parse_as_stream(&mut self) -> LexerResult<bool> {
let tag = MetTag::from_tag(self.current_char());
if self.current_offset() == 0 && tag.is_tag() {
if self.string_to_parse.len() == 1 {
let error_pos = self.pos_in_string_to_parse();
return Err(LexerError::empty_stream_name(error_pos..error_pos));
return Err(LexerError::empty_tagged_name(error_pos..error_pos));
}

self.state.met_tag = stream_tag;
self.state.met_tag = tag;
return Ok(true);
}

Ok(false)
}

fn try_parse_as_canon(&mut self) -> LexerResult<bool> {
let tag = self.state.met_tag.deduce_tag(self.current_char());

if self.current_offset() == 1 && tag.is_canon_stream() {
if self.string_to_parse.len() == 2 && tag.is_tag() {
let error_pos = self.pos_in_string_to_parse();
return Err(LexerError::empty_canon_name(error_pos..error_pos));
Comment on lines +211 to +214

Member:
I think this could be expressed in a slightly different way. Please don't merge the PR yet; I'll take a look later.

Contributor Author:
I am going to rename try_parse_as_tagged_token to try_parse_as_stream, as discussed.

}

self.state.met_tag = tag;
return Ok(true);
}

@@ -238,6 +258,9 @@ impl<'input> CallVariableParser<'input> {
return Err(LexerError::leading_dot(
self.start_pos..self.pos_in_string_to_parse(),
));
} else if self.state.met_tag.is_tag() && self.current_offset() <= 2 {
let prev_pos = self.pos_in_string_to_parse() - 1;
return Err(LexerError::empty_canon_name(prev_pos..prev_pos));
}
self.state.first_dot_met_pos = Some(self.current_offset());
return Ok(true);
@@ -288,7 +311,7 @@ impl<'input> CallVariableParser<'input> {
name,
position: self.start_pos,
},
MetTag::CanonStream => Token::CanonStream {
MetTag::CanonStream | MetTag::Canon => Token::CanonStream {
name,
position: self.start_pos,
},
@@ -311,7 +334,7 @@ impl<'input> CallVariableParser<'input> {
lambda,
position: self.start_pos,
},
MetTag::CanonStream => Token::CanonStreamWithLambda {
MetTag::CanonStream | MetTag::Canon => Token::CanonStreamWithLambda {
name,
lambda,
position: self.start_pos,
@@ -383,16 +406,35 @@ impl<'input> CallVariableParser<'input> {
}
}

/// There are two kinds of tags ATM, namely tag and canon tag.
/// Tag defines the first level and comes first in a variable name, e.g. $stream.
/// Canon tag is the only tag that ATM defines the second level.
/// Canon tag comes second in a variable name, e.g. #$canon_stream.
impl MetTag {
fn from_tag(tag: char) -> Self {
match tag {
'$' => Self::Stream,
'#' => Self::CanonStream,
'#' => Self::Canon,
'%' => Self::StreamMap,
_ => Self::None,
}
}

fn deduce_tag(&self, tag: char) -> Self {
match tag {
'$' if self.is_canon() => Self::CanonStream,
_ => self.to_owned(),
}
}

fn is_canon(&self) -> bool {
matches!(self, Self::Canon)
Contributor:
If you derive PartialEq on MetTag, you can simply write *self == Self::Canon.

It's worth deriving Eq too, as it is OK for this type.

Contributor Author:
What is the benefit compared with the current code?

Contributor:
Simple equality seems more straightforward.

Contributor Author:
I tend to stick with what is used now and see no real benefit in changing this to equality, because it is internally the same.

Contributor (@monoid, Jul 19, 2023):
Code style and idioms are not about internal changes at all.

Contributor Author:
IMHO it is unwise to use Eq here, given that is_tag() uses !matches!, and I am not changing is_tag because that is outside the scope of this patch. From my point of view, you are suggesting breaking the style.

Contributor:
Using a PR for small style/idiomatic changes is OK; otherwise they will never be changed.

matches! should be used for types that cannot be compared for some reason ("I haven't defined PartialEq just because" is not a reason). This enum is essentially a simple C-style enum.
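
The two styles debated in this thread are interchangeable for a fieldless enum. A minimal standalone sketch, using a hypothetical copy of the enum (not the crate's code) and hypothetical method names `is_canon_matches`, `is_canon_eq`, `is_tag_matches` and `is_tag_eq`, shows that once `PartialEq` and `Eq` are derived, the `matches!` form and the `==`/`!=` form agree on every variant:

```rust
// Hypothetical stand-in for the lexer's tag enum; the variant set mirrors the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetTag {
    None,
    Stream,
    StreamMap,
    Canon,
    CanonStream,
}

impl MetTag {
    // Style used in the PR: pattern matching, which needs no PartialEq.
    fn is_canon_matches(&self) -> bool {
        matches!(self, Self::Canon)
    }

    // Style suggested in review: direct comparison, enabled by derive(PartialEq).
    fn is_canon_eq(&self) -> bool {
        *self == Self::Canon
    }

    // Same contrast for the existing is_tag() predicate.
    fn is_tag_matches(&self) -> bool {
        !matches!(self, Self::None)
    }

    fn is_tag_eq(&self) -> bool {
        *self != Self::None
    }
}

fn main() {
    let all = [
        MetTag::None,
        MetTag::Stream,
        MetTag::StreamMap,
        MetTag::Canon,
        MetTag::CanonStream,
    ];
    for tag in all {
        // Both styles give the same answer for every variant.
        assert_eq!(tag.is_canon_matches(), tag.is_canon_eq());
        assert_eq!(tag.is_tag_matches(), tag.is_tag_eq());
        println!("{tag:?}: canon={} tag={}", tag.is_canon_eq(), tag.is_tag_eq());
    }
}
```

The practical difference is only in what each form requires: `==` needs a `PartialEq` impl, while `matches!` works on any type and also accepts richer patterns (bindings, guards, `|` alternatives).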

}

fn is_canon_stream(&self) -> bool {
matches!(self, Self::CanonStream)
}

fn is_tag(&self) -> bool {
!matches!(self, Self::None)
}
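
The doc comment and `deduce_tag` above describe a two-level tag scheme. The first character selects `$` (stream), `%` (stream map) or `#` (canon), and a `$` directly after `#` upgrades the canon tag to a canon stream. A minimal standalone sketch of that classification, assuming a hypothetical `classify` helper that only inspects the first two characters (the lexer itself does this incrementally in `try_parse_as_stream` and `try_parse_as_canon`), and a simplified copy of `MetTag`:

```rust
// Hypothetical, simplified copy of the lexer's `MetTag`; not the crate's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetTag {
    None,
    Stream,
    StreamMap,
    Canon,
    CanonStream,
}

impl MetTag {
    // First level: the character at offset 0 selects the tag.
    fn from_tag(tag: char) -> Self {
        match tag {
            '$' => Self::Stream,
            '#' => Self::Canon,
            '%' => Self::StreamMap,
            _ => Self::None,
        }
    }

    // Second level: only a `$` following `#` turns `Canon` into `CanonStream`.
    fn deduce_tag(self, tag: char) -> Self {
        match tag {
            '$' if self.is_canon() => Self::CanonStream,
            _ => self,
        }
    }

    fn is_canon(self) -> bool {
        matches!(self, Self::Canon)
    }
}

// Hypothetical helper: classify a name by looking at its first two characters only.
fn classify(name: &str) -> MetTag {
    let mut chars = name.chars();
    let first = match chars.next() {
        Some(c) => MetTag::from_tag(c),
        None => return MetTag::None,
    };
    match chars.next() {
        Some(c) => first.deduce_tag(c),
        None => first,
    }
}

fn main() {
    for name in ["$stream", "%map", "#canon", "#$canon", "$#odd", "plain"] {
        println!("{name:>8} -> {:?}", classify(name));
    }
}
```

In the diff, both `MetTag::Canon` and `MetTag::CanonStream` map to `Token::CanonStream` (and to `Token::CanonStreamWithLambda`), so `#name` and `#$name` lex to the same token kind.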
18 changes: 13 additions & 5 deletions crates/air-lib/air-parser/src/parser/lexer/errors.rs
@@ -34,8 +34,11 @@ pub enum LexerError {
#[error("only alphanumeric, '_', and '-' characters are allowed in this position")]
IsNotAlphanumeric(Span),

#[error("a stream name should be non empty")]
EmptyStreamName(Span),
#[error("a tagged name should be non empty")]
EmptyTaggedName(Span),

#[error("a canon name should be non empty")]
EmptyCanonName(Span),

#[error("this variable or constant shouldn't have empty name")]
EmptyVariableOrConst(Span),
@@ -75,7 +78,8 @@ impl LexerError {
Self::UnclosedQuote(span) => span,
Self::EmptyString(span) => span,
Self::IsNotAlphanumeric(span) => span,
Self::EmptyStreamName(span) => span,
Self::EmptyTaggedName(span) => span,
Self::EmptyCanonName(span) => span,
Self::EmptyVariableOrConst(span) => span,
Self::InvalidLambda(span) => span,
Self::UnallowedCharInNumber(span) => span,
@@ -102,8 +106,12 @@ impl LexerError {
Self::IsNotAlphanumeric(range.into())
}

pub fn empty_stream_name(range: Range<AirPos>) -> Self {
Self::EmptyStreamName(range.into())
pub fn empty_tagged_name(range: Range<AirPos>) -> Self {
Self::EmptyTaggedName(range.into())
}

pub fn empty_canon_name(range: Range<AirPos>) -> Self {
Self::EmptyCanonName(range.into())
}

pub fn empty_variable_or_const(range: Range<AirPos>) -> Self {
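
The errors.rs changes rename `EmptyStreamName` to `EmptyTaggedName`, add `EmptyCanonName`, and extend the span accessor and constructors to match. A minimal standalone sketch of the same shape using only the standard library: the `Display` impl is a hand-written equivalent of the `#[error("...")]` messages in the diff, `Span` is simplified to `Range<usize>` instead of the crate's `AirPos`-based span, and the accessor name `span` is an assumption (the diff shows its match arms but not its name):

```rust
use std::fmt;
use std::ops::Range;

// Hypothetical simplification: a span is a plain byte range here.
type Span = Range<usize>;

#[derive(Debug, Clone, PartialEq, Eq)]
enum LexerError {
    EmptyTaggedName(Span),
    EmptyCanonName(Span),
}

impl LexerError {
    fn empty_tagged_name(range: Span) -> Self {
        Self::EmptyTaggedName(range)
    }

    fn empty_canon_name(range: Span) -> Self {
        Self::EmptyCanonName(range)
    }

    // Accessor matching every variant, like the arms shown in the diff.
    fn span(&self) -> &Span {
        match self {
            Self::EmptyTaggedName(span) => span,
            Self::EmptyCanonName(span) => span,
        }
    }
}

// Hand-written equivalent of the `#[error("...")]` attributes.
impl fmt::Display for LexerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::EmptyTaggedName(_) => write!(f, "a tagged name should be non empty"),
            Self::EmptyCanonName(_) => write!(f, "a canon name should be non empty"),
        }
    }
}

impl std::error::Error for LexerError {}

fn main() {
    let errors = [
        LexerError::empty_tagged_name(0..0),
        LexerError::empty_canon_name(1..1),
    ];
    for err in &errors {
        println!("{err} at {:?}", err.span());
    }
}
```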
73 changes: 51 additions & 22 deletions crates/air-lib/air-parser/src/parser/lexer/tests.rs
@@ -215,37 +215,66 @@ fn stream_map() {

#[test]
fn canon_stream() {
const CANON_STREAM: &str = "#stream____asdasd";
for canon_stream_name in vec!["#stream____asdasd", "#$stream____asdasd"] {
lexer_test(
canon_stream_name,
Single(Ok((
0.into(),
Token::CanonStream {
name: canon_stream_name,
position: 0.into(),
},
canon_stream_name.len().into(),
))),
);
}

let cannon_stream_name = "#s$stream____asdasd";
lexer_test(
CANON_STREAM,
Single(Ok((
0.into(),
Token::CanonStream {
name: CANON_STREAM,
position: 0.into(),
},
CANON_STREAM.len().into(),
))),
cannon_stream_name,
Single(Err(LexerError::is_not_alphanumeric(2.into()..2.into()))),
);

let cannon_stream_name = "#";
lexer_test(
cannon_stream_name,
Single(Err(LexerError::empty_tagged_name(0.into()..0.into()))),
);
}

#[test]
fn canon_stream_with_functor() {
let canon_stream_name = "#canon_stream";
let canon_stream_with_functor: String = format!("{canon_stream_name}.length");
for canon_stream_name in vec!["#canon_stream", "#$canon_stream"] {
let canon_stream_with_functor: String = format!("{canon_stream_name}.length");

lexer_test(
&canon_stream_with_functor,
Single(Ok((
0.into(),
Token::CanonStreamWithLambda {
name: canon_stream_name,
lambda: LambdaAST::Functor(Functor::Length),
position: 0.into(),
},
canon_stream_with_functor.len().into(),
))),
);
}

let cannon_stream_name = "#s$stream____asdasd.length";
lexer_test(
&canon_stream_with_functor,
Single(Ok((
0.into(),
Token::CanonStreamWithLambda {
name: canon_stream_name,
lambda: LambdaAST::Functor(Functor::Length),
position: 0.into(),
},
canon_stream_with_functor.len().into(),
))),
cannon_stream_name,
Single(Err(LexerError::is_not_alphanumeric(2.into()..2.into()))),
);
let cannon_stream_name = "#.length";
lexer_test(
cannon_stream_name,
Single(Err(LexerError::empty_canon_name(0.into()..0.into()))),
);
let cannon_stream_name = "#$.length";
lexer_test(
cannon_stream_name,
Single(Err(LexerError::empty_canon_name(1.into()..1.into()))),
);
}
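
The expectations in these tests can be read as a small table of inputs and error positions. A minimal standalone sketch that reproduces those expected outcomes, assuming ASCII input and byte offsets; `check_tagged_name`, `tag_prefix_len` and `NameError` are hypothetical names, and this restates what the tests expect rather than the lexer's actual algorithm, which works character by character inside `CallVariableParser`:

```rust
// Hypothetical error kinds mirroring the lexer's variants; positions are byte offsets.
#[derive(Debug, PartialEq, Eq)]
enum NameError {
    EmptyTaggedName(usize),
    EmptyCanonName(usize),
    IsNotAlphanumeric(usize),
}

// Length of the tag prefix: 2 for `#$`, 1 for `#`, `$` or `%`, otherwise 0.
fn tag_prefix_len(s: &str) -> usize {
    let bytes = s.as_bytes();
    match bytes.first().copied() {
        Some(b'#') if bytes.get(1).copied() == Some(b'$') => 2,
        Some(b'#' | b'$' | b'%') => 1,
        _ => 0,
    }
}

// Checks the name part (before an optional `.lambda`) of a possibly tagged variable.
fn check_tagged_name(s: &str) -> Result<&str, NameError> {
    let prefix = tag_prefix_len(s);
    if prefix == 1 && s.len() == 1 {
        // A lone tag such as "#" or "$".
        return Err(NameError::EmptyTaggedName(0));
    }
    // Everything up to the first '.' is the name; the rest would be a lambda.
    let name = s[prefix..].split('.').next().unwrap_or("");
    if prefix > 0 && name.is_empty() {
        // "#$", "#.length", "#$.length": report at the last tag character.
        return Err(NameError::EmptyCanonName(prefix - 1));
    }
    for (i, c) in name.char_indices() {
        if !(c.is_ascii_alphanumeric() || c == '_' || c == '-') {
            return Err(NameError::IsNotAlphanumeric(prefix + i));
        }
    }
    Ok(name)
}

fn main() {
    let cases = [
        "#stream____asdasd",
        "#$stream____asdasd",
        "#s$stream____asdasd",
        "#",
        "#.length",
        "#$.length",
    ];
    for case in cases {
        println!("{case:?} -> {:?}", check_tagged_name(case));
    }
}
```

Reporting the empty-canon error at the last tag character is what makes `#.length` fail at 0 and `#$.length` fail at 1, matching the spans asserted above.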
