Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separating the C and CN parsing #342

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from
Draft

Conversation

kmemarian
Copy link
Collaborator

@kmemarian kmemarian commented Jun 22, 2024

This creates a separate lexer and parser for CN.

There are a few questions for @cp526 and @dc-mak marked with TODO(Christopher/Dhruv) in the code.

And there is a hack for dealing with the parsing of C type-names discussed in the commit message of c96a1ad.

This is still work-in-progress, the CN parser doesn't produce pretty messages so some CI and tutorial tests fail.

I'm keeping `c_parser.mly` and `c_lexer.mll` unchanged for now to avoid destroying
`c_parser_error.messages` and to keep CN in a working state.
This does not duplicate the C grammar, instead when the CN lexer
sees matching `<` ... `>` it attempts to parse a C type-name between
the angles (using a new entry point to the C parser) and produces
a single `LT_CTYPE_GT` token for the CN parser.

This may fail because the opening `<` may in fact be a relational
operator. In this case the CN lexer recovers by re-lexing from
the top.

This first attempt avoids changing the existing CN syntax. If we
instead change it to have dedicated delimiters around C type-names,
we won't need to do the re-lexing recovery.

Most of the CN ci tests now work (the failures are caused by the current
lack of pretty error messages in the CN parser), but this needs proper
testing...
@cp526
Copy link
Collaborator

cp526 commented Jun 24, 2024

Thanks very much!

AFAICS CN_bool and cn_function can go away. Not clear to me at the moment whether boolean might have a use if there's a clash of names between C (integer) booleans and CN (proper) booleans.

@cp526
Copy link
Collaborator

cp526 commented Jun 24, 2024

Regarding the question of parsing C-types within CN: maybe the lexer hack is less problematic than it seems: there's a small set of language constructs only that can take C-types (Owned, array_shift, and maybe nothing else?), and those all require parentheses following the <ctype> part for the remaining arguments. I wonder whether that makes the syntax unambiguous wrt the C-type angle-bracket notation so there's never a need to re-lex?

Copy link
Contributor

@dc-mak dc-mak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, though this ctype issue is gonna be annoying.

let hexadecimal_constant = hexadecimal_prefix hexadecimal_digit+

(* C23 binary constant, but omitting ' separators for now *)
(* TODO(Christopher/Dhruv): add support for ' separators to be in line with C23 ? *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please! #337

Comment on lines +120 to +122
(* TODO(Christopher/Dhruv): do you care about following C11 closely for this
(e.g. with respect to the universal character stuff), or should we simplify
like we already do for uppercase names. *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote simplify - we can always add it later if requested.

Comment on lines +168 to +177
| '<' ([^'='][^'<' '>']+ as str) '>'
{
try
LT_CTYPE_GT (C_parser.type_name_eof C_lexer.lexer (Lexing.from_string str))
with
| _ ->
relexbuf_opt := Some (Lexing.from_string (str ^ ">"));
LT
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an amazing hack. But I do worry it might be a bit too hacky. Can it be more conservative? Specifically, I wanted to have <- available as a token. We do need to discuss this properly though @cp526

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of ...<...> was to have a C++-like syntax for type parameters, which would be good to keep if we can. What's the use of <-?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think types always start with a letter (or maybe _), so you could probably change it to only try for a type, if the < is followed by one of these. I think should allow you to have <- later if you wanted it.

Otoh, I don't think this would work if you ever wanted types that have instantiations in them, but we probably don't have those at the moment.

Comment on lines +16 to +17
"boolean", BOOL; (* TODO(Christopher/Dhruv): are all these variants of BOOL needed? *)
"CN_bool", BOOL; (* TODO(Christopher/Dhruv): are all these variants of BOOL needed? *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two can go.

@dc-mak
Copy link
Contributor

dc-mak commented Jun 24, 2024

Thanks very much!

AFAICS CN_bool and cn_function can go away. Not clear to me at the moment whether boolean might have a use if there's a clash of names between C (integer) booleans and CN (proper) booleans.

We're still using cn_function for constants! https://rems-project.github.io/cn-tutorial/#_constants

@cp526
Copy link
Collaborator

cp526 commented Jun 24, 2024

Thanks very much!
AFAICS CN_bool and cn_function can go away. Not clear to me at the moment whether boolean might have a use if there's a clash of names between C (integer) booleans and CN (proper) booleans.

We're still using cn_function for constants! https://rems-project.github.io/cn-tutorial/#_constants

Yes, that's true.

"bool", BOOL; (* shared with C23 *)
"boolean", BOOL; (* TODO(Christopher/Dhruv): are all these variants of BOOL needed? *)
"CN_bool", BOOL; (* TODO(Christopher/Dhruv): are all these variants of BOOL needed? *)
"cn_function", FUNCTION; (* TODO(Christopher/Dhruv): is this variant still needed? *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stays.

@yav yav mentioned this pull request Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants