Implement parsing Unicode names #9

Open
emilyyyylime wants to merge 4 commits into main

emilyyyylime
Collaborator

This is locked behind an optional build dependency to avoid imposing the cost of unicode_names2 on every downstream package. (It really is much heavier than it needs to be 😅)

To reiterate: this PR adds no extra build time or dependencies for downstream packages (unless they explicitly opt in to the unicode_names2 feature).

The current syntax consists of simply appending the name after the U+XXXX literal. Some alternatives that were discussed:

  1. zwj U+200D "Zero width joiner"
  2. zwj U+200D // Zero width joiner
    (possibly also using a specialised comment token like /// or //!)
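
For illustration, the current form (name appended directly after the literal) could be parsed roughly as follows. This is only a sketch and not the code in this PR: `Entry` and `parse_entry` are hypothetical names, and it handles only the `U+XXXX` form, not verbatim characters.

```rust
/// One parsed line: symbol identifier, code point, optional trailing name.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
struct Entry<'a> {
    ident: &'a str,
    c: char,
    name: Option<&'a str>,
}

/// Parses `ident U+XXXX [Unicode name]`.
/// Returns `None` if the line does not have that shape.
fn parse_entry(line: &str) -> Option<Entry<'_>> {
    let line = line.trim();
    let (ident, rest) = line.split_once(char::is_whitespace)?;
    let rest = rest.trim_start();
    // Split the literal from whatever follows it on the line.
    let (literal, tail) = match rest.split_once(char::is_whitespace) {
        Some((lit, tail)) => (lit, tail.trim()),
        None => (rest, ""),
    };
    let hex = literal.strip_prefix("U+")?;
    let c = char::from_u32(u32::from_str_radix(hex, 16).ok()?)?;
    // Everything after the literal is taken as the (unquoted) name.
    let name = if tail.is_empty() { None } else { Some(tail) };
    Some(Entry { ident, c, name })
}
```

Because the name is simply "the rest of the line", this form leaves no room for other trailing syntax, which is the main trade-off against the quoted and comment-based alternatives.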

Also, as the code is currently implemented, explicit names cannot be added after verbatim characters. This was mostly an arbitrary choice and might be revisited if we decide to implement one of the other syntax options above.

Another point that could reasonably go the other way is making this a build dependency rather than a dev-dependency. This was done because writing the verification as a separate test would require reimplementing the parsing code from the ground up (dev-dependencies are only available to tests, examples, and benchmarks).
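
As a rough sketch of what the build-time check amounts to (not the actual build.rs in this PR): the declared name is compared against the canonical one and a readable error is produced on mismatch. `check_name` is a hypothetical helper, and the name lookup is passed in as a closure so the sketch stays self-contained. In a real build script the lookup would presumably wrap `unicode_names2::name` behind `#[cfg(feature = "unicode_names2")]`. Comparing case-insensitively is an assumption made here so that "Zero width joiner" matches "ZERO WIDTH JOINER".

```rust
/// Checks a declared name against the canonical name of `c`.
/// `lookup` stands in for the real name database (assumption: in build.rs
/// this would be something like `|c| unicode_names2::name(c).map(|n| n.to_string())`).
fn check_name(
    c: char,
    declared: &str,
    lookup: impl Fn(char) -> Option<String>,
) -> Result<(), String> {
    match lookup(c) {
        // Assumption: names are compared ignoring ASCII case.
        Some(actual) if actual.eq_ignore_ascii_case(declared) => Ok(()),
        Some(actual) => Err(format!(
            "U+{:04X}: declared name {:?} does not match {:?}",
            c as u32, declared, actual
        )),
        None => Err(format!("U+{:04X} has no Unicode name", c as u32)),
    }
}
```

Keeping the check in the build script means it can reuse the parser's output directly, which is the reason given above for not moving it into a test.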

Example output for bad names:

[image: example output for bad names]

mkorje requested a review from laurmaedje on November 20, 2024
MDLC01 added the meta label on Nov 20, 2024
@MDLC01
Collaborator

MDLC01 commented Nov 23, 2024

This would close #6, right?

@emilyyyylime
Collaborator Author

Yeah, that's the idea

emilyyyylime linked an issue on Nov 23, 2024 that may be closed by this pull request
@laurmaedje
Member

I think it might be a little cleaner to have a new feature validate-unicode-escapes = ["dep:unicode_names2"] and use that externally.

@MDLC01
Collaborator

MDLC01 commented Nov 29, 2024

I think it may be more future-proof to force the name to be quoted (the first alternative in the OP), as suggested in #2 (comment).

Successfully merging this pull request may close these issues.

Suggestion: include Unicode codepoint standard names to sym.txt