Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added documentation for Java8 and Java9 unimplemented features. #42

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

cons-dev
Copy link

This is basically some documentation (and a bit of code to obtain the lists) I wrote. There's no alteration to the other code, so I didn't touch the changelog.

@plexus
Copy link
Member

plexus commented Dec 21, 2021

Hey, thanks! This seems really useful, and I like the direction you are going with this, but we need to be more fine-grained here. If I am reading this right then you are starting from a Java regex, round-tripping it, then running it through the generators, and then checking the values against the regex. This could fail for a number of reasons, so it's too simplistic to conclude that something is "not implemented".

For example, we definitely do support vertical whitespace

(re/pattern :vertical-whitespace)
;;=> "\\v"

(re-parse/parse #"\v")
;;=> :vertical-whitespace

(re-gen/sample :vertical-whitespace 10)
;;=> ("\r" "�" "�" "�" "
" "\r" "\n" "
" "\f" "
")

But maybe the generator isn't accurate, or something else is up. If so then that would be great to find out so we can improve it. The biggest challenge with this library is getting all the different implementations to line up, which is also why we are currently not adding any new regex features, because we are still in the stage of solidifying the features we have. So any information about (lack of) parity between generating regex, parsing regex, and generating generators would be really appreciated.

Did you find the test_cases.edn? Seems like you have a bunch more cases that could be added there, that would help a lot, since we run a bunch of different tests off of these to get fine-grained reporting on what's not working.

Some things are deliberate, we don't "support" POSIX character classes because they are really just shorthands or aliases, so you can get the same result by being explicit, and that way it's easier to get things to work cross-platform. Generally the starting point for us is always the Regal syntax. Any valid regal expression should work the same way across implementations and target platforms.

Cross-platform support is a really important goal for this library, so we don't care much about the specifics of Java's regex engine, we look at features that are common across regex implementations, and figure out how to support them on all the different targets. Supporting every feature of a specific platform is not a goal.

That all said I think this mostly can go in, it's useful information, but it would be great if you could go over the list and have a closer look at the ones that are nominally supported, like :vertical-whitespace, so we can find out what's up and maybe improve the library.

  • Lookahead/lookbehind is supported, but not for generators

Should this be reported as a bug/issue, or is this not something we can realistically do in the generators?

Thanks!

@alysbrooks
Copy link
Member

Hi @cons-dev, there's not a rush on this, but do you think you'll be able to work on this again soon?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants