Functional Rewrite (2.0.0) #198

Open
wants to merge 134 commits into
base: master

@shreyasminocha (Member) commented Sep 23, 2019

See also #196 and #197.

For a sneak peek, check out the tests in the test directory, especially those in test/examples.

Completed features

  • anyCharacterFrom: […]
  • anyCharacterBut: [^…]
  • group: (…)
    • group.capturing: (…)
    • group.nonCapturing: (?:…)
    • group.named | group.capturing.named: (?<foo>…)
  • backReference: \1 or \k<foo>
  • or: (?…|…|…)
  • maybe | optionally: …?
    • maybe.greedy: …?
    • maybe.lazy: …??
  • multiple | zeroOrMore: …*
    • multiple.greedy: …*
    • multiple.lazy: …*?
  • oneOrMore: …+
    • oneOrMore.greedy: …+
    • oneOrMore.lazy: …+?
  • repeat
    • repeat(x): …{x}
    • repeat(min, Infinity): …{min,}
    • repeat(min, max): …{min,max}
    • corresponding greedy alias and lazy variations similar to oneOrMore
  • lookahead: (?=…)
    • lookahead.negative: (?!…)
    • lookahead.positive: (?=…)
  • lookbehind: (?<=…)
    • lookbehind.negative: (?<!…)
    • lookbehind.positive: (?<=…)
  • anyCharacter: .
  • digit: \d
  • nonDigit: \D
  • whitespaceCharacter: \s
  • nonWhitespaceCharacter: \S
  • wordCharacter: \w
  • nonWordCharacter: \W
  • something: .+
  • anything: .*
  • startOfLine: ^
  • endOfLine: $
  • wordBoundary: \b
  • nonWordBoundary: \B
  • concat
  • Flags
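
To illustrate the difference between the greedy and lazy variants listed above, here is a minimal sketch. The helper names (`escapeLiteral`, `toSource`, `quantified`, `oneOrMoreGreedy`, `oneOrMoreLazy`, `maybeGreedy`) are hypothetical and are not taken from this PR; the actual implementation may well differ.

```typescript
// Illustrative only: hypothetical helpers, not the code in this PR.
type Pattern = string | RegExp;

// Escape regex metacharacters so that plain strings match literally.
function escapeLiteral(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Strings are treated as literals; RegExp values contribute their source as-is.
function toSource(part: Pattern): string {
  return typeof part === "string" ? escapeLiteral(part) : part.source;
}

// Wrap the part in a non-capturing group so the quantifier covers all of it.
function quantified(part: Pattern, quantifier: string): RegExp {
  return new RegExp(`(?:${toSource(part)})${quantifier}`);
}

const oneOrMoreGreedy = (part: Pattern): RegExp => quantified(part, "+");
const oneOrMoreLazy = (part: Pattern): RegExp => quantified(part, "+?");
const maybeGreedy = (part: Pattern): RegExp => quantified(part, "?");

// A greedy quantifier takes as much as it can; a lazy one takes as little as it can.
const greedyMatch = "aaa".match(oneOrMoreGreedy("a"))?.[0]; // "aaa"
const lazyMatch = "aaa".match(oneOrMoreLazy("a"))?.[0]; // "a"

// The dot in a plain string is escaped, so it only matches a literal ".".
const escapedSource = maybeGreedy("www.").source; // (?:www\.)?
```

Wrapping every quantified part in a non-capturing group is a simplification made for this sketch; it keeps multi-character parts such as "www." correct at the cost of some redundant groups.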

Incomplete/planned features

  • Unicode code point escapes
  • Unicode property escapes
  • String replacement helper constants and functions (dropped; see aeb11c9)
  • … you tell me :)

Tests

They're all up to date on my end and we have 100% coverage, but let me know if you notice anything that isn't being properly tested. At the moment there are over 15 test suites and over 150 tests.

Docs

I haven't written any docs yet. If you want a feel for how things will work, check out the tests in the test directory, especially test/examples. I'm thinking of a Gatsby site with the docs themselves written in MDX, although I'm open to other ideas.

This PR should also resolve


I've been thinking about re-writing JSVerbalExpressions to use function composition rather than the builder-like pattern it has now.

The README.md currently shows a simple example of using VerbalExpressions:

const tester = VerEx()
    .startOfLine()
    .then('http')
    .maybe('s')
    .then('://')
    .maybe('www.')
    .anythingBut(' ')
    .endOfLine();

This can be described as a builder-like extension for the native RegExp object; you can chain the expression and add more stuff to "build" a complete regular expression.

This is a very clear approach for building simple, "one-dimensional" regular expressions. The problem with the current implementation surfaces when we start doing more complicated things: capture groups, lookaheads/lookbehinds, the "or" pipe and so on quickly make the expression hard to read and maintain.

For example, I think something like this is impossible to implement with VerbalExpressions at the moment:

/^((?:https?:\/\/)?|(?:ftp:\/\/)|(?:smtp:\/\/))([^ /]+)$/

To make it simpler, I'm proposing a 2.0 rewrite of VerbalExpressions that would take a functional approach, something like:

VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
)

Motivation for this approach would be:

  • We can split regular expressions into multiple variables
  • Naming "sub-expressions" gives clearer names and allows different abstraction levels within a regular expression
  • Each small part is testable with unit tests
  • Makes grouping explicit (enforce closing an opened capture group)

So the simplest example could be something like this:

const regex = VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
);

And the complex example could be written e.g. like this:

VerEx(
  startOfLine,
  group(
    or(
      concat("http", maybe("s"), "://", maybe("www.")),
      "ftp://",
      "smtp://"
    )
  ),
  group(anythingBut(" /"))
);

While this looks a bit more complex, we can more easily split it up and name things:

const protocol = or(concat("http", maybe("s"), "://"), "ftp://", "smtp://");
const removeWww = maybe("www.");
const domain = anythingBut(" /");
const regex = VerEx(startOfLine, group(protocol), removeWww, group(domain));

This way we could test all of those "sub-expressions" (variables) in isolation.

@jehna in #196
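
To make the quoted proposal concrete, here is a minimal sketch of how such combinators could be built and how the named sub-expressions could be checked in isolation. Every helper below (`escapeLiteral`, `toSource`, `matchesWhole`, and the stand-ins for `concat`, `or`, `maybe`, `group`, `anythingBut`, `startOfLine` and `VerEx`) is an assumption written for illustration, not the code in this PR. In particular, `anythingBut` is assumed to mean "zero or more characters outside the given set", and flags are ignored.

```typescript
// Illustrative only: hypothetical stand-ins for the helpers named in the quoted proposal.
type Pattern = string | RegExp;

// Escape regex metacharacters so that plain strings match literally.
function escapeLiteral(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Strings are treated as literals; RegExp values contribute their source as-is.
function toSource(part: Pattern): string {
  return typeof part === "string" ? escapeLiteral(part) : part.source;
}

const concat = (...parts: Pattern[]): RegExp =>
  new RegExp(parts.map(toSource).join(""));
const or = (...options: Pattern[]): RegExp =>
  new RegExp(`(?:${options.map(toSource).join("|")})`);
const maybe = (part: Pattern): RegExp => new RegExp(`(?:${toSource(part)})?`);
const group = (part: Pattern): RegExp => new RegExp(`(${toSource(part)})`);
// Assumption: zero or more characters that are not in `chars`.
const anythingBut = (chars: string): RegExp =>
  new RegExp(`[^${chars.replace(/[\\\]^-]/g, "\\$&")}]*`);
const startOfLine = /^/;
const VerEx = (...parts: Pattern[]): RegExp => concat(...parts);

// Test helper: does `part` match the whole input, not just a substring?
function matchesWhole(part: Pattern, input: string): boolean {
  return new RegExp(`^(?:${toSource(part)})$`).test(input);
}

// The named sub-expressions from the quoted example.
const protocol = or(concat("http", maybe("s"), "://"), "ftp://", "smtp://");
const removeWww = maybe("www.");
const domain = anythingBut(" /");
const regex = VerEx(startOfLine, group(protocol), removeWww, group(domain));

// Each part can be checked on its own...
const protocolOk = matchesWhole(protocol, "https://"); // true
const protocolRejects = matchesWhole(protocol, "gopher://"); // false
const domainRejects = matchesWhole(domain, "exa mple.com"); // false

// ...and the composed expression exposes the two capture groups.
const found = regex.exec("https://www.example.com/path");
const capturedProtocol = found?.[1]; // "https://"
const capturedDomain = found?.[2]; // "example.com"
```

Because every helper returns a RegExp, the same `matchesWhole` check works for the smallest part and for the full expression, which is what makes per-part unit tests cheap.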

jehna and others added 30 commits July 19, 2019 21:41
The new version will be written directly with Typescript, so we'll start by having a simple, blank Typescript project base.

Extracted "simplifyExpression" helper function to its own file

Extracted `simpleExp` function so we can easily compile expressions that only add something to the input

Changed all functions to return regular expressions. This way we use regexp as the sanitized form and strings are always unsanitized

`multiple` now supports:

- `multiple('foo')`: `(?:foo)*`
- `multiple('foo', 2)`: `(?:foo){2}`
- `multiple('foo', 2, Infinity)`: `(?:foo){2,}`
- `multiple('foo', 2, 10)`: `(?:foo){2,10}`

We aren't using nyc at the moment.

Entries like .idea, .vscode, and .DS_STORE should be in one's global gitignore rather than in the project's gitignore.

We aren't using them at the moment.

… in preparation for a readme-rewrite.
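
The commit messages above describe two ideas: regular expressions act as the sanitized form while strings are unsanitized, and `multiple` accepts optional count arguments. A minimal sketch of how the two could fit together follows. `multipleSketch`, `escapeLiteral` and `toSource` are hypothetical names chosen for this example and are not the functions from these commits.

```typescript
// Illustrative only: a hypothetical `multipleSketch`, not the function from this PR.
type Pattern = string | RegExp;

// Strings are unsanitized input: escape metacharacters so they match literally.
function escapeLiteral(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// RegExp values are already in sanitized form: use their source unchanged.
function toSource(part: Pattern): string {
  return typeof part === "string" ? escapeLiteral(part) : part.source;
}

// Pick the quantifier from the optional count arguments.
function multipleSketch(part: Pattern, min?: number, max?: number): RegExp {
  let quantifier: string;
  if (min === undefined) {
    quantifier = "*"; // no counts: zero or more
  } else if (max === undefined) {
    quantifier = `{${min}}`; // exact count
  } else if (max === Infinity) {
    quantifier = `{${min},}`; // open-ended range
  } else {
    quantifier = `{${min},${max}}`; // bounded range
  }
  return new RegExp(`(?:${toSource(part)})${quantifier}`);
}

const zeroOrMoreSource = multipleSketch("foo").source; // (?:foo)*
const exactSource = multipleSketch("foo", 2).source; // (?:foo){2}
const openSource = multipleSketch("foo", 2, Infinity).source; // (?:foo){2,}
const boundedSource = multipleSketch("foo", 2, 10).source; // (?:foo){2,10}

// The same characters behave differently as a string and as a RegExp.
const literalDot = multipleSketch("a.b", 1).test("axb"); // false: "." is literal
const wildcardDot = multipleSketch(/a.b/, 1).test("axb"); // true: "." is a wildcard
```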
@shreyasminocha (Member, Author)

@jehna I'd appreciate it if you could review this PR.

Like I said, we're still not ready for production, of course. In terms of features and tests though I think we're pretty solid.

I understand it's a large PR, so take your time!

@shreyasminocha changed the title from "WIP: 2.0.0" to "Functional Rewrite (2.0.0)" on Feb 9, 2020
@shreyasminocha linked an issue on Feb 10, 2020 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Functional rewrite
2 participants