Implement take_until_parser_matches #469

tomalexander · 2017-03-20T01:58:43Z

Implement a macro that will consume input until the child parser matches

tomalexander · 2017-03-20T01:59:46Z

Hey, sorry that took so long. Ran into an old friend I hadn't talked to in over half a decade and I've been recovering from that hangover.

coveralls · 2017-03-20T02:13:40Z

Coverage increased (+0.1%) to 82.845% when pulling 803e17c on tomalexander:take_until_parser_matches into c5090ef on Geal:master.

coveralls · 2017-03-20T02:13:40Z

Coverage increased (+0.02%) to 87.96% when pulling f387919 on tomalexander:take_until_parser_matches into db94701 on Geal:master.

chengsun · 2017-03-23T14:33:23Z

This looks very useful, but a bit broken as implemented. Specifically, if the child parser never matches, take_until_parser_matches will not return an Error or Incomplete. For example:

named!(end <&[u8],&[u8]>, tag!("bbb"));
named!(before_bbb, take_until_parser_matches!(end));

assert_eq!(before_bbb(&b"ab"[..]), Incomplete(Needed::Size(3)));
// assert fails: actual value is Done(&b"b"[..], &b"a"[..])

Also, maybe we could also have a take_until_parser_matches_and_consume variant?

tomalexander · 2017-03-23T16:45:35Z

Thanks for taking a look! That behavior was my intent, but I just checked and you're right. The other take_until* macros throw an error if theres no match. I'll revise (and add more tests for that)

Geal · 2017-07-17T10:26:31Z

any news on this?

tomalexander · 2017-07-17T13:47:02Z

Ah sorry. Since posting this merge request I haven't found time to update it (changed jobs, sister's wedding, etc). FWIW this is a feature I still want but I can't commit to any particular timeframe so would you prefer I close the merge request and open a new one later or keep this one lingering?

…

On Jul 17, 2017 3:26 AM, "Geoffroy Couprie" ***@***.***> wrote: any news on this? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#469 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAb-IGx8m4wxG6uHTE4dDmMe44MDP0z9ks5sOzbYgaJpZM4Mh9wo> .

tomalexander · 2020-03-08T03:45:30Z

Hey, back from the dead. I'm loving the switch to functions in nom version 5! I've updated the PR the latest master, rewrote it as a function instead of a macro, and fixed the style issue chengsun raised about it not returning an error if the parser does not find anything to match.

I haven't implemented take_until_parser_matches_and_consume because I think that would be the same as just a simple terminated(take_until_parser_matches(foo), foo) but if you'd still like it in, I can add it.

tomalexander · 2020-03-14T20:27:02Z

I thought about it some more and I decided to add in a take_until_parser_matches_and_consume because I realized its a performance thing. As in, terminated(take_until_parser_matches(foo), foo) still has to run the parser foo one additional time, even though we just successfully ran it in take_until_parser_matches. I'm still using the same ErrorKind though, should I be making a distinct one for take_until_parser_matches_and_consume ?

tomalexander · 2020-04-12T20:43:28Z

Hey, I just made a small design change, so I want to call it out here to make sure that the project is aware of it and agree with it. Namely, I changed it to attempt to use its inner parser on the empty string after the full input has been captured.

Previously, take_until_parser_matches would apply its inner parser on all substrings until the end of the input, but not including the end of the input. So for example, if the input was "foo" it would test f("foo"), f("oo") and f("o") but not f(""). This meant that it could never capture the full input.

An example use-case where this change is helpful is trying to read until the end of the input, excluding any trailing whitespace. You could write this parser as:

assert_eq!(
    Ok(("\n", "foo bar baz")),
    take_until_parser_matches::<_, _, _, (_, ErrorKind)>(all_consuming(multispace0))(
        "foo bar baz\n"
    )
);

but if there was no trailing whitespace, even though multispace0 accepts an empty input, this would fail:

assert_eq!(
    Ok(("", "foo bar baz")),
    take_until_parser_matches::<_, _, _, (_, ErrorKind)>(all_consuming(multispace0))(
        "foo bar baz"
    )
);

After this patch, both code blocks succeed.

Also I see I need to rebase my branch on the latest changes. I'll get to that now.

tomalexander · 2020-06-21T19:55:52Z

I think its important to point out the difference between this and many_till since the two are very similar parsers.

many_till works by running the first parser and then checking the second parser. Then it runs the first parser again, and then checks the second parser. This repeats until the second parser has a result.

take_until_parser_matches instead iterates forward checking the second parser at each step, returning everything from before the match.

The difference is useful when the first parser is aggressive/insatiable (I want to say 'greedy' but I know in parsing thats the opposite). So, for example, lets say you had a string:
"ababababefg" and you wanted to match as many "ab"s as possible, followed by as many "a"s as possible, as long as its all before a "befg". With many_till you might try to write this as:

let x: IResult<&str, (Vec<&str>, &str)> =
            many_till(alt((tag("ab"), tag("a"))), tag("befg"))("ababababefg");
        assert_eq!(x, Ok(("", (vec!["ab", "ab", "ab", "a"], "befg"))));

But that won't work because the first parser will eat the "b" in "befg", causing the second parser to never match.

Whereas take_until_parser_matches is only attempting to match the end/2nd parser so you could write:

        let x: IResult<&str, (Vec<&str>, &str)> = tuple((
            map_parser(
                take_until_parser_matches(tag("befg")),
                many1(alt((tag("ab"), tag("a")))),
            ),
            tag("befg"),
        ))("ababababefg");
        assert_eq!(x, Ok(("", (vec!["ab", "ab", "ab", "a"], "befg"))));

and it would work.

These are obviously contrived examples, but I think it would be particularly useful for parsing org-mode headlines. The headlines support "tags" at the end of the line, so you could have a headline like:

*** Handling segfaults :thisIsATag:andSoIsThis:

but if the ending isn't a valid tag (for example, having spaces or not having a closing colon) then it gets treated as part of the headline

*** Handling segfaults :this isnt a tag:
*** Handling segfaults :thisAlsoIsntATag

so with take_until_parser_matches you could do:

let (i, (headline, tags)) = tuple((take_until_parser_matches(alt((tag_parser, line_ending))), opt(tag_parser)))(i)?;

but with many_till I don't really see a solution other than:

many_till(anychar, alt((tag_parser, line_ending)))

leaving you with an unfortunate vector of single characters instead of a nice slice.

NickNick · 2020-12-04T22:09:18Z

Hi! I was taking this for a spin, but got a panic when the 'take_split' tried to index a string between code points:

thread 'test_nom_token_recipe' panicked at 'byte index 10 is not a char boundary; it is inside 'ö' (bytes 9..11) of ` frozen rösti rounds to oven`', /home/nick/.cargo/registry/src/github.com-1ecc6299db9ec823/nom-6.0.1/src/traits.rs:504:7

This happens here:

        for ind in 0..=input_length {
            let (remaining, taken) = i.take_split(ind);
            match f.parse(remaining) {
                Err(_) => (),

I changed it into this (added InputIter, use iter_indices):

    Input: InputTake + InputIter + InputLength + Clone,
    F: Parser<Input, O, Error>,
    Error: ParseError<Input>,
{
    move |input: Input| {
        let i = input;
        for (ind, _) in i.iter_indices() {
            let (remaining, taken) = i.take_split(ind);

Which seems to work for me, but please do check it. :)

Of course, thanks for writing it in the first place! Couldn't have done it myself. 👍

tomalexander · 2020-12-08T20:38:55Z

Hey thanks for the report and the suggested change! I'll try to look at it this weekend (I'm taking vacation days this week to work on a side project so I don't want get sidetracked during the week)

tomalexander · 2020-12-13T21:40:49Z

Thanks! I like the change. The only change I needed to add was for checking the parser after the input has been exhausted.

The take_until_parser_matches parser iterates through the input attempting to apply the inner parser to it. Upon successfully applying the inner parser, it will consume and return the input up until the inner parser's match while not consuming the contents of the inner parser.

Previously, take_until_parser_matches would apply its inner parser on all substrings until the end of the input, but not including the end of the input. So for example, if the input was `"foo"` it would test `f("foo")`, `f("oo")` and `f("o")` but not `f("")`. This meant that it could never capture the full input. An example use-case where this change is helpful is trying to read until the end of the input, excluding any trailing whitespace. You could write this parser as: ```python assert_eq!( Ok(("\n", "foo bar baz")), take_until_parser_matches::<_, _, _, (_, ErrorKind)>(all_consuming(multispace0))( "foo bar baz\n" ) ); ``` but if there was no trailing whitespace, even though `multispace0` accepts an empty input, this would fail: ```python assert_eq!( Ok(("", "foo bar baz")), take_until_parser_matches::<_, _, _, (_, ErrorKind)>(all_consuming(multispace0))( "foo bar baz" ) ); ``` After this patch, both code blocks succeed.

- Migrated to the Parser<> trait - Changed Fn -> FnMut as required by the Parser trait - separated_list -> separated_list1

@NickNick

…ed by @NickNick on github. This avoids the issue of splitting an index of a string between unicode codepoints.

… behind features.

meritozh · 2021-05-14T03:43:52Z

this is very useful. any process on this PR?

tomalexander · 2021-05-15T20:06:51Z

Thanks! Last I heard from @Geal he was very busy so he didn't have a lot of time to spend on PRs. AFAIK all thats remaining is code review and merging.

Geal · 2021-05-15T20:35:46Z

Yup, still pretty busy, but I intend to do a pass on PRs soon

tomalexander · 2021-05-15T20:53:30Z

Thats wonderful news, thanks!

meritozh · 2021-06-19T17:18:12Z

@Geal hi，this PR not merged into 6.2？Or we delay it to 7.0?

homersimpsons · 2021-08-27T14:10:54Z

@Geal I guess this PR could be closed as it seems to be a duplicate of take_till

meritozh · 2021-08-27T15:18:05Z

@homersimpsons take_till only accept InputTakeAtPosition which mostly is &[u8] or &str, take_until_parser_matches accept Parser.

chotchki · 2021-10-17T02:59:17Z

I just wanted to add that this pull request works great for my use case.

I'm trying to parse Postgres compatible sql constant strings. There is a variant that auto joins separated strings together ONLY if they are separated by a newline. The take_until_parser_matches_and_consume function is perfect for this.

… apply against master.

Stargateur · 2021-10-25T05:59:22Z

See my comment I don't think this is a good addition to nom

tomalexander · 2023-03-14T14:55:11Z

Good enough for me, closing

Geal force-pushed the master branch 3 times, most recently from 6c99077 to e7ca818 Compare May 14, 2018 12:06

Geal force-pushed the master branch from ab71a6b to 49a20d4 Compare October 14, 2018 14:10

Geal force-pushed the master branch 2 times, most recently from fbd7d65 to cfc19f1 Compare May 9, 2019 11:10

tomalexander force-pushed the take_until_parser_matches branch 2 times, most recently from b9a827d to 7d05a21 Compare March 8, 2020 03:37

Geal added this to the 6.0 milestone Apr 6, 2020

tomalexander force-pushed the take_until_parser_matches branch from 0b00048 to 74e1603 Compare April 12, 2020 21:35

tomalexander changed the title ~~Implement the take_until_parser_matches macro~~ Implement the take_until_parser_matches Jul 11, 2020

tomalexander force-pushed the take_until_parser_matches branch from 74e1603 to 9aed9e4 Compare July 12, 2020 00:05

tomalexander changed the title ~~Implement the take_until_parser_matches~~ Implement take_until_parser_matches Jul 12, 2020

tomalexander added 3 commits December 13, 2020 16:49

Add a take_until_parser_matches_and_consume parser

13ecce6

tomalexander added 5 commits December 13, 2020 16:49

Fix issues from rebase.

d5c37b6

- Migrated to the Parser<> trait - Changed Fn -> FnMut as required by the Parser trait - separated_list -> separated_list1

Add documentation about performance as suggested by @tanriol in gitter.

edddae3

Switch take_until_parser_matches to iterating over indices as suggest…

6805b79

…ed by @NickNick on github. This avoids the issue of splitting an index of a string between unicode codepoints.

Apply the same change to take_until_parser_matches_and_consume.

6981023

Fix issues from rebasing off latest code.

9708fb9

tomalexander force-pushed the take_until_parser_matches branch from 1b7896c to 9708fb9 Compare December 13, 2020 22:09

Disable executing the powerful example since it relies on combinators…

803e17c

… behind features.

tomalexander mentioned this pull request Dec 19, 2020

Feature Request: re_take_until! #709

Open

chotchki added a commit to chotchki/nom that referenced this pull request Oct 24, 2021

Updated the pull request from rust-bakery#469 so that it will cleanly…

18d2964

… apply against master.

chotchki mentioned this pull request Oct 24, 2021

Updated the pull request from #469 so that it will cleanly apply against master. #1441

Open

Geal modified the milestones: 6.0, 8.0 Jan 1, 2023

tomalexander closed this Mar 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement take_until_parser_matches #469

Implement take_until_parser_matches #469

tomalexander commented Mar 20, 2017

tomalexander commented Mar 20, 2017

coveralls commented Mar 20, 2017 •

edited

Loading

coveralls commented Mar 20, 2017

chengsun commented Mar 23, 2017 •

edited

Loading

tomalexander commented Mar 23, 2017

Geal commented Jul 17, 2017

tomalexander commented Jul 17, 2017 via email

tomalexander commented Mar 8, 2020

tomalexander commented Mar 14, 2020

tomalexander commented Apr 12, 2020 •

edited

Loading

tomalexander commented Jun 21, 2020

NickNick commented Dec 4, 2020 •

edited

Loading

tomalexander commented Dec 8, 2020

tomalexander commented Dec 13, 2020

meritozh commented May 14, 2021

tomalexander commented May 15, 2021

Geal commented May 15, 2021

tomalexander commented May 15, 2021

meritozh commented Jun 19, 2021

homersimpsons commented Aug 27, 2021

meritozh commented Aug 27, 2021

chotchki commented Oct 17, 2021

Stargateur commented Oct 25, 2021

tomalexander commented Mar 14, 2023

Implement take_until_parser_matches #469

Implement take_until_parser_matches #469

Conversation

tomalexander commented Mar 20, 2017

tomalexander commented Mar 20, 2017

coveralls commented Mar 20, 2017 • edited Loading

coveralls commented Mar 20, 2017

chengsun commented Mar 23, 2017 • edited Loading

tomalexander commented Mar 23, 2017

Geal commented Jul 17, 2017

tomalexander commented Jul 17, 2017 via email

tomalexander commented Mar 8, 2020

tomalexander commented Mar 14, 2020

tomalexander commented Apr 12, 2020 • edited Loading

tomalexander commented Jun 21, 2020

NickNick commented Dec 4, 2020 • edited Loading

tomalexander commented Dec 8, 2020

tomalexander commented Dec 13, 2020

meritozh commented May 14, 2021

tomalexander commented May 15, 2021

Geal commented May 15, 2021

tomalexander commented May 15, 2021

meritozh commented Jun 19, 2021

homersimpsons commented Aug 27, 2021

meritozh commented Aug 27, 2021

chotchki commented Oct 17, 2021

Stargateur commented Oct 25, 2021

tomalexander commented Mar 14, 2023

coveralls commented Mar 20, 2017 •

edited

Loading

chengsun commented Mar 23, 2017 •

edited

Loading

tomalexander commented Apr 12, 2020 •

edited

Loading

NickNick commented Dec 4, 2020 •

edited

Loading