Implement take_until_parser_matches #469
Hey, sorry that took so long. Ran into an old friend I hadn't talked to in over half a decade and I've been recovering from that hangover. |
This looks very useful, but a bit broken as implemented. Specifically, if the child parser never matches,
Also, maybe we could also have a |
Thanks for taking a look! That behavior was my intent, but I just checked and you're right. The other take_until* macros throw an error if there's no match. I'll revise (and add more tests for that).
any news on this? |
Ah sorry. Since posting this merge request I haven't found time to update
it (changed jobs, sister's wedding, etc). FWIW this is a feature I still
want but I can't commit to any particular timeframe so would you prefer I
close the merge request and open a new one later or keep this one lingering?
Hey, back from the dead. I'm loving the switch to functions in nom version 5! I've updated the PR to the latest master, rewritten it as a function instead of a macro, and fixed the issue chengsun raised about it not returning an error if the parser does not find anything to match. I haven't implemented |
I thought about it some more and I decided to add in a |
Hey, I just made a small design change, so I want to call it out here to make sure the project is aware of it and agrees with it. Namely, I changed it to attempt its inner parser on the empty string after the full input has been captured.

Previously, take_until_parser_matches would apply its inner parser on all substrings until the end of the input, but not including the end of the input. So for example, if the input was "foo" it would test f("foo"), f("oo") and f("o") but not f(""). This meant that it could never capture the full input.

An example use-case where this change is helpful is trying to read until the end of the input, excluding any trailing whitespace. You could write this parser as:

assert_eq!(
    Ok(("\n", "foo bar baz")),
    take_until_parser_matches::<_, _, _, (_, ErrorKind)>(all_consuming(multispace0))(
        "foo bar baz\n"
    )
);

but if there was no trailing whitespace, even though multispace0 accepts an empty input, this would fail:

assert_eq!(
    Ok(("", "foo bar baz")),
    take_until_parser_matches::<_, _, _, (_, ErrorKind)>(all_consuming(multispace0))(
        "foo bar baz"
    )
);

After this patch, both code blocks succeed. Also I see I need to rebase my branch on the latest changes. I'll get to that now.
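The scanning behavior described in this comment can be sketched without nom at all. The block below is only an illustration under stated assumptions: `take_until_matches`, `only_whitespace`, and `demo_trailing_whitespace` are hypothetical names, the inner "parser" is reduced to a predicate that reports success or failure, and the real PR code works on nom's input traits rather than `&str` directly.

```rust
/// Hypothetical stand-in for the combinator discussed above; not nom's API.
/// `inner` plays the role of the inner parser and only reports success.
/// Returns `(remaining, taken)` to mirror the order used in the thread.
fn take_until_matches<'a, F>(input: &'a str, inner: F) -> Option<(&'a str, &'a str)>
where
    F: Fn(&str) -> bool,
{
    // Try every char boundary, then finally the end of input, so the
    // inner check also sees the empty remainder "".
    let offsets = input
        .char_indices()
        .map(|(offset, _)| offset)
        .chain(std::iter::once(input.len()));
    for offset in offsets {
        let (taken, remaining) = input.split_at(offset);
        if inner(remaining) {
            return Some((remaining, taken));
        }
    }
    None
}

/// Rough equivalent of `all_consuming(multispace0)`: the whole remainder
/// is whitespace, and the empty string counts.
fn only_whitespace(s: &str) -> bool {
    s.chars().all(|c| matches!(c, ' ' | '\t' | '\r' | '\n'))
}

fn demo_trailing_whitespace() {
    assert_eq!(
        take_until_matches("foo bar baz\n", only_whitespace),
        Some(("\n", "foo bar baz"))
    );
    // Only succeeds because the empty remainder is tried as well.
    assert_eq!(
        take_until_matches("foo bar baz", only_whitespace),
        Some(("", "foo bar baz"))
    );
}
```

Dropping the final `input.len()` offset from the iterator reproduces the old behavior, where the second assertion would fail because `""` is never offered to the inner check.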
I think it's important to point out the difference between this and
The difference is useful when the first parser is aggressive/insatiable (I want to say 'greedy' but I know in parsing that's the opposite). So, for example, let's say you had a string:

let x: IResult<&str, (Vec<&str>, &str)> =
    many_till(alt((tag("ab"), tag("a"))), tag("befg"))("ababababefg");
assert_eq!(x, Ok(("", (vec!["ab", "ab", "ab", "a"], "befg"))));

But that won't work because the first parser will eat the "b" in "befg", causing the second parser to never match. Whereas

let x: IResult<&str, (Vec<&str>, &str)> = tuple((
    map_parser(
        take_until_parser_matches(tag("befg")),
        many1(alt((tag("ab"), tag("a")))),
    ),
    tag("befg"),
))("ababababefg");
assert_eq!(x, Ok(("", (vec!["ab", "ab", "ab", "a"], "befg"))));

and it would work. These are obviously contrived examples, but I think it would be particularly useful for parsing org-mode headlines. The headlines support "tags" at the end of the line, so you could have a headline like:
but if the ending isn't a valid tag (for example, having spaces or not having a closing colon) then it gets treated as part of the headline
so with
but with
leaving you with an unfortunate vector of single characters instead of a nice slice. |
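To make the "ab"/"befg" comparison from this comment concrete, here is a rough std-only sketch of the two strategies. Everything in it is an assumption for illustration: `item`, `items_then_end`, and `scan_then_items` are hypothetical helpers that imitate the shape of the nom expressions quoted above, not the combinators themselves.

```rust
const END: &str = "befg";

/// Stand-in for `alt((tag("ab"), tag("a")))`: returns `(rest, matched)`.
fn item(input: &str) -> Option<(&str, &str)> {
    for tag in ["ab", "a"] {
        if let Some(rest) = input.strip_prefix(tag) {
            return Some((rest, &input[..tag.len()]));
        }
    }
    None
}

/// `many_till` style: check for the terminator, otherwise consume one
/// item and repeat. Fails as soon as neither applies.
fn items_then_end(mut input: &str) -> Option<Vec<&str>> {
    let mut items = Vec::new();
    loop {
        if input.starts_with(END) {
            return Some(items);
        }
        let (rest, matched) = item(input)?;
        items.push(matched);
        input = rest;
    }
}

/// Scan-first style: locate the terminator, then parse items only in
/// the text before it, so an item can never swallow part of the terminator.
fn scan_then_items(input: &str) -> Option<Vec<&str>> {
    let end = input.find(END)?;
    let mut before = &input[..end];
    let mut items = Vec::new();
    while let Some((rest, matched)) = item(before) {
        items.push(matched);
        before = rest;
    }
    // Like `many1`, require at least one item; leftover text in `before`
    // is ignored in this sketch.
    if items.is_empty() {
        None
    } else {
        Some(items)
    }
}
```

On `"ababababefg"` the first function returns `None`, because the fourth `"ab"` consumes the `b` that the terminator needs, while the second returns `Some(vec!["ab", "ab", "ab", "a"])`.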
Hi! I was taking this for a spin, but got a panic when the 'take_split' tried to index a string between code points:
This happens here:

for ind in 0..=input_length {
    let (remaining, taken) = i.take_split(ind);
    match f.parse(remaining) {
        Err(_) => (),

I changed it into this (added InputIter, use iter_indices):
Which seems to work for me, but please do check it. :) Of course, thanks for writing it in the first place! Couldn't have done it myself. 👍 |
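The panic reported here comes from splitting a string at a byte offset that falls inside a multi-byte character. A minimal std-only sketch of the difference between stepping through `0..=len` and stepping through character boundaries follows; `split_points` and `demo_split_points` are illustrative helpers, not the PR's code, which relies on nom's `InputIter::iter_indices` instead.

```rust
/// Byte offsets at which `input` can be split safely: the start of each
/// char, plus the end of the input. (Illustrative helper, not part of nom.)
fn split_points(input: &str) -> Vec<usize> {
    input
        .char_indices()
        .map(|(offset, _)| offset)
        .chain(std::iter::once(input.len()))
        .collect()
}

fn demo_split_points() {
    let input = "h\u{e9}llo"; // U+00E9 occupies two bytes in UTF-8
    assert_eq!(input.len(), 6);
    // Offset 2 falls inside the two-byte char; `input.split_at(2)` would panic.
    assert!(!input.is_char_boundary(2));
    assert_eq!(split_points(input), vec![0, 1, 3, 4, 5, 6]);
    // Every offset produced this way is a valid split point.
    for offset in split_points(input) {
        let (taken, remaining) = input.split_at(offset);
        assert_eq!(taken.len() + remaining.len(), input.len());
    }
}
```

Appending `input.len()` keeps the final split, where the remainder is empty, which matches the extra check mentioned in the reply below.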
Hey, thanks for the report and the suggested change! I'll try to look at it this weekend (I'm taking vacation days this week to work on a side project, so I don't want to get sidetracked during the week).
Thanks! I like the change. The only change I needed to add was for checking the parser after the input has been exhausted. |
The take_until_parser_matches parser iterates through the input attempting to apply the inner parser to it. Upon successfully applying the inner parser, it will consume and return the input up until the inner parser's match while not consuming the contents of the inner parser.
Previously, take_until_parser_matches would apply its inner parser on all substrings until the end of the input, but not including the end of the input. So for example, if the input was `"foo"` it would test `f("foo")`, `f("oo")` and `f("o")` but not `f("")`. This meant that it could never capture the full input.

An example use-case where this change is helpful is trying to read until the end of the input, excluding any trailing whitespace. You could write this parser as:

```rust
assert_eq!(
    Ok(("\n", "foo bar baz")),
    take_until_parser_matches::<_, _, _, (_, ErrorKind)>(all_consuming(multispace0))(
        "foo bar baz\n"
    )
);
```

but if there was no trailing whitespace, even though `multispace0` accepts an empty input, this would fail:

```rust
assert_eq!(
    Ok(("", "foo bar baz")),
    take_until_parser_matches::<_, _, _, (_, ErrorKind)>(all_consuming(multispace0))(
        "foo bar baz"
    )
);
```

After this patch, both code blocks succeed.
- Migrated to the Parser<> trait
- Changed Fn -> FnMut as required by the Parser trait
- separated_list -> separated_list1
…ed by @NickNick on github. This avoids the issue of splitting an index of a string between unicode codepoints.
… behind features.
This is very useful. Any progress on this PR?
Thanks! Last I heard from @Geal he was very busy, so he didn't have a lot of time to spend on PRs. AFAIK all that's remaining is code review and merging.
Yup, still pretty busy, but I intend to do a pass on PRs soon |
That's wonderful news, thanks!
@Geal hi, was this PR not merged into 6.2? Or is it delayed to 7.0?
@homersimpsons |
I just wanted to add that this pull request works great for my use case. I'm trying to parse Postgres compatible sql constant strings. There is a variant that auto joins separated strings together ONLY if they are separated by a newline. The take_until_parser_matches_and_consume function is perfect for this. |
… apply against master.
See my comment: I don't think this is a good addition to nom.
Good enough for me, closing |
Implement a macro that will consume input until the child parser matches