update streaming input and input type columns
#48 (Open): Easyoakland wants to merge 1 commit into rosetta-rs:main from Easyoakland:update-table
"streaming input" means "can it handle operating on a partial/incomplete input"
`Stream` trait. I'm not seeing anything about partial / incomplete input.
I think it is sufficient for a parser to be able to make progress with partial input to support "streaming input".
As an extreme example:
Then a streaming parser will be finished at minute 101.
A non-streaming parser will have to wait 100 minutes for all the tokens and then spend another 100 minutes processing them, for a total of 200 minutes to finish parsing.
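The setup of that example is not visible on this page. The two figures are consistent with one reading, which is an assumption here: 100 tokens, one arriving per minute, each taking one minute to process. Under that assumption the arithmetic can be checked with a small sketch (the function names are illustrative only):

```rust
/// Minute at which a streaming parser finishes: each token is processed as
/// soon as it has arrived and the previous token is done.
fn streaming_finish(tokens: u32, arrival_gap: u32, process_cost: u32) -> u32 {
    let mut done: u32 = 0;
    for i in 1..=tokens {
        let arrived = i * arrival_gap; // minute at which token `i` is available
        done = done.max(arrived) + process_cost;
    }
    done
}

/// Minute at which a non-streaming parser finishes: it waits for every token
/// and only then starts processing.
fn batch_finish(tokens: u32, arrival_gap: u32, process_cost: u32) -> u32 {
    tokens * arrival_gap + tokens * process_cost
}

fn main() {
    // Assumed numbers: 100 tokens, 1 minute between arrivals, 1 minute each to process.
    println!("streaming finishes at minute {}", streaming_finish(100, 1, 1));
    println!("non-streaming finishes at minute {}", batch_finish(100, 1, 1));
}
```

With these assumed numbers the streaming parser is always one token behind the input, so it finishes one minute after the last token arrives.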
It seems there are two ways of doing this.
- The approach `nom` and `winnow` use with their `Incomplete` error variants, where a parser takes a partial input and maybe a partial state and returns the new partial or complete state.
- If you can parse a token iterator you can parse a stream. You write your input stream as an iterator.
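As a rough illustration of the first approach, here is a hand-rolled resumable parser. It is a sketch only: `Step`, `parse_number` and `feed_chunks` are hypothetical names and this is not the `nom` or `winnow` API. The parser is given whatever bytes have arrived so far and reports `Incomplete` when it needs more.

```rust
/// Outcome of running the parser on everything received so far.
#[derive(Debug, PartialEq)]
enum Step {
    /// A full number was parsed; `consumed` bytes can be dropped from the buffer.
    Done { value: u32, consumed: usize },
    /// Not enough input yet; call again once more bytes have arrived.
    Incomplete,
    /// The input cannot be a decimal number followed by `;`.
    Error,
}

/// Parses decimal digits terminated by `;` from a possibly partial buffer.
fn parse_number(input: &[u8]) -> Step {
    let digits = input.iter().take_while(|b| b.is_ascii_digit()).count();
    match input.get(digits).copied() {
        // Ran out of bytes before seeing the terminator.
        None => Step::Incomplete,
        Some(b';') if digits > 0 => {
            let text = std::str::from_utf8(&input[..digits]).expect("ASCII digits");
            match text.parse() {
                Ok(value) => Step::Done { value, consumed: digits + 1 },
                Err(_) => Step::Error, // e.g. the number overflows u32
            }
        }
        Some(_) => Step::Error,
    }
}

/// Feeds chunks one at a time, parsing as much as possible after each one.
fn feed_chunks(chunks: &[&[u8]]) -> Vec<u32> {
    let mut buffer = Vec::new();
    let mut parsed = Vec::new();
    for chunk in chunks {
        buffer.extend_from_slice(chunk);
        while let Step::Done { value, consumed } = parse_number(&buffer) {
            parsed.push(value);
            buffer.drain(..consumed);
        }
    }
    parsed
}

fn main() {
    // The number 123 is split across the first two chunks.
    let chunks: [&[u8]; 3] = [&b"12"[..], &b"3;4"[..], &b"5;"[..]];
    println!("{:?}", feed_chunks(&chunks));
}
```

The caller owns the buffer and decides when to fetch more input, so the parser itself never blocks.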
For `chumsky`, you can use an iterator over the stream of input using the `from_iter` method.
I'm not super familiar with `combine`, but the link you posted seems to be a newtype to signal a certain behavior when reaching the end of input. This is not necessary for parsing a stream because you don't need to reach the end of the stream until you have all the tokens. Write your stream to block or await for available tokens and your parser doesn't need to know.
For example, in the `yap_streaming` fizzbuzz example new tokens can take an unbounded amount of time, but the parser can process all tokens it has so far received without ever knowing that it waited for input.
I think the `combine` link is correct because the options available at that page seem to be how one would handle different kinds of streams.
I have no problem changing the `winnow` link.
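The "block for available tokens" idea can be sketched with the standard library alone. This is not the `yap_streaming` fizzbuzz example: `parse_numbers` and `parse_slow_stream` are hypothetical names, and a channel stands in for a slow input source. The parser only sees an iterator and never learns that some items took time to arrive.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Collects decimal numbers from any `char` iterator, treating every
/// non-digit as a separator. It does not know what backs the iterator.
fn parse_numbers(input: impl Iterator<Item = char>) -> Vec<u32> {
    let mut numbers = Vec::new();
    let mut current: Option<u32> = None;
    for c in input {
        match c.to_digit(10) {
            Some(d) => current = Some(current.unwrap_or(0) * 10 + d),
            None => {
                if let Some(n) = current.take() {
                    numbers.push(n);
                }
            }
        }
    }
    numbers.extend(current); // a number may end at the end of the stream
    numbers
}

/// Runs the same parser over a channel whose producer is slow.
fn parse_slow_stream() -> Vec<u32> {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || {
        for c in "3 5 15".chars() {
            thread::sleep(Duration::from_millis(10)); // simulated latency
            tx.send(c).expect("receiver is alive");
        }
        // `tx` is dropped here, which ends the stream.
    });
    // `Receiver::iter` blocks until the next item or until all senders are gone.
    let numbers = parse_numbers(rx.iter());
    producer.join().expect("producer panicked");
    numbers
}

fn main() {
    println!("{:?}", parse_slow_stream());
}
```

Each number is handled as soon as its characters arrive, but the parsing thread is blocked while it waits.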
While that does use an iterator, without seeing an example showing this use case, I question how it would work. For example, how do you handle the end span?
This all seems like very hand-wavy guesses as to how it's supposed to work, and without verified examples, who knows if all of the practical aspects are taken care of.
And examples only help in calling attention to it and not fully resolving it. For example, I commented in the issue about IO error handling for `yap_streaming`, but also blocking in the parser could end up with serious ramifications for an application.
I am curious, how do you know when you can stop keeping state for backtracking? Is a marker made for the outermost backtracking and, as you unwind past it, you free it, allowing the buffer to be reused?
If I have time I'll see if I can set up an example. Maybe I'll prove myself wrong, though I don't see what would prevent parsing a `Reader::bytes()` iterator.
I'm not sure what you mean by handling the end span. Maybe it's related to how `chumsky` handles the `None` case here with the `eoi` span?
I'll file an issue on `combine` after we agree on a definition for "streaming input". We might already? Just double-checking.
I'll comment there on handling IO and blocking. I'm not sure what the comment on examples fully means.
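For reference, a minimal sketch of parsing a byte iterator such as the one returned by `std::io::Read::bytes()`, with IO errors surfaced to the caller. The "parser" here (`sum_digits`, a hypothetical name) is deliberately trivial and does not use any of the libraries discussed. The point is only that each item is an `io::Result<u8>`, so the parser has to decide what to do with a failed read.

```rust
use std::io::{self, Read};

/// Sums the ASCII digits in a byte stream, stopping at the first IO error.
fn sum_digits(bytes: impl Iterator<Item = io::Result<u8>>) -> io::Result<u32> {
    let mut total = 0u32;
    for byte in bytes {
        let byte = byte?; // propagate a failed read instead of hiding it
        if byte.is_ascii_digit() {
            total += u32::from(byte - b'0');
        }
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    // A `Cursor` stands in for a file or socket; any `Read` implementor works.
    let reader = io::Cursor::new("1a2b3");
    println!("{}", sum_digits(reader.bytes())?);
    Ok(())
}
```

`Read::bytes()` issues one read per byte, so for a real file or socket the reader would normally be wrapped in a `std::io::BufReader` first.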
I can't speak to how `chumsky` does it. In `yap_streaming`, backtracking can only occur with a `TokenLocation`, so creating one adds the current offset to a list, and the offset is removed from the list when the `TokenLocation` is dropped. Items are only copied to the buffer if a `TokenLocation` exists which might need them when a reset occurs. Items are only dropped from the buffer if the oldest `TokenLocation` in the list is younger.
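The bookkeeping described above could look roughly like the following. This is a sketch of the idea only, not the `yap_streaming` implementation: `Checkpoint` and `Buffered` are hypothetical types, and sharing the offset list through `Rc<RefCell<..>>` is an assumption made to keep the example short.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Offsets of every live checkpoint, shared by the stream and its checkpoints.
type Live = Rc<RefCell<Vec<usize>>>;

/// Hypothetical stand-in for a location token: its offset is registered on
/// creation and unregistered when the value is dropped.
struct Checkpoint {
    offset: usize,
    live: Live,
}

impl Drop for Checkpoint {
    fn drop(&mut self) {
        let offset = self.offset;
        let mut live = self.live.borrow_mut();
        if let Some(i) = live.iter().position(|&o| o == offset) {
            live.swap_remove(i);
        }
    }
}

/// Iterator adapter that buffers items only while a checkpoint might need them.
struct Buffered<I: Iterator> {
    source: I,
    buffer: VecDeque<I::Item>, // items at offsets `buffer_start..`
    buffer_start: usize,
    cursor: usize, // offset of the next item to yield
    live: Live,
}

impl<I: Iterator> Buffered<I> {
    fn new(source: I) -> Self {
        Buffered { source, buffer: VecDeque::new(), buffer_start: 0, cursor: 0, live: Live::default() }
    }

    /// Registers the current offset so the stream can later be reset to it.
    fn checkpoint(&self) -> Checkpoint {
        self.live.borrow_mut().push(self.cursor);
        Checkpoint { offset: self.cursor, live: Rc::clone(&self.live) }
    }

    /// Rewinds to a live checkpoint; buffered items are replayed from there.
    fn reset(&mut self, to: &Checkpoint) {
        self.cursor = to.offset;
    }

    fn buffered_len(&self) -> usize {
        self.buffer.len()
    }
}

impl<I: Iterator> Iterator for Buffered<I>
where
    I::Item: Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // Free items that neither the cursor nor the oldest checkpoint can reach.
        let oldest = self.live.borrow().iter().copied().min();
        let keep_from = oldest.map_or(self.cursor, |o| o.min(self.cursor));
        while self.buffer_start < keep_from {
            self.buffer.pop_front();
            self.buffer_start += 1;
        }
        let index = self.cursor - self.buffer_start;
        let item = match self.buffer.get(index) {
            Some(item) => item.clone(), // replay after a reset
            None => {
                let item = self.source.next()?;
                if self.live.borrow().is_empty() {
                    self.buffer_start += 1; // nothing can rewind here: skip the copy
                } else {
                    self.buffer.push_back(item.clone());
                }
                item
            }
        };
        self.cursor += 1;
        Some(item)
    }
}

fn main() {
    let mut stream = Buffered::new(1..=5);
    stream.next(); // no checkpoint yet, so nothing is buffered
    let checkpoint = stream.checkpoint();
    stream.next();
    stream.next();
    println!("buffered while checkpoint is live: {}", stream.buffered_len());
    stream.reset(&checkpoint);
    drop(checkpoint);
    println!("replayed after reset: {:?}", stream.next());
}
```

In this sketch the buffer is trimmed lazily on the next read rather than at the moment a checkpoint is dropped, which keeps the `Drop` implementation trivial.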