This PR is *functional*, but probably not in an ideal state. All of the grammar-related code is cribbed from UtilityAI's library; I'm not sure if we want to dump this and build something totally new, but the code seemed fairly well written and I didn't see value in re-inventing the wheel here. I will note that neither UtilityAI's implementation nor this one matches the llama.cpp sampling logic exactly: specifically, we're missing the resampling behavior that llama.cpp uses to improve performance when a grammar is active.
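For reference, the optimization we're missing looks roughly like the sketch below: sample optimistically without the grammar, and only mask the full candidate list and resample when the chosen token is rejected. Everything here (`Candidate`, `Grammar`, `sample`) is an illustrative stand-in, not an API from this crate, UtilityAI's library, or llama.cpp's bindings:

```rust
// Minimal stand-in types; illustrative only.
type TokenId = u32;

struct Candidate {
    token: TokenId,
    logit: f32,
}

struct Grammar; // parser stacks elided

impl Grammar {
    /// Would the grammar accept this token next? (stubbed)
    fn allows(&self, _token: TokenId) -> bool {
        true
    }
    /// Advance the grammar's parse state past an emitted token. (stubbed)
    fn accept(&mut self, _token: TokenId) {}
    /// Set the logit of every grammar-rejected candidate to -inf.
    fn mask(&self, candidates: &mut [Candidate]) {
        for c in candidates.iter_mut() {
            if !self.allows(c.token) {
                c.logit = f32::NEG_INFINITY;
            }
        }
    }
}

/// Pick the highest-logit candidate (greedy, for brevity).
fn sample(candidates: &[Candidate]) -> TokenId {
    candidates
        .iter()
        .max_by(|a, b| a.logit.total_cmp(&b.logit))
        .map(|c| c.token)
        .expect("non-empty candidate list")
}

/// The resampling trick: sample first, and only pay for masking the
/// whole candidate list when the sampled token violates the grammar.
fn sample_with_grammar(candidates: &mut [Candidate], grammar: &mut Grammar) -> TokenId {
    let token = sample(candidates);
    if grammar.allows(token) {
        grammar.accept(token);
        return token;
    }
    grammar.mask(candidates);
    let token = sample(candidates);
    grammar.accept(token);
    token
}
```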
## Additional Notes

For what it's worth, I don't know if it's trivial to implement this as a sampler stage. We always need to accept the token after sampling has completed and a token has been selected; I wasn't able to figure out a strategy for advancing the grammar state that made sense, and it also felt awkward to include token acceptance as a `SamplerStage`. A sketch of the mismatch follows.
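Concretely (reusing the stand-in types from the sketch above, and a hypothetical `SamplerStage` trait rather than the crate's real definition): stages only reshape the candidate distribution, while grammar acceptance has to run after the pipeline has already committed to a single token.

```rust
// Hypothetical trait for illustration; not necessarily the crate's definition.
trait SamplerStage {
    /// A stage only gets to reshape the candidate distribution.
    fn apply(&mut self, candidates: &mut Vec<Candidate>);
}

fn sample_loop(
    stages: &mut [Box<dyn SamplerStage>],
    mut candidates: Vec<Candidate>,
    grammar: &mut Grammar,
) -> TokenId {
    for stage in stages.iter_mut() {
        stage.apply(&mut candidates);
    }
    // Only now do we know which token won, so acceptance can't happen
    // inside any stage's `apply` -- it has to live out here.
    let token = sample(&candidates);
    grammar.accept(token);
    token
}
```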
## Usage

Basic usage example below.
### Code
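A hypothetical end-to-end example. The grammar-specific names (`LlamaGrammar::parse`, `StandardSampler::new_with_grammar`) are placeholders for whatever API we settle on, and the surrounding session calls are paraphrased rather than verified, so treat the whole thing as a sketch rather than the final surface:

```rust
use std::io::Write;

use llama_cpp::standard_sampler::StandardSampler;
// `LlamaGrammar` is a placeholder name, not a confirmed export.
use llama_cpp::{LlamaGrammar, LlamaModel, LlamaParams, SessionParams};

fn main() {
    // Load a model and start a session (error handling kept minimal).
    let model = LlamaModel::load_from_file("model.gguf", LlamaParams::default())
        .expect("failed to load model");
    let mut session = model
        .create_session(SessionParams::default())
        .expect("failed to create session");

    session
        .advance_context("List three facts about llamas:\n")
        .expect("failed to feed prompt");

    // Placeholder API: parse the GBNF grammar below and attach it to the sampler.
    let grammar = LlamaGrammar::parse(include_str!("list.gbnf")).unwrap();
    let sampler = StandardSampler::new_with_grammar(grammar);

    // Stream tokens until the grammar completes or we hit the cap.
    for token in session.start_completing_with(sampler, 128) {
        print!("{}", model.token_to_piece(token));
        std::io::stdout().flush().unwrap();
    }
}
```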
### Grammar
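An illustrative GBNF grammar (the `list.gbnf` referenced above) that forces a three-item bulleted list; written for this example, not taken from the PR:

```gbnf
# Force exactly three "- ..." lines.
root ::= line line line
line ::= "- " [A-Za-z0-9 ,.']+ "\n"
```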
### Output
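What a grammar-constrained completion might look like (illustrative, not a captured run):

```
- Llamas are domesticated camelids native to South America.
- Llamas are commonly used as pack animals.
- Llamas hum to communicate with one another.
```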