
Skip or speedup lexer preprocessing #115

Open
teoremma opened this issue Oct 1, 2024 · 3 comments
teoremma commented Oct 1, 2024

Currently, when trying to run constrained decoding with a new grammar, we are prompted with

Creating DFA mask store for LlamaTokenizerFast and custom, may take more than 10 minutes.

This holds true even for the smallest grammar examples.

Sadly, this makes experimenting and debugging grammars quite cumbersome, because any modification will result in a cache miss and will trigger this expensive preprocessing.
Additionally, we are currently working in a setup where the grammar is modified on the fly in between interactions with the LLM, and so 10 minutes is prohibitively expensive.

Although the decoding performance might be affected, is there a way to skip this preprocessing?
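The cache-miss behavior described above can be sketched as follows. This is a hypothetical illustration, not SynCode's actual cache implementation: if the cache key is derived from a hash of the entire grammar text, then any edit at all, even adding one alternative, produces a new key and forces the full multi-minute rebuild.

```python
import hashlib

def mask_store_cache_key(tokenizer_name: str, grammar_text: str) -> str:
    # Hypothetical cache key: hash of the whole grammar text. Any edit,
    # however small, changes the digest and therefore misses the cache.
    digest = hashlib.sha256(grammar_text.encode("utf-8")).hexdigest()
    return f"{tokenizer_name}-{digest}"

key_a = mask_store_cache_key("LlamaTokenizerFast", 'start: "a" | "b"')
key_b = mask_store_cache_key("LlamaTokenizerFast", 'start: "a" | "b" | "c"')
assert key_a != key_b  # one new alternative -> new key -> full rebuild
```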

@shubhamugare
Collaborator

At this point, there is no way to skip it, but I can implement it soon (probably by the end of this week).

Collaborator

shubhamugare commented Oct 8, 2024

Hi @teoremma,

Removing the DFA mask store dependency makes inference a bit slower than I expected, so it might not work out as easily. I'll spend a little more time on this to see if I can make it work this week.

Slightly longer term, I think a much better solution would be to build the DFA mask store incrementally, reusing the cached entries for grammar rules that haven't changed. I'll spend some time exploring that as well.
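The incremental idea could look something like the sketch below. All names here are illustrative assumptions, not SynCode's API: entries are cached per rule rather than per whole grammar, so editing one rule only triggers the expensive computation for that rule.

```python
import hashlib

# Per-rule cache: maps a rule's text to its (expensive) precomputed entry.
_rule_cache: dict[str, str] = {}

def build_rule_entry(rule: str) -> str:
    # Stand-in for the expensive per-rule DFA/mask computation.
    return hashlib.sha256(rule.encode("utf-8")).hexdigest()

def build_mask_store(rules: list[str]) -> dict[str, str]:
    # Only rules not already in the cache are rebuilt; unchanged
    # rules reuse their cached entry from the previous grammar.
    store = {}
    for rule in rules:
        if rule not in _rule_cache:
            _rule_cache[rule] = build_rule_entry(rule)
        store[rule] = _rule_cache[rule]
    return store

v1 = build_mask_store(['num: /[0-9]+/', 'op: "+" | "-"'])
# Editing `op` on the fly reuses the cached `num` entry untouched.
v2 = build_mask_store(['num: /[0-9]+/', 'op: "+" | "-" | "*"'])
```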

Either way, I'll update you on this soon.

Author

teoremma commented Oct 8, 2024

I see, that's understandable. Building the DFA incrementally would also help to make experimentation faster.

Thanks a lot, I'll keep an eye on it.
