Currently, when trying to run constrained decoding with a new grammar, we see the following message:
Creating DFA mask store for LlamaTokenizerFast and custom, may take more than 10 minutes.
This holds true even for the smallest grammar examples.
Sadly, this makes experimenting with and debugging grammars quite cumbersome, because any modification to the grammar results in a cache miss and re-triggers this expensive preprocessing.
Additionally, we are currently working in a setup where the grammar is modified on the fly between interactions with the LLM, so a 10-minute rebuild is prohibitively expensive.
Is there a way to skip this preprocessing, even at some cost to decoding performance?
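For readers wondering where the time goes: as we understand it, the mask store precomputes, for each DFA state of the grammar's terminals, which vocabulary tokens can be consumed from that state. Below is a minimal sketch of that idea, not SynCode's actual implementation; the DFA representation and the `advance` step function are hypothetical stand-ins:

```python
def build_mask_store(dfa_states, vocab, advance):
    """Naive mask-store precomputation (illustrative sketch only).

    For every DFA state, walk every vocabulary token through the DFA
    and mark it acceptable unless it hits a dead state. The up-front
    cost is O(|states| * |vocab| * token length), which is where the
    multi-minute preprocessing goes for a ~32k-token tokenizer.
    """
    store = {}
    for state in dfa_states:
        mask = []
        for token in vocab:
            cur = state
            for ch in token:
                cur = advance(cur, ch)
                if cur is None:  # dead state: token can never match
                    break
            mask.append(cur is not None)
        store[state] = mask
    return store

# Toy example: one terminal [a-z]+ and a three-token "vocabulary".
def advance(state, ch):
    return "ident" if ch.islower() else None

store = build_mask_store(["start", "ident"], ["foo", "Bar", "if"], advance)
print(store["start"])  # [True, False, True]
```

Once built, the store turns per-step token filtering into a cheap lookup, which is why it is worth so much preprocessing time for a fixed grammar.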
Removing the DFA mask store dependency makes inference a bit slower than I expected, and might not work out as easily as hoped. I'll spend a little more time on this to see if I can make it work this week.
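For context on why dropping the store costs speed: without the precomputed table, the same vocabulary scan has to run at every decoding step rather than once up front. A hedged sketch, reusing the hypothetical `advance` from the earlier example:

```python
def step_mask(state, vocab, advance):
    # Per-step filtering without a mask store: a full O(|vocab| * token
    # length) pass over the vocabulary for every token generated, versus
    # a single dictionary lookup when the store is precomputed.
    allowed = []
    for token_id, token in enumerate(vocab):
        cur = state
        for ch in token:
            cur = advance(cur, ch)
            if cur is None:
                break
        if cur is not None:
            allowed.append(token_id)
    return allowed
```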
Slightly longer term, I think building the DFA mask store incrementally, so that it reuses cached entries for previously seen grammar rules, could be a much better solution. I'll spend some time exploring that as well.
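To make the incremental idea concrete, one possible shape is to cache masks per terminal, keyed by the terminal's regex and the tokenizer, so that editing one rule invalidates only that rule's entries. The sketch below is speculative; the cache layout, `compute_fn`, and file naming are hypothetical, not SynCode's API:

```python
import hashlib
import pickle
from pathlib import Path

CACHE_DIR = Path(".mask_cache")  # hypothetical on-disk cache location

def cache_key(terminal_regex: str, tokenizer_name: str) -> str:
    # Key entries by (tokenizer, terminal regex): editing one grammar
    # rule then misses only on that rule, not on the whole grammar.
    digest = hashlib.sha256(f"{tokenizer_name}\x00{terminal_regex}".encode())
    return digest.hexdigest()

def masks_for_terminal(terminal_regex, tokenizer_name, compute_fn):
    """Return masks for one terminal, computing them only on a cache miss."""
    CACHE_DIR.mkdir(exist_ok=True)
    path = CACHE_DIR / cache_key(terminal_regex, tokenizer_name)
    if path.exists():
        return pickle.loads(path.read_bytes())
    masks = compute_fn(terminal_regex)  # the expensive DFA x vocab pass
    path.write_bytes(pickle.dumps(masks))
    return masks
```

With per-terminal keys, iterating on a grammar would only pay the preprocessing cost for the rules that actually changed.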