Investigate problems with ctranslate2 #75
Comments
Shouldn't the parameter for the beam size be beam_size instead of num_hypotheses?
@SebastianBodza I really need to do a better job in the README. Yes, you can run everything locally. Create the prompts with Now you can run the ctranslate2 (why does my brain refuse to remember this correctly, ugh) interview with:
This will download the model from HF if it's not already cached. My initial observations when implementing this runtime in #62 were that if you try As to your second point: that's an interesting thought. There should be two parameters to beam search: one for the number of beams to consider and another for the size or length of those beams. When I first went through the docs I left with the impression that the
Thanks for the clarification! I ran some tests locally.
However, it also seems to be a bit unstable. Another run with the same settings:
For the beam_size, I think you are right. num_hypotheses should be correct.
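To make the two knobs discussed above concrete, here is a toy beam search in plain Python. It is only a sketch and not CTranslate2's implementation: the `TOY_LOGPROBS` table and the `beam_search` function are hypothetical, and it assumes `beam_size` means the number of candidates kept at each step while `num_hypotheses` means how many finished sequences are returned.

```python
import math

# Hypothetical toy "model": next-token log-probabilities depend only on the last token.
TOY_LOGPROBS = {
    "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"a": math.log(0.1), "b": math.log(0.5), "</s>": math.log(0.4)},
    "b": {"a": math.log(0.3), "b": math.log(0.2), "</s>": math.log(0.5)},
}


def beam_search(beam_size, num_hypotheses, max_length=4):
    """Return up to num_hypotheses finished (score, tokens) pairs, best first."""
    # Design choice of this toy: we cannot return more hypotheses than we track.
    if num_hypotheses > beam_size:
        raise ValueError("num_hypotheses must be <= beam_size in this sketch")

    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, tokens so far)
    finished = []
    for _ in range(max_length):
        # Expand every live beam by every possible next token.
        candidates = []
        for score, tokens in beams:
            for token, logprob in TOY_LOGPROBS[tokens[-1]].items():
                candidates.append((score + logprob, tokens + [token]))
        # Keep only the beam_size best candidates at this step.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for candidate in candidates[:beam_size]:
            if candidate[1][-1] == "</s>":
                finished.append(candidate)
            else:
                beams.append(candidate)
        if not beams:
            break

    finished.sort(key=lambda c: c[0], reverse=True)
    return finished[:num_hypotheses]


wide = beam_search(beam_size=4, num_hypotheses=2)
greedy = beam_search(beam_size=1, num_hypotheses=1)
print([tokens for _, tokens in wide])
print([tokens for _, tokens in greedy])
```

With `beam_size=1` the search degenerates to greedy decoding and commits to the locally best token, whereas a wider beam can find a higher-scoring sequence overall. `num_hypotheses` only changes how many of the finished sequences are handed back.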
@SebastianBodza Yes, something seems to be wrong with the implementation of repeat penalty in this runtime, but I haven't yet dived into the code to see what's up. This isn't normally a complex operation. If you want to try it on something with repeat penalty that should be otherwise stable, that's the goal of
I've implemented batching and basic stop-seq support for this runtime, but batching seems to only make the instability problems here worse :/ I wonder if upstream issue #1425 is related and we have some unstable-sort-related issues happening here.
Hi, the issue related to the callback in batch mode should be fixed in However, I'm not sure what the issue with repetition penalty is. For now, I suggest forcing this value to 1 for CTranslate2 if that works for you. In general, repetition penalty should not be needed when using a random sampler.
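For readers wondering why a value of 1 effectively switches the penalty off, here is a minimal sketch of the commonly used formulation (positive logits of already generated tokens are divided by the penalty, negative ones are multiplied by it). The function name `apply_repetition_penalty` is hypothetical, and the thread does not confirm that CTranslate2 implements the penalty exactly this way.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Return a copy of logits with already generated tokens made less likely.

    Assumption: this follows the common "divide positive, multiply negative"
    formulation; it is an illustration, not CTranslate2's code.
    """
    penalized = list(logits)
    for token_id in set(generated_ids):
        if penalized[token_id] > 0:
            penalized[token_id] /= penalty
        else:
            penalized[token_id] *= penalty
    return penalized


logits = [2.0, -1.0, 0.5]
seen = [0, 1]  # token ids that were already generated

# A penalty of 1 leaves every logit unchanged.
print(apply_repetition_penalty(logits, seen, 1.0))
# A penalty above 1 pushes the seen tokens down and leaves unseen ones alone.
print(apply_repetition_penalty(logits, seen, 2.0))
```

Under this formulation, forcing the value to 1 is a no-op, so any remaining instability would have to come from somewhere else in the runtime.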
@guillaumekln I am having trouble with this runtime following the upgrade of my container to CUDA 12.1; it complains of Does ct2 only support CUDA 11 at this time?
Is it possible to investigate the problems with ctranslate2 in more detail? The library is one of the fastest and supports token streaming. Unfortunately, token streaming is not possible with beam search, and without beam search the output quality is quite bad :/
Is there any way to run the interview locally?
P.S. In the README, cformers2 should be ctranslate2.