[Bug] Empty token can appear at the beginning of a generated sequence #140
Comments
Looks like this token is actually a "prefix_space".
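For reference, a quick way to inspect what id 29871 is in a Llama-family tokenizer; this is a minimal sketch assuming the Hugging Face transformers library, and the model id below is only an example (any Llama-family tokenizer should behave the same way):

```python
from transformers import AutoTokenizer

# Example model id; any Llama-family tokenizer should work here.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

print(tok.convert_ids_to_tokens([29871]))   # expected: ['▁'], the SentencePiece prefix-space piece
print(repr(tok.decode([29871])))            # decodes to an empty or whitespace-only string
```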
I have an idea for a workaround:
1. Greedy case: for the prefill output, if the top-1 token is 29871, replace it with the top-2 token; we observed that the top-2 token is the actual next token (but this should be double-checked).
2. Random sampling case: for the prefill output, if token 29871 is among the sampled top tokens, do not use it and take the next token after the top-token set instead.
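A minimal sketch of what that workaround could look like at the sampling step, assuming access to the prefill-step logits as a NumPy array; the constant, function names, and the top-k handling here are hypothetical, not the actual mlc-serve sampler API:

```python
import numpy as np

# Llama SentencePiece prefix-space piece ("▁") observed at the start of generations (assumption).
PREFIX_SPACE_ID = 29871

def pick_first_token_greedy(prefill_logits: np.ndarray) -> int:
    """Greedy case: if the top-1 token of the prefill step is the prefix space,
    fall back to the top-2 token."""
    top2 = np.argsort(prefill_logits)[::-1][:2]
    return int(top2[1]) if int(top2[0]) == PREFIX_SPACE_ID else int(top2[0])

def pick_first_token_sampled(prefill_logits: np.ndarray, top_k: int = 40, rng=None) -> int:
    """Random-sampling case: drop the prefix space from the candidate set and
    sample from the remaining top-k tokens."""
    rng = rng or np.random.default_rng()
    order = np.argsort(prefill_logits)[::-1]
    # Take one extra candidate so that removing 29871 still leaves top_k tokens.
    candidates = [int(t) for t in order[: top_k + 1] if int(t) != PREFIX_SPACE_ID][:top_k]
    logits = prefill_logits[candidates]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(candidates, p=probs))
```

This only touches the first token sampled after prefill, so subsequent decode steps should be unaffected.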
Oh, could this simply be a matter of changing the setting at https://github.com/octoml/mlc-llm/blob/batch-serving/serve/mlc_serve/engine/engine_common.py#L79? @sunggg, is there a reason we are using the current value?
I thought about it briefly and decided to follow the default setting in vllm, since I do not know about its other impacts: https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/tokenizer.py#L191
It seems that, as of #107, which introduced `detokenize_incrementally` from vllm, we very often (or always?) get a blank token at the beginning of each generation. Apparently, vllm has the same problem. Although this is a minor issue, such a token still counts as one token in the output, so we should fix this behavior.
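To make the symptom concrete: incremental detokenization emits only the newly produced text at each step, so when the first generated id is the bare prefix-space piece, the first emitted delta is empty even though a token was consumed. The following is a simplified illustration, not vllm's actual `detokenize_incrementally`, and it assumes a Hugging Face Llama-family tokenizer with a hypothetical list of generated ids:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # example model id

# Hypothetical generated ids; 29871 ("▁") as the first id mirrors the reported behavior.
generated = [29871, 306, 626, 2691]

decoded_so_far = ""
for i in range(len(generated)):
    # Decode the whole sequence seen so far and emit only the new suffix.
    text = tok.decode(generated[: i + 1])
    delta = text[len(decoded_so_far):]
    decoded_so_far = text
    print(f"step {i}: delta={delta!r}")
# The first delta is expected to be '' (or a lone space), yet it still counts
# as one generated token in the output.
```

Real incremental detokenizers typically also track prefix/read offsets so that partial SentencePiece or byte pieces are not emitted prematurely; this sketch omits that detail, but the empty first delta arises the same way.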