
Llama3 8B 32K sample generates garbage #82

Open
samir-souza opened this issue Jul 19, 2024 · 1 comment
Labels: documentation (Improvements or additions to documentation)

Comments


samir-souza commented Jul 19, 2024

The model generates only garbage (degenerate, repetitive output).

Sample: https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3-8b-32k-sampling.ipynb

Neuron SDK 2.19, PyTorch 1.13.1

aws-neuronx-runtime-discovery 2.9
libneuronxla 0.5.1795
neuronx-cc 2.14.213.0+013d129b
neuronx-distributed 0.8.0
torch-neuronx 1.13.1.1.15.0
torch-xla 1.13.1+torchneuronf
transformers-neuronx 0.11.351

ii aws-neuronx-collectives 2.21.46.0-69b77134b amd64 neuron_ccom built using CMake
ii aws-neuronx-gpsimd-customop-lib 0.11.4.0 amd64 custom_op_trn1_install built using CMake
ii aws-neuronx-gpsimd-tools 0.11.3.0-36dcb86d4 amd64 gpsimd_tools built using CMake
ii aws-neuronx-runtime-lib 2.21.41.0-fb1705f5f amd64 neuron_runtime built using CMake
ii aws-neuronx-tools 2.18.3.0 amd64 Neuron profile and debug tools

The example prompt from the notebook generates:
num_input_tokens: 26828
generated sequence 1. We propose a new gated linear recurrent unit (RG-LRU) that is efficient to compute on TPU-v3. 2. We propose Griffin, a hybrid model that mixes the RG-LRU with local attention. 3. Griffin and Hawk achieve comparable performance to Transformers on downstream tasks. 4. Griffin and Hawk extrapolate to longer sequences than Transformers. 5. Griffin and Hawk are more efficient than Transformers at inference. 6. Griffin and Hawk are efficient at copying and retrieval tasks. 7. Griffin and Hawk are efficient at training. 8. Griffin and Hawk are efficient at inference. 9. Griffin and Hawk are efficient at training. 10. Griffin and Hawk are efficient at inference. 11. Griffin and Hawk are efficient at training. 12. Griffin and Hawk are efficient at inference. 13. Griffin and Hawk are efficient at training. 14. Griffin and Hawk are efficient at inference. 15. Griffin and Hawk are efficient at training. 16. Griffin and Hawk are efficient at inference. 17. Griffin and Hawk are efficient at training. 18. Griffin and ..... and repeats the same thing for the rest of the 32K

Custom prompt
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a json format specialist<|eot_id|><|start_header_id|>user<|end_header_id|>

<JSON_DOCUMENT>
{"a": invalid text, "b": how are you?}
</JSON_DOCUMENT>
Can you fix the given json document for me, please?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Output
generated sequence користувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористув and repeats.
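
For reference, the custom prompt above follows the Llama 3 instruct chat template. A minimal sketch of building the same string with the Hugging Face tokenizer (the model path here is a placeholder, not taken from the notebook):

```python
# Sketch only: reproduce the hand-written Llama 3 chat prompt via the tokenizer's chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder path
messages = [
    {"role": "system", "content": "You are a json format specialist"},
    {"role": "user", "content": (
        "<JSON_DOCUMENT>\n"
        '{"a": invalid text, "b": how are you?}\n'
        "</JSON_DOCUMENT>\n"
        "Can you fix the given json document for me, please?"
    )},
]
# add_generation_prompt=True appends the assistant header, matching the prompt above.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```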

@shubhamchandak94

The issue is due to using the f16 data type instead of bf16 (which is what the model weights are stored in). We will update the tutorial in the next release.
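
For anyone hitting this before the tutorial is updated, a minimal sketch of the workaround (not the exact notebook code; the checkpoint path, tp_degree, and sequence lengths are placeholders for your setup):

```python
# Sketch: load/compile the model with bf16 activations so the dtype matches the
# published Llama 3 weights; the current tutorial passes amp='f16'.
import torch
from transformers import AutoTokenizer
from transformers_neuronx.llama.model import LlamaForSampling

model_path = "Meta-Llama-3-8B-Instruct"  # placeholder: converted checkpoint directory

neuron_model = LlamaForSampling.from_pretrained(
    model_path,
    batch_size=1,
    tp_degree=32,        # placeholder: match the NeuronCores on your instance
    n_positions=32768,   # 32K context, as in the sample
    amp='bf16',          # the fix: use bf16 instead of f16
)
neuron_model.to_neuron()  # compile for Neuron

tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer("Hello, how are you?", return_tensors="pt").input_ids
with torch.inference_mode():
    generated = neuron_model.sample(input_ids, sequence_length=512, top_k=50)
print(tokenizer.decode(generated[0]))
```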

@aws-taylor added the documentation (Improvements or additions to documentation) label and removed the bug (Something isn't working) label on Nov 11, 2024