
Llama3 8B 32K sample generates garbage #82

Open
samir-souza opened this issue Jul 19, 2024 · 1 comment
Labels: documentation (Improvements or additions to documentation)

Comments


samir-souza commented Jul 19, 2024

The model generates only garbage (degenerate, repetitive output).

Sample: https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3-8b-32k-sampling.ipynb

Neuron SDK 2.19, PyTorch 1.13.1

aws-neuronx-runtime-discovery 2.9
libneuronxla 0.5.1795
neuronx-cc 2.14.213.0+013d129b
neuronx-distributed 0.8.0
torch-neuronx 1.13.1.1.15.0
torch-xla 1.13.1+torchneuronf
transformers-neuronx 0.11.351

ii aws-neuronx-collectives 2.21.46.0-69b77134b amd64 neuron_ccom built using CMake
ii aws-neuronx-gpsimd-customop-lib 0.11.4.0 amd64 custom_op_trn1_install built using CMake
ii aws-neuronx-gpsimd-tools 0.11.3.0-36dcb86d4 amd64 gpsimd_tools built using CMake
ii aws-neuronx-runtime-lib 2.21.41.0-fb1705f5f amd64 neuron_runtime built using CMake
ii aws-neuronx-tools 2.18.3.0 amd64 Neuron profile and debug tools

The example prompt from the notebook generates:
num_input_tokens: 26828
generated sequence 1. We propose a new gated linear recurrent unit (RG-LRU) that is efficient to compute on TPU-v3. 2. We propose Griffin, a hybrid model that mixes the RG-LRU with local attention. 3. Griffin and Hawk achieve comparable performance to Transformers on downstream tasks. 4. Griffin and Hawk extrapolate to longer sequences than Transformers. 5. Griffin and Hawk are more efficient than Transformers at inference. 6. Griffin and Hawk are efficient at copying and retrieval tasks. 7. Griffin and Hawk are efficient at training. 8. Griffin and Hawk are efficient at inference. 9. Griffin and Hawk are efficient at training. 10. Griffin and Hawk are efficient at inference. 11. Griffin and Hawk are efficient at training. 12. Griffin and Hawk are efficient at inference. 13. Griffin and Hawk are efficient at training. 14. Griffin and Hawk are efficient at inference. 15. Griffin and Hawk are efficient at training. 16. Griffin and Hawk are efficient at inference. 17. Griffin and Hawk are efficient at training. 18. Griffin and ..... and repeats the same thing for the rest of the 32K

Custom prompt
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a json format specialist<|eot_id|><|start_header_id|>user<|end_header_id|>

<JSON_DOCUMENT>
{"a": invalid text, "b": how are you?}
</JSON_DOCUMENT>
Can you fix the given json document for me, please?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Output
generated sequence користувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористув and repeats.
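
For reference, the custom prompt above follows the Llama 3 instruct chat template. A minimal sketch of building the same string with the Hugging Face tokenizer (the model path here is a placeholder, not taken from the notebook):

```python
# Sketch only: reproduce the hand-written Llama 3 chat prompt via the tokenizer's chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder path
messages = [
    {"role": "system", "content": "You are a json format specialist"},
    {"role": "user", "content": (
        "<JSON_DOCUMENT>\n"
        '{"a": invalid text, "b": how are you?}\n'
        "</JSON_DOCUMENT>\n"
        "Can you fix the given json document for me, please?"
    )},
]
# add_generation_prompt=True appends the assistant header, matching the prompt above.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```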

@shubhamchandak94

The issue is due to using the f16 data type instead of bf16 (which is what the model weights are stored in). We will update the tutorial in the next release.
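
For anyone hitting this before the tutorial is updated, a minimal sketch of the workaround (not the exact notebook code; the checkpoint path, tp_degree, and sequence lengths are placeholders for your setup):

```python
# Sketch: load/compile the model with bf16 activations so the dtype matches the
# published Llama 3 weights; the current tutorial passes amp='f16'.
import torch
from transformers import AutoTokenizer
from transformers_neuronx.llama.model import LlamaForSampling

model_path = "Meta-Llama-3-8B-Instruct"  # placeholder: converted checkpoint directory

neuron_model = LlamaForSampling.from_pretrained(
    model_path,
    batch_size=1,
    tp_degree=32,        # placeholder: match the NeuronCores on your instance
    n_positions=32768,   # 32K context, as in the sample
    amp='bf16',          # the fix: use bf16 instead of f16
)
neuron_model.to_neuron()  # compile for Neuron

tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer("Hello, how are you?", return_tensors="pt").input_ids
with torch.inference_mode():
    generated = neuron_model.sample(input_ids, sequence_length=512, top_k=50)
print(tokenizer.decode(generated[0]))
```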

@aws-taylor added the documentation (Improvements or additions to documentation) label and removed the bug (Something isn't working) label on Nov 11, 2024