Describe the bug
When I try to convert a NeoX-trained LLaMA model (config below) with convert_neox_to_hf.py, I get the error shown in the screenshot.
So in my view, the dimensions of the MLP layers are not configured correctly during training. I had not encountered this issue before #1212, at least.
To Reproduce
Train a model with the provided config and try to convert it to the Hugging Face format.
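(For context, the conversion step is something like `python tools/ckpts/convert_neox_to_hf.py --input_dir <checkpoint_dir> --config_file <train_config.yml> --output_dir <hf_out>`; the script path and flag names here reflect my reading of recent gpt-neox and may differ between versions, and the paths are placeholders.)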
Proposed solution
I would look at #1276 and #1212 for possible issues regarding LLaMA and the MLP that could lead to the aforementioned problem.
One could also revert to the LLAMAParallelMLP class and the mlp_type: "llama" parameter combination from before.
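For illustration, here is a minimal sketch of that older parameter combination in the gpt-neox YAML config style (the dimensions are placeholders for a hypothetical 7B-class model, not values taken from the affected config):

```yaml
{
  # Pre-#1276 LLaMA-style MLP selection:
  "mlp_type": "llama",
  "activation": "silu",

  # Placeholder dimensions for a hypothetical model:
  "hidden_size": 4096,
  "intermediate_size": 11008,
}
```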
Screenshots
Environment (please complete the following information):
Libraries:
deepspeed @ git+https://github.com/EleutherAI/DeeperSpeed.git@02e2ebf7dee6aaab3d89094ed470a4609763c742
flash-attn @ file:///opt/wheels/flash_attn-2.0.4-cp310-cp310-linux_x86_64.whl#sha256=0dc568c7b3516cc3f45f33858fe5ef048e5b7a82ba56c89189d5f6a97f4574f2
ftfy==6.2.3
lion-pytorch==0.1.4
lm-dataformat @ git+https://github.com/EleutherAI/lm_dataformat.git@4eec05349977071bf67fc072290b95e31c8dd836
lm_eval==0.4.1
mpi4py @ file:///opt/wheels/mpi4py-3.1.4-cp310-cp310-linux_x86_64.whl#sha256=6e012d8c61c0a0d8d6e93b4d98ba6946bb5a5c3d8280d1e0db93862ec19025c2
numpy==1.26.3
pybind11==2.13.6
pytorch-triton-rocm==2.2.0
regex==2024.5.15
sentencepiece==0.2.0
six==1.16.0
tiktoken==0.7.0
tokenizers==0.15.2
torch==2.2.2+rocm5.6
torchaudio==2.2.2+rocm5.6
torchdata==0.7.1
torchtext==0.17.2+cpu
torchvision==0.17.2+rocm5.6
transformers==4.38.0
Python 3.10.13
Additional context
Add any other context about the problem here.
I also encountered this issue in the llama-type MLP, and I had to set intermediate_size to three times the intended value to work around it.
I made a pull request (#1309) that fixes the llama configurations in the example directories. I hope this helps.
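If that observation is right, the refactored MLP seems to split the configured intermediate_size across the three LLaMA MLP projections (gate, up, down), so a workaround is to triple the value in the config. A minimal sketch, assuming a hypothetical model whose intended per-projection width is 11008; the divide-by-three behavior is inferred from the comment above, not verified against the source, and the key names follow the post-refactor example configs as I understand them:

```yaml
{
  "activation": "swiglu",
  "hidden_size": 4096,         # placeholder
  # Intended per-projection MLP width is 11008; if the post-#1276 code
  # divides the configured value by 3 for gated activations, set it to 3x:
  "intermediate_size": 33024,  # 3 * 11008
}
```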