RuntimeError: shape '[1, 3098, 6, 5, 128]' is invalid for input of size 12689408 #13

Open
LuchenZhou opened this issue Dec 7, 2024 · 0 comments


We encountered a shape mismatch error while trying to reproduce DuoAttention. We tested transformers versions from 4.37 to 4.47. Across that range the failure changed from "RuntimeError: Boolean value of Tensor with more than one value is ambiguous" to "RuntimeError: shape '[1, 3098, 6, 5, 128]' is invalid for input of size 12689408", but no version resolved it.
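For reference, the numbers in the second error can be checked with plain arithmetic. The sketch below is only a diagnostic aid, not part of the DuoAttention code: the helper name `diagnose_reshape` is hypothetical, and it assumes the first two dimensions of the target shape are (batch, sequence length). Under that assumption the tensor holds 4096 values per token, while the requested view needs 6 * 5 * 128 = 3840, so the reshape cannot succeed.

```python
from math import prod


def diagnose_reshape(target_shape, input_numel):
    """Compare the element count a reshape needs with what the tensor holds.

    Hypothetical helper for reading the error message; it assumes the first
    two dims of target_shape are (batch, seq_len).
    """
    required = prod(target_shape)
    tokens = target_shape[0] * target_shape[1]
    # Features per token that the target view expects (6 * 5 * 128 here).
    expected_per_token = prod(target_shape[2:])
    # Features per token that the input tensor actually contains.
    actual_per_token, remainder = divmod(input_numel, tokens)
    return {
        "required_numel": required,
        "input_numel": input_numel,
        "expected_per_token": expected_per_token,
        "actual_per_token": actual_per_token,
        "remainder": remainder,
    }


# Values copied from the error message above.
report = diagnose_reshape([1, 3098, 6, 5, 128], 12689408)
for key, value in report.items():
    print(f"{key}: {value}")
```

A per-token size of 4096 would be consistent with 32 heads of dimension 128, whereas the target view only accounts for 30. That reading is an interpretation of the numbers, not something confirmed against the model configs or the patching code.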

We also tried different models, downloaded with the following commands:

huggingface-cli download togethercomputer/Llama-2-7B-32K-Instruct --local-dir Llama-2-7B-32K-Instruct
huggingface-cli download gradientai/Llama-3-8B-Instruct-Gradient-1048k --local-dir Llama-3-8B-Instruct-Gradient-1048k
huggingface-cli download gradientai/Llama-3-8B-Instruct-Gradient-4194k --local-dir Llama-3-8B-Instruct-Gradient-4194k
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2 --local-dir Mistral-7B-Instruct-v0.2
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.3 --local-dir Mistral-7B-Instruct-v0.3

However, none of these models worked. A previous issue suggested that updating the transformers version would solve the problem, but we still get shape mismatch errors.

Could there be other packages that need to be updated as well?
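To make the environment easier to compare, here is a minimal sketch that prints the installed versions of packages that might be relevant. The package list is an assumption on our side (we do not know which ones the project actually pins), and `collect_versions` is a hypothetical helper built on the standard-library `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(names):
    """Return {distribution name: installed version, or None if missing}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed list of possibly relevant packages; adjust to the project's requirements.
PACKAGES = ["torch", "transformers", "flash-attn", "accelerate"]

for name, ver in collect_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

We can post the output of this script here if that helps narrow down which combination of versions is expected.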
