Mamba-Ssm - Loader for Mamba State Space models #5228
Conversation
would love to see this merged
I would very much like to get feedback on what's still to do to get this merged. Maybe I can test/review another PR in turn. Give me a hint on how to help out!
Training already works as a prototype but needs serious rework before pushing. Should only take a few days.
Is this only supported on Linux? I tried installing your branch and it kept giving me some pip installation errors.
…package needs it but does not require it itself.
Hi @minipasila, thanks for using and thereby testing this branch!
While I develop and test only on Ubuntu, there is no reason this shouldn't work on Windows. In this case the error is caused by the original mamba-ssm Python package not declaring its dependencies correctly. I have added the "packaging" package to requirements.txt, and now I'm at least able to complete the installation successfully. There may also be a hard dependency on CUDA that comes from the original mamba-ssm package; I'm still unsure how to deal with that. Please try the updated branch and report any errors. While I can't test on Windows, I'll try to address all bug reports for Windows as well as I can.
The first version of the training code has landed. These models seem to learn very well.
For whatever reason it wouldn't install the "packaging" package before it tried installing the mamba stuff, so I did that manually. It then started installing them until it gave another error stating that it depends on triton, but I think that's only supported on Linux currently, unless that has changed recently.
I looked it up, triton is still Linux only. That's unlucky, and I don't think I can fix that easily. Then this has to be Linux-only for the moment. Can you try WSL?
I did manage to find this: https://github.com/jakaline-dev/Triton_win/releases/tag/2.1.0. Even though it installed triton successfully, that didn't fix the issue. But since I don't want to deal with WSL right now, I tried it on Google Colab, and that seemed broken as well for some reason (maybe an old CUDA version?); see the attached Colab error.
But on Runpod I was able to get it working without problems. I was able to load the model and generate text.
@minipasila Thank you for trying again! Too bad Triton_win didn't work. That would have been perfect. For the Google Colab error, I can't say anything. I've never tried Colab.
That is awesome! Thank you for investing the resources to try it! Then we have one datapoint that the changes generally work. :-)
… Made fix for mixed precision models not apply to SSMs.
… not work because the free instances on Colab all dont support bfloat16
I fixed the "libcuda.so not found" error on Google Colab. The cause was the default installation of triton==2.1, which breaks on Colab; forcing a downgrade to triton==2.0 fixed that. But Mamba still doesn't work on the free runtimes, because neither the T4 instances (Tesla T4 GPUs) nor the TPU instances support bfloat16. Maybe someone with Colab credits can now test the premium GPUs. I will have a look into making Mamba work with other data types, but only after this PR is merged.
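For context, here is a minimal sketch (illustrative only, not code from this PR) of how one might probe whether the current GPU supports the bfloat16 path before loading a Mamba model:

import torch

# Pick a dtype based on hardware bfloat16 support; T4-class GPUs lack bfloat16.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16  # Ampere (e.g. A100) and newer
else:
    dtype = torch.float32   # fallback for older GPUs
print(f"Selected dtype: {dtype}")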
This PR lacks a justification for adding mamba support. It is a promising alternative architecture for LLMs, but the linked model is small both in number of parameters (2.8b) and training dataset size (600b tokens). What is the use case? Another promising alternative architecture is RWKV, and it suffers from the same problem.
Mamba being the hottest thing since sliced bread is the justification: limitless context, a non-transformer-based model (no attention), and it's much more efficient to train than transformer models (faster convergence).
I didn't know I needed one, as there already was an issue asking for it and, as others already stated, mamba is the new cool kid in town. But I'll happily provide one: This PR adds support for Mamba state space models to allow experimentation with and evaluation of this new model architecture. In first experiments Mamba models provide similar or better performance than transformer models of comparable size, while also offering benefits like constant memory and linear time requirements for larger context sizes, as well as higher training efficiency. These benefits are especially important for users of text-generation-webui, as most of us have to work with very limited resources.
It performs on the level of the widely used 7b transformers, with lower VRAM requirements, making it even more accessible for smaller GPUs. I would love to already have a mamba-13b or larger. There are rumours that companies are training those, but I know nothing for sure. What I know for sure is that companies are much more likely to invest millions in compute if the technology has wider software support. By implementing it in widely used software like text-generation-webui we can encourage those investments, leading to better research on this model type. Any new model architecture that could surpass transformer models will need a lot of support from all sides to actually do it, because transformers already have a large ecosystem and literal billions and billions of investment dollars. The open-source community can implement, test and then adopt or drop new model types easily, but we need the upfront pretraining by larger organisations. Getting that will be easier if the tooling is ready.
Fun, research, specialised fine-tunes. During development of this PR I used it (an in-between version, not the final one) to create this model: https://huggingface.co/IggoOnCode/mamba-2.8b-slimpj-OpenOrca_1ep First evaluation suggests that I'm probably not good at this: perplexity went up, accuracy went down. That can happen in fine-tuning, as I have read; further testing of how it feels is required. But it proves the reason why I wanted to use text-generation-webui for training. Using the webui I had saved a snapshot during training which I could evaluate too. It turns out that perplexity and accuracy were worse at 50% of the training run and got better after that. (Next I may try to train another epoch on top of it, but I'll see.)
RWKV was supported individually by text-generation-webui before the support moved directly into the transformers library. When mamba moves into transformers too, the special mamba support can easily be removed. If Mamba doesn't make it, it can also be removed. What I take from this argument is that the mamba integration should be as easily removable as possible. To facilitate this I will go over the training code again and make the distinction not between llama model and mamba model, but between LoRA and full fine-tune, which is what it basically is. That has the additional advantage that we may be able to train RWKV models too (and other model types that may come in the future).
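A hypothetical sketch of that distinction (the function and parameter names below are illustrative assumptions, not the PR's actual training.py code):

from peft import LoraConfig, get_peft_model

def prepare_model_for_training(model, use_lora: bool):
    # Branch on the training mode, not on the model architecture.
    if use_lora:
        # LoRA path: wrap the model with PEFT adapters.
        lora_config = LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM")
        return get_peft_model(model, lora_config)
    # Full fine-tune path: train all parameters directly
    # (works for Mamba, RWKV, or any other architecture).
    for param in model.parameters():
        param.requires_grad = True
    return model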
Good news, mamba support in transformers is definitely coming: huggingface/transformers#28094
Sadly it didn't "just work" when using the add-mamba branch from transformers in text-generation-webui. The models could not even be loaded, although loading should already work for them. And the transformers branch does not support any training yet (according to the PR).
The code in this PR now works for full fine-tuning of mamba and llama and for LoRA training of llama, but I had to keep four references to mamba in training.py.
I would be happy if this PR could be reviewed and considered for merging. I have already prepared a branch for the removal of mamba-ssm (https://github.com/IggoOnCode/text-generation-webui/tree/remove_mamba-ssm). When this PR gets merged I will immediately create a draft PR from that branch, and keep it conflict-free until the transformers implementation for mamba is ready to replace this implementation.
🎊
from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
import torch

# Mamba checkpoints reuse the GPT-NeoX-20B tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

model = MambaForCausalLM.from_pretrained("state-spaces/mamba-130m", vocab_size=50280, num_hidden_layers=24, torch_dtype=torch.float32)
model.config.use_cache = True

input_ids = tokenizer(["Hey how are you doing?", "Explain how soy sauce is made"], padding=True, return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))

The branch in transformers now supports this 🤗
As you obviously don't want the mamba-ssm code, are you interested in only the training changes (bringing back full fine-tuning)? If so, I would move them to a separate PR. But only if you at least give some indication of how to progress now.
@IggoOnCode, I appreciate your contribution, but as a hobby developer maintaining this project by myself, I have to be selective about which changes I can take on. Accepting the PR means committing to long-term maintenance and handling of future PRs related to the new loader, which can be a significant time investment. To make it easier to integrate experimental loaders, I would like to refactor the project to have self-contained loaders. This would allow each loader to have its own functions for logits, text generation, etc., and be maintained independently. Regarding training, the current code could benefit from a review to ensure best practices are being followed. Specifically, the code contains parameters not found elsewhere, like …
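To illustrate the idea of self-contained loaders, here is a hypothetical sketch (the class and method names are assumptions for illustration, not the project's actual API):

from abc import ABC, abstractmethod

class Loader(ABC):
    # Each backend implements its own loading, generation, and logits functions,
    # so experimental loaders can be added or removed independently.
    @abstractmethod
    def load_model(self, model_name: str):
        """Load the model and tokenizer for this backend."""

    @abstractmethod
    def generate(self, prompt: str, max_new_tokens: int) -> str:
        """Generate a completion for the given prompt."""

    @abstractmethod
    def get_logits(self, prompt: str):
        """Return next-token logits (e.g. for the logits viewer)."""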
@oobabooga That's completely reasonable. Thanks for letting me know. I'll see what I can do for the training code while doing my experiments. |
Checklist:
This PR adds support for "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" as described in https://arxiv.org/abs/2312.00752
Currently it's able to use https://huggingface.co/state-spaces/mamba-2.8b-slimpj for inference.
I plan to add more features and training later. For now I hope we can add just basic inference.
This implements the issue: #4830