Integrate LLaVA for multimodal pre-training #781
Anyone have any ideas around this stack trace?
|
|
GPT-4 says: It looks like you're encountering a CUDA error related to indexing in PyTorch. This error is often caused by an invalid index being used to access tensor elements. Here's a breakdown of the issue:
To troubleshoot and resolve this issue:
Remember, this type of error is almost always related to incorrect indexing. Start by reviewing any indexing operations, slicing, or other tensor manipulations in your code. |
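As a starting point for reviewing indexing operations, here is a minimal sketch of one such check. It assumes (this was not diagnosed in the thread) that the failing index is a token id outside the embedding table. `find_out_of_range_ids` and the example sizes are hypothetical, and plain Python lists stand in for tensors so the check runs on CPU.

```python
from typing import List, Sequence, Tuple


def find_out_of_range_ids(
    batch: Sequence[Sequence[int]], vocab_size: int
) -> List[Tuple[int, int, int]]:
    """Return (row, column, token_id) for every id outside [0, vocab_size)."""
    bad = []
    for row, ids in enumerate(batch):
        for col, token_id in enumerate(ids):
            if token_id < 0 or token_id >= vocab_size:
                bad.append((row, col, token_id))
    return bad


if __name__ == "__main__":
    # Hypothetical batch and table size, chosen only for illustration.
    batch = [[1, 15, 31999], [2, 32000, -1]]
    for row, col, token_id in find_out_of_range_ids(batch, vocab_size=32000):
        print(f"row {row}, col {col}: id {token_id} is outside [0, 32000)")
```

Running a check like this on the CPU side of the data pipeline tends to give a readable Python-level report. CUDA kernels run asynchronously, so the device-side error can surface at an unrelated line unless the job is launched with `CUDA_LAUNCH_BLOCKING=1`.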
Maybe you could try using nightly CUDA and PyTorch? |
adding some notes here from troubleshooting:
|
Here are the changes to LLaVA that need to be made upstream:
|
Upstream PR here: haotian-liu/LLaVA#694 |
|
There are definitely optimizations to be made here: the LazySupervisedDataset processes all the images on the fly, so training bounces between the image model and the text model. We could probably preprocess the entire dataset up front, similar to our existing workflows, and eventually also enable sample packing for this: https://github.com/haotian-liu/LLaVA/blob/66044b727e30f589c6dbf7b58fce021b73566b36/llava/train/train.py#L660-L707 |
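To illustrate the preprocessing idea in the comment above, here is a rough sketch of encoding every image once before training instead of per sample. It is not LLaVA or axolotl code. `preprocess_images`, `encode_image`, and the fake encoder are hypothetical stand-ins for whatever image processor the real pipeline uses.

```python
from typing import Callable, Dict, Iterable, List


def preprocess_images(
    image_paths: Iterable[str],
    encode_image: Callable[[str], List[float]],
) -> Dict[str, List[float]]:
    """Encode every distinct image once, before training starts."""
    cache: Dict[str, List[float]] = {}
    for path in image_paths:
        if path not in cache:  # skip images that were already encoded
            cache[path] = encode_image(path)
    return cache


if __name__ == "__main__":
    calls: List[str] = []

    def fake_encoder(path: str) -> List[float]:
        # Hypothetical stand-in for the real image processor.
        calls.append(path)
        return [float(len(path))]

    features = preprocess_images(["a.jpg", "b.jpg", "a.jpg"], fake_encoder)
    print(len(features), "images encoded in", len(calls), "calls")
```

With the features cached up front, the training loop would only look them up by path, which is the property that would also make sample packing easier to add later.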
Hey, was this the branch used for training |
Any updates on this PR? @winglian |
+1 for llava finetuning with axolotl |
You'll need to download images.zip from https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/tree/main into a llava folder to use this.
This PR mostly reimplements this file: https://github.com/haotian-liu/LLaVA/blob/66044b727e30f589c6dbf7b58fce021b73566b36/llava/train/train.py
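For the images.zip step mentioned above, here is a small sketch of unpacking the archive once it has been downloaded by hand. The local paths and the `extract_images` helper are assumptions for illustration. The thread does not say what directory layout the training code expects inside the folder.

```python
import zipfile
from pathlib import Path


def extract_images(zip_path: str, target_dir: str) -> int:
    """Extract an already-downloaded archive into target_dir; return the file count."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        # Count regular files only, not directory entries.
        return sum(1 for info in archive.infolist() if not info.is_dir())


if __name__ == "__main__":
    # Hypothetical local paths; adjust to wherever images.zip was saved.
    if Path("images.zip").exists():
        print(extract_images("images.zip", "llava"), "files extracted")
```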