Integrate LLaVA for multimodal pre-training #781

winglian · 2023-10-24T03:26:00Z

You'll need to download images.zip from https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/tree/main into a llava folder to use this.
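
A minimal sketch of that setup step, assuming the `huggingface_hub` package is installed. The comment only says to put the archive in a llava folder; unpacking it there is an assumption, and both helper names (`download_images_zip`, `extract_images`) are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def download_images_zip(dest_dir: str = "llava") -> Path:
    """Download images.zip from the liuhaotian/LLaVA-Pretrain dataset repo into dest_dir."""
    # Imported lazily so extract_images() below works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="liuhaotian/LLaVA-Pretrain",
        filename="images.zip",
        repo_type="dataset",
        local_dir=dest_dir,
    )
    return Path(path)


def extract_images(zip_path: Path, dest_dir: str = "llava") -> List[str]:
    """Unpack the archive into dest_dir and return the names of the extracted members."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

Usage would be `extract_images(download_images_zip("llava"), "llava")`; the download is kept separate so the extraction step can be rerun without refetching the archive.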

This PR mostly reimplements this file: https://github.com/haotian-liu/LLaVA/blob/66044b727e30f589c6dbf7b58fce021b73566b36/llava/train/train.py

winglian · 2023-10-24T03:29:57Z

Does anyone have any ideas about this stack trace?

  File "/root/miniconda3/lib/python3.11/site-packages/transformers/trainer.py", line 1892, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/trainer.py", line 2776, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/core/trainer_builder.py", line 252, in compute_loss
    return super().compute_loss(model, inputs, return_outputs=return_outputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/trainer.py", line 2801, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])  # type: ignore[index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/accelerate/utils/operations.py", line 636, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/accelerate/utils/operations.py", line 624, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/models/llava/llava_mistral.py", line 99, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py", line 863, in forward
    inputs_embeds = self.embed_tokens(input_ids)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fc4206ff4d7 in /root/miniconda3/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fc4206c936b in /root/miniconda3/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fc3f633fb58 in /root/miniconda3/lib/python3.11/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x80 (0x7fc38a3eeee0 in /root/miniconda3/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7fc38a3f24b8 in /root/miniconda3/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::workCleanupLoop() + 0x227 (0x7fc38a3f3a07 in /root/miniconda3/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xdc253 (0x7fc3f5ab0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #7: <unknown function> + 0x94b43 (0x7fc420f32b43 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: clone + 0x44 (0x7fc420fc3bb4 in /lib/x86_64-linux-gnu/libc.so.6)

winglian · 2023-10-24T03:32:15Z

../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [17,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [17,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [602,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [602,0,0], thread: [20,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

ehartford · 2023-10-24T03:39:42Z

GPT-4 says:

It looks like you're encountering a CUDA error related to indexing in PyTorch. This error is often caused by an invalid index being used to access tensor elements.

Here's a breakdown of the issue:

The error originates from the Indexing.cu file, which is part of the ATen library in PyTorch. This library contains CUDA implementations for tensor operations.
The assertion that's failing is srcIndex < srcSelectDimSize. This means an index (srcIndex) used to access a tensor is not smaller than the size of the dimension it is indexing (srcSelectDimSize), i.e. it is out of bounds.
The assertion is reported many times because every GPU thread that reads an out-of-range index triggers it (note the different thread: [...] values in the log).

To troubleshoot and resolve this issue:

Check the Indexing: Ensure that all tensor indexing operations in your code are within valid bounds. For example, if you're trying to access the 10th element of a tensor that only has 9 elements, you'll encounter this error.
Review the Dimensions: Verify the dimensions of tensors you're working with, especially if they're being passed through functions or reshaped. Using the .size() method on a tensor can help you see its dimensions.
Use Device-Side Assertions: The error message suggests compiling with TORCH_USE_CUDA_DSA to enable device-side assertions. This can give more detailed error messages that can help pinpoint the exact location and cause of the problem.
Update PyTorch: Sometimes, issues can be resolved by simply updating to the latest version of PyTorch.
Minimal Reproduction: If you're still stuck, try to create a minimal code example that reproduces the error. This can help you isolate the issue and might make it easier for others to assist you.

Remember, this type of error is almost always related to incorrect indexing. Start by reviewing any indexing operations, slicing, or other tensor manipulations in your code.
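
The first two suggestions can be turned into a host-side check that runs before the batch reaches the GPU, where an out-of-range id only shows up as the opaque device-side assert above. A minimal sketch, assuming `input_ids` is a batch of plain Python int lists and `vocab_size` is the number of rows in the embedding matrix; `find_invalid_token_ids` and `assert_ids_in_range` are hypothetical helpers, not part of PyTorch or transformers.

```python
from typing import List, Sequence, Tuple


def find_invalid_token_ids(
    input_ids: Sequence[Sequence[int]], vocab_size: int
) -> List[Tuple[int, int, int]]:
    """Return (row, position, token_id) for every id that an embedding table with
    vocab_size rows cannot look up, i.e. every id outside [0, vocab_size)."""
    bad = []
    for row, seq in enumerate(input_ids):
        for pos, tok in enumerate(seq):
            if not 0 <= tok < vocab_size:
                bad.append((row, pos, tok))
    return bad


def assert_ids_in_range(input_ids: Sequence[Sequence[int]], vocab_size: int) -> None:
    """Raise a readable ValueError instead of letting the CUDA kernel assert."""
    bad = find_invalid_token_ids(input_ids, vocab_size)
    if bad:
        row, pos, tok = bad[0]
        raise ValueError(
            f"{len(bad)} token id(s) outside [0, {vocab_size}); "
            f"first is {tok} at batch row {row}, position {pos}"
        )
```

For example, with a 32000-token vocabulary, a sequence that still contains an image placeholder id such as -200 is flagged: `find_invalid_token_ids([[1, 415, -200, 22]], 32000)` returns `[(0, 2, -200)]`.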

ehartford · 2023-10-24T03:40:26Z

Maybe you could try nightly builds of CUDA and PyTorch?

winglian · 2023-10-28T06:59:02Z

Adding some notes here from troubleshooting:

https://github.com/haotian-liu/LLaVA/blob/e61aa3f88f58f8e871b9c2476d743724e271c776/llava/train/train.py#L701-L707 seems to add an image key to the inputs that end up going to model.forward.
Because the model expects images (note the plural s), the image key gets dropped by transformers: https://github.com/haotian-liu/LLaVA/blob/66044b727e30f589c6dbf7b58fce021b73566b36/llava/model/language_model/llava_llama.py#L66
Additionally, the image token id in the inputs has a value of -200, and I think that might break the call to the embeddings, since there is no embedding row that -200 maps to.
Changing the data_dict key to images seems to have no effect, and the field still gets dropped before model.forward is called.
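
One plausible mechanism for the dropped key: with `remove_unused_columns=True` (the `TrainingArguments` default), the transformers Trainer removes any batch key whose name is not a parameter of the model's forward signature, so `image` is discarded when forward only accepts `images`. Below is a simplified stdlib emulation of that filtering for illustration only; it is not the transformers code, and `keep_forward_args` and the toy `forward` are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict


def keep_forward_args(forward: Callable[..., Any], batch: Dict[str, Any]) -> Dict[str, Any]:
    """Drop every batch key that is not a named parameter of `forward`
    (a simplified imitation of remove_unused_columns=True)."""
    params = inspect.signature(forward).parameters
    allowed = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    # transformers also keeps label columns such as "label" and "label_ids".
    allowed |= {"label", "label_ids", "labels"}
    return {k: v for k, v in batch.items() if k in allowed}


def forward(input_ids=None, attention_mask=None, labels=None, images=None):
    """Toy stand-in for a LLaVA-style forward that expects `images` (plural)."""
    return images


batch = {"input_ids": [[1, -200, 5]], "attention_mask": [[1, 1, 1]], "image": "pixels"}
print(sorted(keep_forward_args(forward, batch)))  # ['attention_mask', 'input_ids']
```

Renaming the key to `images` everywhere it is produced lets it survive the filter; setting `remove_unused_columns=False` turns the filtering off instead.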

winglian · 2023-10-28T12:52:55Z

Here are the changes to LLaVA that need to be made upstream:

diff --git a/llava/train/train.py b/llava/train/train.py
index cbfcc1b..f418a42 100644
--- a/llava/train/train.py
+++ b/llava/train/train.py
@@ -654,7 +654,7 @@ class LazySupervisedDataset(Dataset):
         length_list = []
         for sample in self.list_data_dict:
             cur_len = sum(len(conv['value'].split()) for conv in sample['conversations'])
-            cur_len = cur_len if 'image' in sample else -cur_len
+            cur_len = cur_len if 'images' in sample else -cur_len
             length_list.append(cur_len)
         return length_list
 
@@ -700,11 +700,11 @@ class LazySupervisedDataset(Dataset):
 
         # image exist in the data
         if 'image' in self.list_data_dict[i]:
-            data_dict['image'] = image
+            data_dict['images'] = image
         elif self.data_args.is_multimodal:
             # image does not exist in the data, but the model is multimodal
             crop_size = self.data_args.image_processor.crop_size
-            data_dict['image'] = torch.zeros(3, crop_size['height'], crop_size['width'])
+            data_dict['images'] = torch.zeros(3, crop_size['height'], crop_size['width'])
         return data_dict
 
 
@@ -732,8 +732,8 @@ class DataCollatorForSupervisedDataset(object):
             attention_mask=input_ids.ne(self.tokenizer.pad_token_id),
         )
 
-        if 'image' in instances[0]:
-            images = [instance['image'] for instance in instances]
+        if 'images' in instances[0]:
+            images = [instance['images'] for instance in instances]
             if all(x is not None and x.shape == images[0].shape for x in images):
                 batch['images'] = torch.stack(images)
             else:
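
For context on the first hunk: that property (modality_lengths in LLaVA's train.py) stores a positive word count for samples that have an image and a negative one for text-only samples, so a length-grouped sampler can keep the two modalities apart. A minimal sketch of the sign convention, using hypothetical helpers (`signed_lengths`, `split_by_modality`) and toy records shaped like the raw LLaVA JSON, which per the second hunk's `'image' in self.list_data_dict[i]` check carries an `image` key.

```python
from typing import Any, Dict, List, Tuple


def signed_lengths(samples: List[Dict[str, Any]], image_key: str = "image") -> List[int]:
    """Word count per sample: positive if the record has image_key, negative otherwise."""
    lengths = []
    for sample in samples:
        cur_len = sum(len(conv["value"].split()) for conv in sample["conversations"])
        lengths.append(cur_len if image_key in sample else -cur_len)
    return lengths


def split_by_modality(lengths: List[int]) -> Tuple[List[int], List[int]]:
    """Return (multimodal indices, text-only indices) from the sign convention.
    Zero-length samples fall into neither group."""
    mm = [i for i, n in enumerate(lengths) if n > 0]
    text = [i for i, n in enumerate(lengths) if n < 0]
    return mm, text
```

The key passed to the check has to match the raw records: if they only contain `image`, looking for `images` makes every sample appear text-only.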

winglian · 2023-10-28T17:39:26Z

Upstream PR here haotian-liu/LLaVA#694

winglian · 2023-10-28T21:45:10Z

git clone https://github.com/OpenAccess-AI-Collective/LLaVA.git
cd LLaVA
git checkout images-name-fix
pip install --no-deps -e .

winglian · 2023-10-28T23:47:16Z

There are definitely optimizations to be made: LazySupervisedDataset processes all the images on the fly, bouncing between the image model and the text model. We could probably preprocess the entire dataset, similar to our existing workflows, and eventually enable sample packing for this: https://github.com/haotian-liu/LLaVA/blob/66044b727e30f589c6dbf7b58fce021b73566b36/llava/train/train.py#L660-L707
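
As a rough illustration of the sample-packing idea, here is a greedy first-fit-decreasing sketch that groups samples into bins whose total length stays within a maximum sequence length. It is a generic bin-packing heuristic with a hypothetical name, not axolotl's existing packing implementation, and it assumes `lengths` already include the image feature tokens each sample expands to.

```python
from typing import List


def pack_first_fit_decreasing(lengths: List[int], max_len: int) -> List[List[int]]:
    """Group sample indices into bins whose total length is at most max_len.

    Samples are placed longest-first into the first bin with enough room;
    a new bin is opened when none fits.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []
    remaining: List[int] = []  # free space left in each bin
    for idx in order:
        n = lengths[idx]
        if n > max_len:
            raise ValueError(f"sample {idx} has length {n} > max_len {max_len}")
        for b, free in enumerate(remaining):
            if n <= free:
                bins[b].append(idx)
                remaining[b] -= n
                break
        else:
            bins.append([idx])
            remaining.append(max_len - n)
    return bins
```

For lengths `[700, 300, 600, 400, 100]` and `max_len=1024` this yields `[[0, 1], [2, 3], [4]]`, i.e. two nearly full bins of 1000 tokens and one leftover bin.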


ritabratamaiti · 2023-11-02T05:14:46Z

Hey, was this the branch used to train openaccess-ai-collective/mistral-7b-llava-1_5-pretrained-projector? If so, when will it be merged into main? Is it recommended to use this branch in the meantime if we want to train multimodal models with axolotl?

ManuelFay · 2024-04-03T16:49:45Z

Any updates on this PR? @winglian

ZQ-Dev8 · 2024-04-04T18:45:10Z

+1 for llava finetuning with axolotl

winglian added 4 commits October 23, 2023 20:29

WIP llaval support

8667747

handle dataset loading for multimodal

ab9d12c

handle load_model splat

b885169

more fixes to try to get mm working

fdc3e4d

winglian added the enhancement (New feature or request) and wip labels Oct 24, 2023

winglian marked this pull request as draft October 24, 2023 03:26

winglian added 2 commits October 24, 2023 09:45

fix code for llava parity, add llama yml

faa46fb

wip

7ff30c4

add docs and tweak yml

1321608

winglian added 3 commits October 29, 2023 05:12

additional args for parity, fix to properly save projector during pretrain

ef95ea2

fix to set training args so projector properly saves

53f93f6

pretrain fixes for mm

b52e61a
