This repository has been archived by the owner on Nov 1, 2024. It is now read-only.
Changing the number of attention heads in OPT-1.3b causes CUDA error: indexSelectLargeIndex #751
Labels: bug
🐛 Bug
To Reproduce
My data preprocessing works fine, but training crashes after some steps. I am using ColossalAI for training, and the only thing I changed is NUM_Head in model-config.json.
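One thing worth ruling out first is whether the new head count still divides the hidden size evenly. The sketch below is only an illustration: the function name is hypothetical, and the values 2048 (hidden size) and 32 (original head count) are the published OPT-1.3b settings, not values read from the `model-config.json` in this report.

```python
def check_head_config(hidden_size: int, num_heads: int) -> int:
    """Return the per-head dimension, or raise if the split is not even.

    Hypothetical helper: multi-head attention splits hidden_size into
    num_heads equal slices, so the division must have no remainder.
    """
    if num_heads <= 0:
        raise ValueError(f"num_heads must be positive, got {num_heads}")
    if hidden_size % num_heads != 0:
        raise ValueError(
            f"hidden_size={hidden_size} is not divisible by num_heads={num_heads}"
        )
    return hidden_size // num_heads


if __name__ == "__main__":
    # Assumed values: OPT-1.3b uses hidden size 2048 with 32 heads.
    print(check_head_config(2048, 32))
```

If this check passes for the modified value, the head count alone is less likely to be the direct cause of the assertion.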
Code sample
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [160,0,0], thread: [24,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [160,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [160,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [160,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [160,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [160,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [160,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [160,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Expected behavior
Environment
Install method (pip, source):
Additional context
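The assertion `srcIndex < srcSelectDimSize` means that some index passed to an index-select operation was at or beyond the size of the dimension being indexed. In language-model training this often comes from an embedding lookup, for example a token ID that is not smaller than the vocabulary size, so it may be unrelated to the head count. The sketch below is a hypothetical pure-Python check, not part of any library; `vocab_size` must be taken from the actual model config, and the number used in the example is only illustrative.

```python
from typing import List, Sequence, Tuple


def find_out_of_range_ids(
    batch: Sequence[Sequence[int]], vocab_size: int
) -> List[Tuple[int, int, int]]:
    """Return (row, column, token_id) for every ID outside [0, vocab_size).

    Hypothetical helper: an embedding table with vocab_size rows can only
    be indexed with IDs 0 .. vocab_size - 1.
    """
    bad = []
    for row_idx, row in enumerate(batch):
        for col_idx, token_id in enumerate(row):
            if not 0 <= token_id < vocab_size:
                bad.append((row_idx, col_idx, token_id))
    return bad


if __name__ == "__main__":
    # Illustrative values only; use the vocab size from your own config.
    sample_batch = [[1, 2, 3], [50272, 4, 5]]
    print(find_out_of_range_ids(sample_batch, vocab_size=50272))
```

Running the failing step on CPU, or with the environment variable `CUDA_LAUNCH_BLOCKING=1`, usually produces a traceback that points at the exact lookup instead of an asynchronous device-side assertion.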