🐛 [BUG] - torch.cuda.OutOfMemoryError #26

Open
renige opened this issue Apr 17, 2024 · 0 comments

Environment Settings

ubuntu 22.04
cuda 11.8
conda 24.3.0
python 3.10.12

Expected Behavior

...

Actual Behavior

Running the code below raises torch.cuda.OutOfMemoryError on GPU 0, yet the nvidia-smi output further down shows that almost no memory is in use on any of the eight GPUs.

evaluator.run(model=model, benchmark=benchmark)

2024-04-15 11:54:53.994857: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-15 11:54:55.104254: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-04-15:11:54:59,094 INFO [main.py:251] Verbosity set to INFO
2024-04-15:11:55:03,977 INFO [main.py:335] Selected Tasks: ['arc_challenge']
2024-04-15:11:55:03,977 INFO [main.py:336] Loading selected tasks...
2024-04-15:11:55:03,978 INFO [evaluator.py:131] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234
2024-04-15:11:55:03,982 INFO [huggingface.py:162] Using device 'cuda'
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:06, 2.12s/it]
Traceback (most recent call last):
  File "/home/llm/anaconda3/envs/llm/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/home/llm/anaconda3/envs/llm/lib/python3.10/site-packages/lm_eval/main.py", line 342, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/home/llm/anaconda3/envs/llm/lib/python3.10/site-packages/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "/home/llm/anaconda3/envs/llm/lib/python3.10/site-packages/lm_eval/evaluator.py", line 164, in simple_evaluate
    lm = lm_eval.api.registry.get_model(model).create_from_arg_string(
  File "/home/llm/anaconda3/envs/llm/lib/python3.10/site-packages/lm_eval/api/model.py", line 133, in create_from_arg_string
    return cls(**args, **args2)
  File "/home/llm/anaconda3/envs/llm/lib/python3.10/site-packages/lm_eval/models/huggingface.py", line 201, in __init__
    self._create_model(
  File "/home/llm/anaconda3/envs/llm/lib/python3.10/site-packages/lm_eval/models/huggingface.py", line 524, in _create_model
    self._model = self.AUTO_MODEL_CLASS.from_pretrained(
  File "/home/llm/anaconda3/envs/llm/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "/home/llm/anaconda3/envs/llm/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3850, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/llm/anaconda3/envs/llm/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4284, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/llm/anaconda3/envs/llm/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/llm/anaconda3/envs/llm/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 387, in set_module_tensor_to_device
    new_value = value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB. GPU 0 has a total capacity of 10.75 GiB of which 49.94 MiB is free. Including non-PyTorch memory, this process has 10.68 GiB memory in use. Of the allocated memory 10.51 GiB is allocated by PyTorch, and 19.60 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[2024-04-15 11:55:10][INFO][evalverse - evaluator.py:183] >> arc_challenge_25shot done! exec_time: 0 min for upstage/SOLAR-10.7B-Instruct-v1.0


+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:1D:00.0 Off |                  N/A |
| 32%   34C    P8             16W /  250W |      19MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:1E:00.0 Off |                  N/A |
| 30%   33C    P8              1W /  250W |       9MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:1F:00.0 Off |                  N/A |
| 31%   34C    P8              3W /  250W |       9MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:20:00.0 Off |                  N/A |
| 30%   33C    P8             21W /  250W |       9MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:21:00.0 Off |                  N/A |
| 31%   35C    P8              1W /  250W |       9MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:22:00.0 Off |                  N/A |
| 31%   34C    P8             20W /  250W |       9MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:23:00.0 Off |                  N/A |
| 31%   34C    P8              2W /  250W |       9MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:24:00.0 Off |                  N/A |
| 30%   36C    P8              4W /  250W |       9MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2300      G   /usr/lib/xorg/Xorg                              9MiB |
|    0   N/A  N/A      2865      G   /usr/bin/gnome-shell                            4MiB |
|    1   N/A  N/A      2300      G   /usr/lib/xorg/Xorg                              4MiB |
|    2   N/A  N/A      2300      G   /usr/lib/xorg/Xorg                              4MiB |
|    3   N/A  N/A      2300      G   /usr/lib/xorg/Xorg                              4MiB |
|    4   N/A  N/A      2300      G   /usr/lib/xorg/Xorg                              4MiB |
|    5   N/A  N/A      2300      G   /usr/lib/xorg/Xorg                              4MiB |
|    6   N/A  N/A      2300      G   /usr/lib/xorg/Xorg                              4MiB |
|    7   N/A  N/A      2300      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+
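
A rough sanity check on the numbers above, offered as a sketch rather than a diagnosis. The error message reports 10.75 GiB total capacity on GPU 0, and the log line "Using device 'cuda'" suggests the whole model is being placed on that single device. Assuming the model has about 10.7 billion parameters (taken from the "10.7B" in its name) and is loaded in float16 or float32 (the dtype is not shown in the log, so both are assumptions), the weights alone would need roughly 20 GiB or 40 GiB. The nvidia-smi snapshot appears to have been taken after the process exited, which would explain why it shows the cards as idle.

```python
# Back-of-the-envelope estimate; parameter count and dtypes are assumptions,
# not values read from the model or from evalverse.
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


num_params = 10.7e9        # assumed from the "10.7B" in the model name
gpu_capacity_gib = 10.75   # "total capacity" reported in the error message

for dtype, nbytes in [("float32", 4), ("float16", 2)]:
    need = weight_memory_gib(num_params, nbytes)
    verdict = "fits" if need < gpu_capacity_gib else "does not fit"
    print(f"{dtype}: ~{need:.1f} GiB of weights, {verdict} on one "
          f"{gpu_capacity_gib} GiB GPU")
```

The estimate ignores activations and the CUDA context, so the real requirement would be somewhat higher than the figures it prints.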

Reproduction

  1. Open evalverse/examples/01_basic_usage.ipynb

  2. Run the code below
    import evalverse as ev

    evaluator = ev.Evaluator()

    # The log above comes from the SOLAR run.
    model = "upstage/SOLAR-10.7B-Instruct-v1.0"
    # model = "meta-llama/Llama-2-7b"

    benchmark = "h6_en"
    evaluator.run(model=model, benchmark=benchmark)
