StopIteration: Caught StopIteration in replica 0 on device 0. #72

octopusStar218 opened this issue May 23, 2024 · 0 comments

We ran into the same problem as issue #17, and we still get the error even after commenting out "replace_llama_attn_with_flash_attn()".

  0%|          | 0/274797 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/accelerate/accelerator.py", line 1058, in accumulate
    yield
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/transformers/trainer.py", line 3238, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/transformers/trainer.py", line 3264, in compute_loss
    outputs = model(**inputs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
    outputs = self.parallel_apply(replicas, inputs, module_kwargs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 108, in parallel_apply
    output.reraise()
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/_utils.py", line 705, in reraise
    raise exception
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in _worker
    output = module(*input, **kwargs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/graph_learning/GraphGPT-main/graphgpt/model/GraphLlama.py", line 325, in forward
    outputs = self.model(
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/graph_learning/GraphGPT-main/graphgpt/model/GraphLlama.py", line 202, in forward
    node_forward_out = graph_tower(g)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/graph_learning/GraphGPT-main/graphgpt/model/graph_layers/graph_transformer.py", line 64, in forward
    device = self.parameters().__next__().device
StopIteration

  0%|          | 0/274797 [01:57<?, ?it/s]          
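
A possible explanation, offered as a guess from the traceback rather than a confirmed diagnosis: the call goes through torch/nn/parallel/data_parallel.py, and in recent PyTorch versions the replicas that DataParallel creates do not expose their weights through parameters(), so "self.parameters().__next__()" finds an empty iterator and raises StopIteration. The sketch below shows one way to make the device lookup tolerant of that case. The helper name resolve_device and the stub classes are hypothetical and are not part of GraphGPT; the stubs only stand in for a module and a tensor so the snippet runs without torch.

```python
from types import SimpleNamespace


def resolve_device(module, fallback):
    """Return the device of the module's first parameter.

    If the module exposes no parameters (which appears to be what happens
    inside a DataParallel replica), use the device of `fallback` instead.
    `fallback` is assumed to be any object with a `.device` attribute,
    for example an input tensor of the forward pass.
    """
    # next(..., None) returns None on an empty iterator instead of
    # raising StopIteration like `.__next__()` does.
    first = next(iter(module.parameters()), None)
    return first.device if first is not None else fallback.device


class StubModule:
    """Hypothetical stand-in for an nn.Module; only parameters() is mimicked."""

    def __init__(self, params):
        self._params = list(params)

    def parameters(self):
        return iter(self._params)


# A "replica" with no visible parameters, and an input living on cuda:0.
replica = StubModule([])
graph_input = SimpleNamespace(device="cuda:0")

# A regular module whose first parameter lives on cpu.
regular = StubModule([SimpleNamespace(device="cpu")])

replica_device = resolve_device(replica, graph_input)
regular_device = resolve_device(regular, graph_input)
```

In graph_transformer.py the failing line could then become something like "device = resolve_device(self, some_input_tensor)", where the choice of tensor is an assumption since the forward signature is not shown here. Separately, because transformers' Trainer only wraps the model in DataParallel when it sees more than one GPU, restricting the run to a single visible GPU or launching with a distributed launcher may sidestep this code path; we have not verified either option on this setup.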