Gecco-Torch fails on "Sanity Checking Dataloader" #1

Open
grgkopanas opened this issue Oct 26, 2023 · 5 comments

grgkopanas commented Oct 26, 2023

Hi,

Trying to run the torch version, I get the following error. Have you seen it before? It seems to come not from the code base itself but from pytorch-lightning internals.

I installed by creating a conda environment with python >=3.10 and then running pip install -e ./

I have changed nothing in the code.

$ python shapenet_airplane_unconditional.py

Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name        | Type            | Params
------------------------------------------------
0 | backbone    | EDMPrecond      | 13.5 M
1 | conditioner | IdleConditioner | 0
2 | loss        | EDMLoss         | 0
3 | reparam     | GaussianReparam | 0
------------------------------------------------
13.5 M    Trainable params
0         Non-trainable params
13.5 M    Total params
53.924    Total estimated model params size (MB)
Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data/graphdeco/user/gkopanas/point_diffusion/gecco/gecco-torch/example_configs/shapenet_airplane_unconditional.py", line 82, in <module>
    trainer().fit(model, data)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 545, in fit
    call._call_and_handle_interrupt(
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 581, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 990, in _run
    results = self._run_stage()
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1034, in _run_stage
    self._run_sanity_check()
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1063, in _run_sanity_check
    val_loop.run()
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/loops/utilities.py", line 181, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 134, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 391, in _evaluation_step
    output = call._call_strategy_hook(trainer, hook_name, *step_args)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 403, in validation_step
    return self.lightning_module.validation_step(*args, **kwargs)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 490, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 641, in _convert_frame
    result = inner_convert(frame, cache_size, hooks, frame_state)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 133, in _fn
    return fn(*args, **kwargs)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 389, in _convert_frame_assert
    return _compile(
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 586, in _compile
    raise InternalTorchDynamoError(str(e)).with_traceback(
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 569, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 549, in compile_inner
    check_fn = CheckFunctionManager(
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/guards.py", line 929, in __init__
    guard.create(local_builder, global_builder)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_guards.py", line 243, in create
    return self.create_fn(self.source.select(local_builder, global_builder), self)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/guards.py", line 404, in CONSTANT_MATCH
    val = self.get(guard.name)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/guards.py", line 234, in get
    return eval(name, self.scope, CLOSURE_VARS)
  File "<string>", line 1, in <module>
torch._dynamo.exc.InternalTorchDynamoError: 'NoneType' object is not subscriptable


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
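
The fallback suggested in the message above could be wrapped as in the following minimal sketch. The helper name `enable_eager_fallback` is hypothetical and not part of gecco-torch. Note that this flag only hides the compilation failure and runs the affected function uncompiled; it does not address the underlying version mismatch.

```python
def enable_eager_fallback() -> bool:
    """Ask TorchDynamo to fall back to eager mode instead of raising.

    Returns True if the flag was set, False if torch._dynamo is unavailable
    (for example, PyTorch is missing or older than 2.0).
    """
    try:
        import torch._dynamo  # also binds the name `torch` in this scope
    except ImportError:
        return False
    # Same setting the error message recommends.
    torch._dynamo.config.suppress_errors = True
    return True


if __name__ == "__main__":
    print("eager fallback enabled:", enable_eager_fallback())
```

The call would have to happen before `trainer().fit(model, data)` runs, for instance at the top of the example config script.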

@grgkopanas
Author

I tried it on a completely different setup (different cluster and hardware) and got the same error.

@jatentaki
Collaborator

I see, can you provide the output of pip freeze (with this env activated)? It could be that you're on a different version of pytorch and it doesn't work for some reason. I did my development on 2.0.1. I'll dump my pip freeze on Monday so we can look for differences.
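
Comparing two pip freeze dumps by eye is tedious, so a small script along these lines might help. This is only a sketch: the function names are made up for illustration, it only understands plain `name==version` lines, and the versions in the demo are placeholders (only torch 2.0.1 comes from this thread).

```python
from typing import Dict, Optional, Tuple


def parse_freeze(text: str) -> Dict[str, str]:
    """Map package name -> version for the 'name==version' lines of pip freeze."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, editable installs and URL-style requirements.
        if not line or line.startswith(("#", "-")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def diff_freezes(mine: str, theirs: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {package: (my_version, their_version)} for every mismatch.

    None means the package is absent from that environment.
    """
    a, b = parse_freeze(mine), parse_freeze(theirs)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Placeholder dumps for illustration, not real output from either machine.
    failing = "torch==2.1.0\nnumpy==1.26.0\n"
    working = "torch==2.0.1\nnumpy==1.26.0\ntensorboard==2.14.0\n"
    for name, (mine, theirs) in diff_freezes(failing, working).items():
        print(f"{name}: {mine} -> {theirs}")
```

In practice each side would save `pip freeze` to a file and pass the file contents to `diff_freezes`.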

@grgkopanas
Author

Thank you for suggesting this. Switching to torch 2.0.1 helped, but I also needed to pip install tensorboard, which should be added to the env setup.
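
For anyone landing here with the same error, a quick way to check an environment against what worked in this thread might look like the sketch below. The `EXPECTED` table and `check_environment` are illustrative names, and the pins are an assumption drawn from this discussion (torch 2.0.1, tensorboard in any version), not an official requirements list.

```python
from importlib import metadata

# Assumption: pins taken from this thread, not from the project's setup files.
# A value of None means "any installed version is fine".
EXPECTED = {"torch": "2.0.1", "tensorboard": None}


def check_environment(expected=EXPECTED):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        # Strip a local build tag such as '+cu117' before comparing.
        if wanted is not None and installed.split("+")[0] != wanted:
            problems.append(f"{name} is {installed}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_environment() or ["environment matches the thread"]:
        print(problem)
```

Run it inside the activated conda env so that it inspects the same packages the training script would import.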

@jatentaki
Collaborator

Does this make the code work overall? I'll try to fix this in an update rather than pinning old dependencies.

@grgkopanas
Author

As far as I can tell, the code works fine with 2.0.1.
