Issue on page /_modules/ray/tune/integration/pytorch_lightning.html #33708

Closed

actuallyaswin opened this issue Mar 25, 2023 · 4 comments

@actuallyaswin

I hit an issue when using TuneReportCallback in the context of PyTorch Lightning.

2023-03-24 22:49:09,709 INFO worker.py:1553 -- Started a local Ray instance.                                                                       
== Status ==                                                                                                                                       
Current time: 2023-03-24 22:49:23 (running for 00:00:12.52)
Memory usage on this node: 38.1/251.8 GiB 
Using FIFO scheduling algorithm.
Resources requested: 1.0/32 CPUs, 0.1/2 GPUs, 0.0/109.27 GiB heap, 0.0/50.82 GiB objects (0.0/1.0 accelerator_type:A40)
Result logdir: ...
Number of trials: 1/1 (1 RUNNING)
+------------------------+----------+------------------------+
| Trial name             | status   | loc                    |
|------------------------+----------+------------------------|
| experiment_9efa3_00000 | RUNNING  | 129.79.247.102:3416524 |
+------------------------+----------+------------------------+


(experiment pid=3416524) Global seed set to 0
2023-03-24 22:49:24,560 ERROR trial_runner.py:1062 -- Trial experiment_9efa3_00000: Error processing event.
ray.exceptions.RayTaskError(ValueError): ray::ImplicitFunc.train() (pid=3416524, ip=129.79.247.102, repr=experiment)
  File "/home/asivara/venv/lib/python3.8/site-packages/ray/tune/trainable/trainable.py", line 368, in train
    raise skipped from exception_cause(skipped)
  File "/home/asivara/venv/lib/python3.8/site-packages/ray/tune/trainable/function_trainable.py", line 337, in entrypoint
    return self._trainable_func(
  File "/home/asivara/venv/lib/python3.8/site-packages/ray/tune/trainable/function_trainable.py", line 654, in _trainable_func
    output = fn()
  File "train.py", line 280, in experiment
    trainer = Trainer(
  File "/home/asivara/venv/lib/python3.8/site-packages/lightning/pytorch/utilities/argparse.py", line 69, in insert_env_defaults
    return fn(self, **kwargs)
  File "/home/asivara/venv/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 421, in __init__
    self._callback_connector.on_trainer_init(
  File "/home/asivara/venv/lib/python3.8/site-packages/lightning/pytorch/trainer/connectors/callback_connector.py", line 79, in on_trainer_init
    _validate_callbacks_list(self.trainer.callbacks)
  File "/home/asivara/venv/lib/python3.8/site-packages/lightning/pytorch/trainer/connectors/callback_connector.py", line 252, in _validate_callback
s_list
    stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
  File "/home/asivara/venv/lib/python3.8/site-packages/lightning/pytorch/trainer/connectors/callback_connector.py", line 252, in <listcomp>
    stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
  File "/home/asivara/venv/lib/python3.8/site-packages/lightning/pytorch/utilities/model_helpers.py", line 34, in is_overridden
    raise ValueError("Expected a parent")
ValueError: Expected a parent

Based on this PyTorch thread, I think the imports in ray.tune.integration.pytorch_lightning need to be updated for Lightning 2.
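
For reference, this is roughly the kind of setup that triggers it (a stripped-down sketch, not my actual training script; the metric names are placeholders):

```python
# Sketch of the failing setup: TuneReportCallback still subclasses the legacy
# pytorch_lightning.Callback, while the Trainer comes from the new
# lightning.pytorch namespace, so Lightning's is_overridden() check fails.
from lightning.pytorch import Trainer
from ray.tune.integration.pytorch_lightning import TuneReportCallback

trainer = Trainer(
    max_epochs=1,
    callbacks=[TuneReportCallback({"loss": "val_loss"}, on="validation_end")],
)  # raises ValueError: Expected a parent
```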

@beasteers

(I also just hit this issue 😭)

@stale

stale bot commented Aug 12, 2023

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public Slack channel.

@stale stale bot added the stale label Aug 12, 2023
@stale

stale bot commented Oct 15, 2023

Hi again! The issue will be closed because there has been no further activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public Slack channel.

Thanks again for opening the issue!

@stale stale bot closed this as completed Oct 15, 2023
@championsnet

championsnet commented Apr 11, 2024

> Based on this PyTorch thread, I think the imports in ray.tune.integration.pytorch_lightning need to be updated for Lightning 2.

According to this thread in the Lightning repo, this error appears when you include callbacks that still import from the legacy pytorch_lightning namespace while the Trainer comes from lightning.pytorch.

So the fix, I guess, is to fork the current implementation and update the imports yourself, along the lines of the sketch below.
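
Roughly something like this (just a sketch rebased onto the new lightning.pytorch namespace; the class name and metric mapping are placeholders, not the shipped implementation):

```python
# Rough sketch of a forked report callback built on lightning.pytorch
# instead of the legacy pytorch_lightning package.
from lightning.pytorch.callbacks import Callback
from ray.air import session


class TuneReportCallback2(Callback):
    """Report selected trainer metrics to Ray Tune after each validation run."""

    def __init__(self, metrics: dict):
        # Maps the name reported to Tune -> the key in trainer.callback_metrics.
        self._metrics = metrics

    def on_validation_end(self, trainer, pl_module):
        # Don't report during the sanity-check validation pass.
        if trainer.sanity_checking:
            return
        report = {
            tune_key: trainer.callback_metrics[pl_key].item()
            for tune_key, pl_key in self._metrics.items()
            if pl_key in trainer.callback_metrics
        }
        session.report(report)
```

Then pass e.g. TuneReportCallback2({"loss": "val_loss"}) to the lightning.pytorch.Trainer in place of the shipped callback.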

UPDATE: Should be fixed in #44339
