Added toggle to train after game or after step in DreamerV3 #319

Open · wants to merge 1 commit into base: main

Conversation

LucaVendruscolo

Summary


Added a config option called "train_after_step" to sheeprl/configs/exp/dreamer_v3.yaml. It is set to true by default, which keeps the standard behaviour; when set to false, the program waits until the end of the episode/game and then trains the algorithm on all the data gathered during that episode.
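A minimal, self-contained sketch of the gating this option is meant to implement (the function below is illustrative, not SheepRL's actual training loop; `reset_envs` follows the name used in the diff further down):

```python
# Illustrative stand-in for the gate inside the training loop.
# reset_envs counts how many environments finished an episode at this step.

def should_train(train_after_step: bool, reset_envs: int) -> bool:
    """Decide whether to run gradient steps at the current policy step."""
    # train_after_step=True: train every step (standard behaviour).
    # train_after_step=False: train only once an episode/game has ended,
    # so the update uses all the data gathered during that episode.
    return train_after_step or reset_envs > 0

assert should_train(True, 0)        # standard mode: always train
assert not should_train(False, 0)   # episode still running: skip training
assert should_train(False, 2)       # two episodes just ended: train now
```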

Type of Change

Please select the one relevant option below:

  • New feature (non-breaking change that adds functionality)

Checklist

Please confirm that the following tasks have been completed:

  • [x] I have tested my changes locally and they work as expected. (Please describe the tests you performed.)
  • [ ] I have added unit tests for my changes, or updated existing tests if necessary.
  • [ ] I have updated the documentation, if applicable.
  • [n/a] I have installed pre-commit and run it locally for my code changes.

@michele-milesi (Member) left a comment


Thanks for your PR. I would kindly ask you to:

  1. Fix the issues described in the other comments.
  2. Add this argument to all the Dreamer algorithms (Dreamer V1, V2, V3 and P2E-DV1, P2E-DV2, P2E-DV3), so that they are all configured in the same way.
  3. Update the documentation:
    a. Add here an explanation of the argument you are adding and how it influences the gradient steps.
    b. Update the yaml file here with the new version of the sheeprl/configs/algo/dreamer_v3.yaml file (after moving the argument definition as specified in the comment on the sheeprl/configs/exp/dreamer_v3.yaml file).

UPDATE:
A further consideration: with distributed training, all processes must train at the same time, which may not happen if episodes end at different policy steps on different ranks. I would therefore ask you to disable the possibility of training on episode end when training is distributed.
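One way to enforce this, as a sketch (the function name and call site are hypothetical; such a check would live wherever the config is validated before the training loop starts):

```python
# Hypothetical guard against the distributed case described above: ranks whose
# episodes end at different policy steps would run a different number of
# training calls, and the collectives inside training would then deadlock.

def check_train_on_episode_end(train_on_episode_end: bool, world_size: int) -> None:
    if train_on_episode_end and world_size > 1:
        raise ValueError(
            "algo.train_on_episode_end=True is not supported with "
            "distributed training (world_size > 1): episodes may end at "
            "different policy steps on different ranks."
        )
```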

@belerico what do you think?

)
cumulative_per_rank_gradient_steps += 1
train_step += world_size
if cfg.algo.train_after_game or reset_envs > 0:
Member left a comment

Hi @LucaVendruscolo, I would change the name of the argument to train_on_episode_end.
By default its value is set to False, which keeps the standard behaviour.
When set to True, training starts only once the episode has finished.
So, you should change the condition to: if (cfg.algo.train_on_episode_end and reset_envs > 0) or not cfg.algo.train_on_episode_end:.

I prefer to have a configuration with a clear argument name.
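A quick self-contained check that the proposed condition behaves as described, and that it is equivalent to the shorter `not cfg.algo.train_on_episode_end or reset_envs > 0` (plain Python, no SheepRL imports):

```python
# Truth-table check of the proposed gate (pure logic, no SheepRL imports).

def proposed(train_on_episode_end: bool, reset_envs: int) -> bool:
    return (train_on_episode_end and reset_envs > 0) or not train_on_episode_end

def simplified(train_on_episode_end: bool, reset_envs: int) -> bool:
    return not train_on_episode_end or reset_envs > 0

for flag in (True, False):
    for resets in (0, 1, 3):
        assert proposed(flag, resets) == simplified(flag, resets)

assert proposed(False, 0)      # flag off: train every step (standard)
assert not proposed(True, 0)   # flag on, episode still running: wait
assert proposed(True, 1)       # flag on, an episode just ended: train
```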

@@ -8,6 +8,7 @@ defaults:

 # Algorithm
 algo:
+  train_after_step: true
Member left a comment

Here you set an argument with a name different from the one in the other file. Please change its name and default value according to the previous comment, and move the definition of this parameter into the ./sheeprl/configs/algo/dreamer_v3.yaml file.
Please also add a brief comment indicating the meaning of the parameter.
